Closed zestyping closed 3 years ago
@ChristopheDuong to add any necessary descriptions/acceptance criteria then triage
The correctness criterion as described by @zestyping:
destination-bigquery that writes to table FT
destination-bigquery-denormalized that writes to table ST
From @zestyping's observations, it seems the catalog is converted properly and the tables are created correctly by: https://github.com/airbytehq/airbyte/blob/937f85fc12648114fdf17461625bb1e53378e796/airbyte-integrations/connectors/destination-bigquery-denormalized/src/main/java/io/airbyte/integrations/destination/bigquery/BigQueryDenormalizedDestination.java#L79
However, the data may not be populated into the table correctly, so it should be verified whether the record data is formatted properly in: https://github.com/airbytehq/airbyte/blob/937f85fc12648114fdf17461625bb1e53378e796/airbyte-integrations/connectors/destination-bigquery-denormalized/src/main/java/io/airbyte/integrations/destination/bigquery/BigQueryDenormalizedRecordConsumer.java#L85
Update.
The current implementation of the BigQuery denormalized destination doesn't handle complex objects properly. The main problem is that table creation can't correctly identify the record structure.
You can find warnings like this in the log:
2021-07-02 00:57:21 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-07-02 00:57:21 WARN i.a.i.d.b.BigQueryDenormalizedDestination(getTypes):157 - {} - Field home_type has no type defined, defaulting to STRING
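To make the fallback in that warning concrete, here is a minimal sketch (not the connector's actual code) of how a JSON Schema style stream definition could be mapped to BigQuery-style field descriptions. The function name `to_bq_field` and the `JSON_TO_BQ` table are hypothetical, and the input is assumed to use `type`, `properties` and `items` the way JSON Schema does.

```python
from typing import Any, Dict

# Hypothetical mapping from JSON Schema scalar types to BigQuery column types.
JSON_TO_BQ = {
    "string": "STRING",
    "integer": "INT64",
    "number": "FLOAT64",
    "boolean": "BOOL",
}


def to_bq_field(name: str, schema: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one schema property into a BigQuery-style field description."""
    json_type = schema.get("type")
    # A type may be a list such as ["null", "object"]; use the first non-null entry.
    if isinstance(json_type, list):
        non_null = [t for t in json_type if t != "null"]
        json_type = non_null[0] if non_null else None

    if json_type == "object" and "properties" in schema:
        # Nested object: recurse so that sub-fields keep their own types.
        return {
            "name": name,
            "type": "RECORD",
            "mode": "NULLABLE",
            "fields": [to_bq_field(k, v) for k, v in schema["properties"].items()],
        }
    if json_type == "array":
        # Describe the element type first, then mark the field as repeated.
        field = to_bq_field(name, schema.get("items", {}))
        field["mode"] = "REPEATED"
        return field
    # Missing or unknown type: fall back to STRING, as the warning above reports.
    return {"name": name, "type": JSON_TO_BQ.get(json_type, "STRING"), "mode": "NULLABLE"}
```

With this shape, a field such as home_type that arrives without a type still becomes a STRING column, while an array of objects becomes a repeated RECORD with typed sub-fields only if the source actually supplies `items` and `properties`.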
Once the table has been created with the wrong structure, processing ignores all incoming values that fall outside the created structure, with log messages like this:
2021-07-02 00:57:31 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-07-02 00:57:31 WARN i.a.i.d.b.BigQueryDenormalizedRecordConsumer(lambda$formatData$0):92 - {} - Ignoring field action_type as it is not defined in catalog
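The behaviour behind that message can be sketched as follows. This is an illustration under assumptions, not the code in BigQueryDenormalizedRecordConsumer: `format_record` is a hypothetical helper that keeps only the values a JSON Schema style definition describes, records the paths it drops, and descends into nested objects and arrays of objects.

```python
from typing import Any, Dict, List


def format_record(data: Any, schema: Dict[str, Any], dropped: List[str], path: str = "") -> Any:
    """Keep only values the schema describes, descending into objects and arrays."""
    if isinstance(data, dict) and "properties" in schema:
        out = {}
        for key, value in data.items():
            child_path = f"{path}.{key}" if path else key
            if key not in schema["properties"]:
                # Corresponds to "Ignoring field ... as it is not defined in catalog".
                dropped.append(child_path)
                continue
            out[key] = format_record(value, schema["properties"][key], dropped, child_path)
        return out
    if isinstance(data, list) and "items" in schema:
        # Apply the element schema to every entry, including arrays of objects.
        return [format_record(item, schema["items"], dropped, path) for item in data]
    # Scalars, or values whose schema gives no further structure, pass through unchanged.
    return data
```

If the schema built at table-creation time is missing nested definitions, every value below that point is either dropped or has nowhere to land, which matches the symptom of a preserved structure filled with nulls.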
In summary, the destination requires significant rework. I will keep you posted on the progress.
Update. There is a general issue with destinations that store nested types: only a few sources provide enough metadata to store that data correctly. This problem will be fixed outside the scope of this issue. (Discussed with @sherifnada)
When processing data from Facebook to BigQuery, we have a gap in handling arrays of objects.
@zestyping give the new version a shot and let us know how it goes!
@sherifnada Awesome news. Thanks, will do!
Environment
Current Behavior
In the Facebook Marketing data, there's a record that contains a field named targeting, whose value is a structure containing a field named exclusions, whose value is a structure containing a field named interests, whose value is an array of {"id": ..., "name": ...} structures.
I can confirm the data is present because it shows up when I use the original, "flattened" BigQuery destination. When I transfer the data using the flattened BigQuery destination, and then query for the record in BigQuery like this:
...BigQuery gives me this JSON result:
Above you can see the contents of the targeting field represented as a JSON string, as you would expect from the normalized BigQuery destination.
I've repeated the transfer using the new BigQuery Struct destination, putting the structured data in a dataset named facebook_struct_raw. When I look at the results, the targeting.exclusions.interests array still contains many elements with the {"id": ..., "name": ...} shape, but they are full of nulls. If I do a similar query on the structured dataset:
...BigQuery gives me this:
So it looks like the structure is accurately preserved, but only the top-level fields contain values, and many (but not all!) of the rest contain nulls.
Expected Behavior
The data values should be preserved in the destination along with the original structure. In other words, the output from the above two queries should represent the same information.
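One way to check this expectation mechanically is to parse the JSON string from the flattened table and compare it, path by path, with the struct value from the denormalized table. The sketch below is only an illustration: `diff_paths` is a hypothetical helper, the sample values are made up, and a missing key is deliberately treated the same as a null.

```python
import json
from typing import Any, List


def diff_paths(expected: Any, actual: Any, path: str = "") -> List[str]:
    """Return the paths at which two nested values differ."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        diffs: List[str] = []
        for key in sorted(set(expected) | set(actual)):
            child = f"{path}.{key}" if path else key
            # .get() makes a missing key compare like an explicit null.
            diffs += diff_paths(expected.get(key), actual.get(key), child)
        return diffs
    if isinstance(expected, list) and isinstance(actual, list):
        if len(expected) != len(actual):
            return [f"{path} (length {len(expected)} vs {len(actual)})"]
        diffs = []
        for i, (e, a) in enumerate(zip(expected, actual)):
            diffs += diff_paths(e, a, f"{path}[{i}]")
        return diffs
    return [] if expected == actual else [path]


# Made-up sample values: the JSON string as stored by the flattened destination,
# and the struct as it might come back from the denormalized one.
flat_targeting = '{"exclusions": {"interests": [{"id": "1", "name": "a"}]}}'
struct_targeting = {"exclusions": {"interests": [{"id": None, "name": None}]}}

print(diff_paths(json.loads(flat_targeting), struct_targeting, "targeting"))
```

An empty list would mean the two destinations hold the same information for that field; any entries point at the exact nested values that were lost.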
If it helps to specify the correctness criterion in a precise way, I'd say:
Logs
logs-486-0.txt
Steps to Reproduce
Are you willing to submit a PR?
I'm willing to help look into it, but I don't know where to start yet.