airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Source TikTok Marketing: Incorrect PK resulting in deleted data #23728

Closed MatthieuColinBM closed 1 year ago

MatthieuColinBM commented 1 year ago

Environment

Current Behavior

When using the TikTok Marketing source, every <object>_daily_reports stream synced in Incremental | Deduped + History mode yields only one row per <object>_id in the denormalized table. For example, on the ad_groups_reports_daily stream the configured PK is adgroup_id. Normalization creates a table called ad_groups_reports_daily, and after a few days of runs that table contains one row per adgroup_id. The result is partial data: only the most recent data point for a given ad group, instead of all of the data points available for that ad group.

This behavior is the same for every layer of the TikTok data (i.e. advertisers, campaigns, ad groups, and ads) and for every xxx_report_daily stream. We did not have the opportunity to test the xxx_report_hourly and xxx_report_lifetime streams, nor the audience reports.

Expected Behavior

We would expect one row per <object> per stat_time_day.

For example, in the same scenario we would expect one row per adgroup_id per stat_time_day.
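To make the difference concrete, here is a minimal, self-contained Python sketch (not Airbyte code; the sample rows are hypothetical) showing how deduplicating on adgroup_id alone collapses a multi-day history to a single row, while a composite key keeps one row per day:

```python
# Hypothetical daily report rows: two ad groups, two days of data.
records = [
    {"adgroup_id": 1, "stat_time_day": "2023-03-01", "spend": 10.0},
    {"adgroup_id": 1, "stat_time_day": "2023-03-02", "spend": 12.5},
    {"adgroup_id": 2, "stat_time_day": "2023-03-01", "spend": 7.0},
]

def dedup(rows, pk):
    """Keep only the last row seen for each primary-key value."""
    kept = {}
    for row in rows:
        kept[tuple(row[col] for col in pk)] = row
    return list(kept.values())

# Current behavior: PK = adgroup_id only -> one row per ad group.
print(len(dedup(records, ["adgroup_id"])))                    # 2

# Expected behavior: composite PK -> one row per ad group per day.
print(len(dedup(records, ["adgroup_id", "stat_time_day"])))   # 3
```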

Possible Solution

One possible solution could be to make the primary key composite by adding stat_time_day, so that the dedup process keys on both <object>_id and stat_time_day. The BingAds connector is an example of a connector that defines composite PKs for its report streams, which should have the exact same behavior (almost all media platforms work the same way, with an X-day window during which the data can be updated).
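As an illustration, here is a hedged sketch of how a composite primary key can be declared on an Airbyte Python CDK stream. The class name and the empty record source are illustrative assumptions, not the actual source-tiktok-marketing code:

```python
from typing import Any, Iterable, List, Mapping, Optional

from airbyte_cdk.sources.streams import Stream


class AdGroupsReportsDaily(Stream):
    # Composite PK: dedup keys on both the object id and the report date,
    # so normalization keeps one row per adgroup_id per stat_time_day.
    primary_key: Optional[List[str]] = ["adgroup_id", "stat_time_day"]

    def read_records(self, sync_mode, **kwargs) -> Iterable[Mapping[str, Any]]:
        # Placeholder: the real stream would page through the TikTok
        # reporting API here.
        yield from []
```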

Logs

-

Steps to Reproduce

  1. Create a connection using TikTok Marketing as the source and BigQuery as the destination.
  2. Enable the ad_groups_reports_daily stream and set it to Incremental | Deduped + History mode.
  3. Run the connection for at least 2 days.
  4. Take a look at the resulting (denormalized) data; a query sketch for this check follows below.
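For step 4, a minimal sketch of that check using the google-cloud-bigquery client (the dataset name is hypothetical; adjust it to your destination's namespace):

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses default credentials and project

query = """
    SELECT adgroup_id, COUNT(*) AS rows_kept
    FROM `my_dataset.ad_groups_reports_daily`  -- hypothetical dataset name
    GROUP BY adgroup_id
    ORDER BY rows_kept
"""

for row in client.query(query).result():
    # With the bug, rows_kept is 1 for every ad group even after 2+ days
    # of syncs; with a composite PK it should grow by one per daily run.
    print(row.adgroup_id, row.rows_kept)
```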

Are you willing to submit a PR?

I don't feel I would be able to 😄

rach-r commented 1 year ago

Also mentioned in the Slack channel here

grubberr commented 1 year ago

This problem was solved in this PR: https://github.com/airbytehq/airbyte/pull/24630