Environment
Source Connector and version: TikTok Marketing 1.0.0
Destination Connector and version: BigQuery 1.1.11
Step where error happened: Run (Append + Dedup + History)
Current Behavior
When using the TikTok Marketing source, every <object>_reports_daily stream synced in Incremental | Dedup + History mode yields only one row per <object>_id in the denormalized table.
Example:
On the stream ad_groups_reports_daily, the configured PK is adgroup_id.
The denormalization process creates a table called ad_groups_reports_daily.
After a few days of runs, the resulting data contains one row per adgroup_id.
The result is only partial data: the last available data point for a given ad group, instead of all the data points available for that ad group.
The behavior is the same for every layer of the TikTok data (i.e. advertisers, campaigns, ad groups, and ads) and for every xxx_reports_daily stream. We have not had the opportunity to test the xxx_reports_hourly and xxx_reports_lifetime streams, nor the audience reports.
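To make the effect concrete, here is a minimal Python sketch (made-up records, not connector code) of what deduping on adgroup_id alone does to daily report rows:

```python
# Hypothetical records as they would arrive from ad_groups_reports_daily
records = [
    {"adgroup_id": "ag_1", "stat_time_day": "2022-05-01", "spend": 10.0},
    {"adgroup_id": "ag_1", "stat_time_day": "2022-05-02", "spend": 12.5},
    {"adgroup_id": "ag_1", "stat_time_day": "2022-05-03", "spend": 9.0},
]

# Dedup keyed on adgroup_id only: each new day's row overwrites the previous one
deduped = {}
for rec in records:
    deduped[rec["adgroup_id"]] = rec

print(list(deduped.values()))
# [{'adgroup_id': 'ag_1', 'stat_time_day': '2022-05-03', 'spend': 9.0}]
# Only the last data point survives; 2022-05-01 and 2022-05-02 are lost.
```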
Expected Behavior
We would expect one row per <object> per stat_time_day.
Example:
In the same scenario, we would expect one row per adgroup_id per stat_time_day.
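With the same made-up records as above, deduping on a composite key would keep exactly that:

```python
# Same hypothetical records, deduped on (adgroup_id, stat_time_day)
records = [
    {"adgroup_id": "ag_1", "stat_time_day": "2022-05-01", "spend": 10.0},
    {"adgroup_id": "ag_1", "stat_time_day": "2022-05-02", "spend": 12.5},
    {"adgroup_id": "ag_1", "stat_time_day": "2022-05-03", "spend": 9.0},
]

deduped = {}
for rec in records:
    deduped[(rec["adgroup_id"], rec["stat_time_day"])] = rec

assert len(deduped) == 3  # one row per adgroup_id x stat_time_day
```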
Possible Solution
One possible solution could be to use a composite primary key in the dedup process, i.e. <object>_id plus stat_time_day.
For reference, the Bing Ads connector uses composite PKs for its report streams, which should show the exact same behavior (almost all media platforms work the same way, with an X-day window during which the data can still be updated).
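As a sketch only (illustrative class name, not the actual TikTok connector source), the Airbyte Python CDK lets a stream declare a list of fields as its primary key, which the Dedup + History mode then treats as composite:

```python
from airbyte_cdk.sources.streams import Stream

class AdGroupsReportsDaily(Stream):  # illustrative name, not the real class
    # A list here means a composite primary key, instead of the single
    # "adgroup_id" the stream currently declares.
    primary_key = ["adgroup_id", "stat_time_day"]

    # read_records etc. omitted; this only shows the PK declaration
```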
Logs
-
Steps to Reproduce
1. Create a connection using TikTok Marketing as the source and BigQuery as the destination.
2. Enable the ad_groups_reports_daily stream and set it to Incremental | Dedup + History mode.
3. Run the connection for at least 2 days.
4. Take a look at the resulting denormalized data (see the query sketch below).
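If it helps, here is one way to check step 4 from Python with google-cloud-bigquery (the project/dataset names are placeholders for your own). With the bug, days stays at 1 for every ad group even after several days of runs:

```python
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT adgroup_id, COUNT(DISTINCT stat_time_day) AS days
    FROM `my_project.my_dataset.ad_groups_reports_daily`  -- placeholder table path
    GROUP BY adgroup_id
    ORDER BY days DESC
    LIMIT 10
"""
for row in client.query(query).result():
    print(row.adgroup_id, row.days)  # expected: days > 1; with the bug: days == 1
```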
Are you willing to submit a PR?
I don't feel I would be able to 😄