airbytehq / airbyte

[source-google-analytics-data-api] not taking multiple property Ids #42464

Open ali-al-najjar opened 1 month ago

ali-al-najjar commented 1 month ago

Connector Name

source-google-analytics-data-api

Connector Version

2.4.13

What step the error happened?

During the sync

Relevant information

I added multiple property IDs as an array in the property IDs field, but the connector extracts only the first one. For example, with the two IDs 1738294 and 5729978930, it gets the data for 1738294 only. If I swap them, it gets the data for 5729978930 and not the other. I believe there is an issue with the for loop over the property IDs. Also, this is not something new: I checked and tested all the versions of the source, and none of them worked for all the IDs.

Relevant log output

No response

marcosmarxm commented 1 month ago

@airbytehq/dev-python can someone take a look into this?

girarda commented 1 month ago

@darynaishchenko @maxi297 @strosek as incoming OC - can one of you take a look?

strosek commented 1 month ago

Taking a look ...

alenoir commented 1 month ago

Hi, any updates on this issue? We’re still experiencing the same problem with multiple property_id values. Thanks!

aldogonzalez8 commented 1 month ago

@alenoir I will take a look at this. Do you know if there is a connection with this scenario to review?

alenoir commented 1 month ago

"Do you know if there is a connection with this scenario to review?" What do you mean?

I'm running the open source version of Airbyte with airbyte/source-google-analytics-data-api@2.5.0 (same with 2.4.14).

aldogonzalez8 commented 1 month ago

@alenoir Got it, sorry for the confusion. Do you see these other streams with a property suffix when you have more than one property? I am testing with 2.5.0 and this works fine.

[two screenshots attached]

aldogonzalez8 commented 1 month ago

@alenoir Also, if you run the read command from your terminal, can you manually add the streams with your property name as a suffix to the catalog streams?

[screenshot attached]

alenoir commented 1 month ago

@aldogonzalez8 I did a "review changes" and everything appeared fine, thanks! Does this mean that I have one table per property (daily_active_users_propertyxxxxxx1, daily_active_users_propertyxxxxxx2) and that I have to activate all the streams for all the properties? Or are all the streams merged into the "daily_active_users" stream? I use the BigQuery destination. For the Google Ads stream, all accounts are merged into a single table; is the behavior different here?

ali-al-najjar commented 1 month ago

@aldogonzalez8 I just tested the connector from Airbyte's UI, and a screenshot is attached. However, I agree with @alenoir: I expected it to behave the same as the Google Ads, Google Search Console, and Google PageSpeed connectors, where we can add comma-separated URLs or properties and all data is merged into one table using one stream.

[screenshot attached: 2024-08-13, 12:01 PM]

aldogonzalez8 commented 1 month ago

@alenoir @ali-al-najjar Yes, there is an open ticket here to change this behavior in the future. Regarding your question, I would expect separate tables in the destination, but I can try to test this.

aldogonzalez8 commented 4 weeks ago

Yes, for now, we should have separate tables for each stream and property.

aldogonzalez8 commented 1 week ago

Sorry, I just noticed that I shared an internal issue. This is the work that will be done:

As of now, the Google Analytics Data API Source supports multiple input property IDs. To make it work, we produce one stream per property. This means we have ~57 default streams + custom streams, but every time we append a property to the input config, we get 57 more streams. This is better than setting up a connection per property, but for the sake of user experience, we should have a constant number of streams, independent of the number of properties. The most obvious way to do it is to use extra slicing and extend the stream schemas with an additional column that holds the property ID as its value.
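
To make the idea concrete, here is a minimal sketch of the slicing approach described above. It is not the connector's code: the names `property_slices`, `read_stream`, `stream_count`, and `fake_fetch`, as well as the `date` and `active_users` fields, are made up for illustration. The only assumptions taken from this thread are the ~57 default streams and the two example property IDs.

```python
from typing import Callable, Dict, Iterator, List

Record = Dict[str, object]
# Hypothetical fetcher type: takes a property ID and returns that property's rows.
Fetcher = Callable[[str], List[Record]]


def property_slices(property_ids: List[str]) -> Iterator[Dict[str, str]]:
    """Yield one slice per property, instead of defining one stream per property."""
    for property_id in property_ids:
        yield {"property_id": property_id}


def read_stream(property_ids: List[str], fetch: Fetcher) -> Iterator[Record]:
    """Read one stream across all properties, adding a property_id column to each row."""
    for stream_slice in property_slices(property_ids):
        property_id = stream_slice["property_id"]
        for row in fetch(property_id):
            # The extra column lets rows from all properties share a single table.
            yield {**row, "property_id": property_id}


def stream_count(base_streams: int, properties: int, per_property: bool) -> int:
    """Number of streams exposed under the current (per-property) or sliced design."""
    return base_streams * properties if per_property else base_streams


if __name__ == "__main__":
    def fake_fetch(property_id: str) -> List[Record]:
        # Canned data standing in for a real API call (assumption for the demo).
        return [{"date": "2024-08-01", "active_users": 0}]

    for record in read_stream(["1738294", "5729978930"], fake_fetch):
        print(record)

    print(stream_count(57, 2, per_property=True))   # one stream per property
    print(stream_count(57, 2, per_property=False))  # constant number of streams
```

With this shape, adding a property to the config adds slices and rows, not streams, so the destination would end up with one table per stream and a `property_id` column to filter on.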