airbytehq / airbyte


Source SurveyMonkey: Stream survey_responses delivers incomplete results #25108

Closed leo-schick closed 1 year ago

leo-schick commented 1 year ago

Environment

Current Behavior

Stream survey_responses does not download all responses when using sync mode Incremental | Append. The total number of responses received is far lower than the number of responses the SurveyMonkey backend shows for the various surveys.

Note: I did not try sync mode Full refresh | Overwrite because we have too many responses.

Expected Behavior

All responses should be downloaded, none should be skipped.

Logs / Insights

I am not providing logs because there is no error message. But I can give further insight into what might be the problem:

I looked into the source code, and it looks to me like class SurveyResponses does not pass the params sort_order and sort_by to the API method /(collectors|surveys)/{id}/responses/bulk. See here. This is done, for example, in class SurveyIds here.

Since these params are not passed, I guess the API returns only a subset of the data. I have not tested it, but I expect that adding these parameters would fix the issue.

I also suggest adding the parameter per_page with a value of 100 (the maximum according to the docs) to reduce the number of API calls. (I have run into the problem several times that a complete sync executes too many API calls, the API gets blocked for 24 hours, and Airbyte stops syncing.)
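To make the suggestion concrete, here is a minimal sketch of how the request params for the bulk endpoint could be built. It is not the connector's actual code: the function name `build_bulk_response_params`, the cursor field `date_modified`, the sort order value `ASC`, and the `start_modified_at` filter are assumptions on my side and should be checked against the SurveyMonkey docs and the existing SurveyIds class.

```python
from typing import Any, Mapping, MutableMapping, Optional

# Maximum page size according to the docs referenced in this issue.
MAX_PER_PAGE = 100


def build_bulk_response_params(
    stream_state: Optional[Mapping[str, Any]] = None,
    next_page_token: Optional[Mapping[str, Any]] = None,
    cursor_field: str = "date_modified",  # assumed cursor field
) -> MutableMapping[str, Any]:
    """Build query params for /(collectors|surveys)/{id}/responses/bulk.

    Hypothetical helper: it only illustrates passing sort_by, sort_order
    and per_page, which SurveyResponses currently does not pass.
    """
    params: MutableMapping[str, Any] = {
        "per_page": MAX_PER_PAGE,
        "sort_by": cursor_field,
        "sort_order": "ASC",  # assumed value: oldest first, so the cursor only moves forward
    }
    # Assumed incremental filter: only request responses modified since the last sync.
    if stream_state and stream_state.get(cursor_field):
        params["start_modified_at"] = stream_state[cursor_field]
    # Pagination info (e.g. {"page": 2}) overrides or extends the defaults.
    if next_page_token:
        params.update(next_page_token)
    return params
```

With a stable ascending sort on the cursor field, the pages should no longer shift between requests, which is my guess for why responses are currently skipped.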

Steps to Reproduce

  1. Set up the connector SurveyMonkey with stream survey_responses and sync. mode Incremental | Append
  2. Do a complete sync. of all data
  3. Match the downloaded responses against the number of responses in the SurveyMonkey Backend / Website for your survey.
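The comparison in step 3 can be sketched as below. This is only an illustration: the function name `find_missing_responses` and both input dictionaries are hypothetical, and the counts have to be collected by hand from the SurveyMonkey website and from the synced destination table.

```python
from typing import Dict


def find_missing_responses(
    expected_counts: Dict[str, int],  # survey id -> response count shown by SurveyMonkey
    synced_counts: Dict[str, int],    # survey id -> response count found in the destination
) -> Dict[str, int]:
    """Return, per survey id, how many responses are missing after the sync."""
    missing = {}
    for survey_id, expected in expected_counts.items():
        synced = synced_counts.get(survey_id, 0)
        if synced < expected:
            missing[survey_id] = expected - synced
    return missing
```

Any survey that shows up in the result has fewer synced responses than the backend reports.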

Are you willing to submit a PR?

Yes

leo-schick commented 1 year ago

Ping @marcosmarxm

leo-schick commented 1 year ago

Fixed by #25109