Environment
OS Version / Instance: Debian GNU/Linux 11 (bullseye)
Deployment: Docker
Source Connector and version: SurveyMonkey 0.1.15 (current version)
Destination Connector and version: Azure Blob Storage 0.1.6
Step where error happened: Sync job
Current Behavior
Stream survey_responses does not download all responses when using sync mode Incremental | Append. The total number of responses received is far lower than the number of responses the SurveyMonkey backend shows for the various surveys.
Note: I did not try sync mode Full refresh | Overwrite because we have too many responses.
Expected Behavior
All responses should be downloaded, none should be skipped.
Logs / Insights
I do not provide logs here since there is no error message available. But I can give further insight into what might be causing the problem:
I looked into the source code, and it looks to me that class SurveyResponses does not pass the params sort_order and sort_by to the API method /(collectors|surveys)/{id}/responses/bulk. See here. This is done, however, in e.g. class SurveyIds here.
Since this is not done, I suspect that only a subset of the data is returned from the API. I have not tested it, but I expect that adding these parameters would fix the issue.
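A minimal sketch of the request parameters proposed above, written as a standalone Python function rather than the connector's actual request code. The helper name build_bulk_response_params, the cursor field date_modified, and the use of start_modified_at as a server-side filter are assumptions for illustration; the parameter names and accepted values should be checked against the SurveyMonkey API docs.

```python
from typing import Any, Mapping, MutableMapping, Optional

# Assumption: 100 is the maximum page size for the bulk endpoint, per the docs cited in this issue.
MAX_PER_PAGE = 100


def build_bulk_response_params(
    stream_state: Mapping[str, Any],
    cursor_field: str = "date_modified",
    next_page_token: Optional[Mapping[str, Any]] = None,
) -> MutableMapping[str, Any]:
    """Hypothetical helper: query params for /(collectors|surveys)/{id}/responses/bulk."""
    params: MutableMapping[str, Any] = {
        "sort_order": "ASC",       # oldest first, so the cursor only ever moves forward
        "sort_by": cursor_field,   # sort on the same field that is used as the cursor
        "per_page": MAX_PER_PAGE,  # fewer pages means fewer API calls per sync
    }
    since = stream_state.get(cursor_field)
    if since:
        # Assumption: start_modified_at skips responses older than the saved cursor.
        params["start_modified_at"] = since
    if next_page_token:
        # Assumed token shape, e.g. {"page": 2}
        params.update(next_page_token)
    return params
```

Sorting ascending by the cursor field matters for incremental syncs in general: if state is checkpointed as the largest cursor value seen so far, responses that arrive out of order after a checkpoint can end up behind the saved cursor and be skipped by the next sync.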
I also suggest adding the parameter per_page with a value of 100 (the maximum according to the docs) to reduce the number of API calls. I have run into this several times: a complete sync executes too many API calls, the API is blocked for 24 hours, and Airbyte stops syncing.
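To gauge how much per_page matters for the rate limit, here is a rough count of the response requests a full sync would need. It assumes each survey is paginated separately and that even a survey with no responses costs one request; requests for listing surveys or fetching survey details are ignored. The function name requests_needed is hypothetical.

```python
import math
from typing import Iterable


def requests_needed(responses_per_survey: Iterable[int], per_page: int) -> int:
    """Rough number of paginated response requests for one full sync."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    # Each survey is paginated on its own; an empty survey still costs one request.
    return sum(max(1, math.ceil(n / per_page)) for n in responses_per_survey)
```

For example, with surveys holding 1000, 250 and 0 responses, requests_needed([1000, 250, 0], 100) gives 14 requests, versus 26 with a page size of 50.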
Steps to Reproduce
1. Set up the SurveyMonkey connector with the stream survey_responses and sync mode Incremental | Append.
2. Do a complete sync of all data.
3. Compare the downloaded responses against the number of responses shown in the SurveyMonkey backend / website for your survey.
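Step 3 can be scripted once the synced records are loaded, e.g. by parsing the JSON lines written to the destination. This sketch assumes each record is a plain dict with id and survey_id fields and that the expected per-survey totals were copied from the SurveyMonkey website by hand; the function name find_gaps is hypothetical. It counts distinct response ids, because Incremental | Append can emit the same response more than once when it is modified.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping, Set, Tuple


def find_gaps(
    records: Iterable[Mapping[str, Any]], expected: Mapping[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Return {survey_id: (downloaded, expected)} for surveys with missing responses."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        # Assumption: each record carries its survey id and its own response id.
        seen[str(record["survey_id"])].add(str(record["id"]))
    return {
        survey_id: (len(seen.get(survey_id, set())), total)
        for survey_id, total in expected.items()
        if len(seen.get(survey_id, set())) < total
    }
```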
Are you willing to submit a PR?
Yes