Open tautvydas-v opened 7 months ago
This is already available in connector version 0.3.15 as an optional field:
Is it, though? Am I missing something? We are on version 0.3.16 of the Google Sheets connector, and the only optional field is the one for converting column names to be SQL-compliant:
I don't see any changes related to row batch size in either the 0.3.15 or 0.3.16 source, and the source code also has 200 as a hardcoded value. Please correct me if I'm wrong, though!
My bad, you are right! This was available in version 0.2.17; not sure why it has been removed, though.
Good catch, thanks! I didn't even think about looking into the older versions. I'll look into why it was removed tomorrow; maybe there was some sort of justification.
The parameter was removed; in its place, the connector uses a default value and increases it when there is an exception.
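To make that behavior concrete, here is a minimal sketch of "start from a default and grow the batch when a request fails". It is not the connector's actual code: `RateLimitError`, `fetch_rows`, the increment of 100, and the failure cap are all hypothetical names and values chosen for illustration, and a real implementation would also sleep with backoff between retries.

```python
DEFAULT_ROW_BATCH_SIZE = 200  # default value mentioned in this thread
BATCH_SIZE_INCREMENT = 100    # hypothetical step size, for illustration only


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever exception the API client raises."""


def read_all_rows(fetch_rows, total_rows,
                  batch_size=DEFAULT_ROW_BATCH_SIZE, max_failures=5):
    """Read rows 1..total_rows in batches, growing the batch after a failure.

    fetch_rows(start, end) is a hypothetical callable that returns the rows
    from start to end (inclusive) or raises RateLimitError.
    """
    rows = []
    start = 1
    failures = 0
    while start <= total_rows:
        end = min(start + batch_size - 1, total_rows)
        try:
            rows.extend(fetch_rows(start, end))
        except RateLimitError:
            failures += 1
            if failures > max_failures:
                raise  # give up instead of looping forever
            # Ask for more rows per request so fewer requests are needed.
            batch_size += BATCH_SIZE_INCREMENT
            continue  # retry the same starting row with the larger batch
        start = end + 1
    return rows, batch_size
```

With a fixed starting value, a large sheet has to hit at least one failure before the batch grows, which is the motivation for letting users pick the starting value themselves.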
Yes, and what I want to achieve is to make it an input parameter again. At least in our case, we have quite a lot of Google Sheets, and some of them are very large. As I described, we hardcoded row_batch_size to a much higher value, which resulted in fewer retries and faster syncs, so I'm not sure why this value should be hardcoded.
Connector Name
source-google-sheets
Connector Version
0.3.16
What step the error happened?
During the sync
Relevant information
Currently, source-google-sheets has a default value of 200 for the "row_batch_size" variable. We've noticed that we can easily increase this value in order to process more data per request. The Google Sheets API has a limit of 300 requests per minute per project and 60 requests per minute per user per project, and the only limitation on an individual request is that it has to be processed in under 180 seconds. With a small batch size, if a Google Sheet has a lot of rows, there is a possibility that at some point the exponential backoff fails and the whole sync silently fails too.
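A rough back-of-the-envelope calculation shows why the batch size matters for the quota. The sketch below assumes one request per batch of rows and ignores any metadata requests; the 1,000,000-row sheet is a made-up example, and the helper names are hypothetical.

```python
def requests_needed(total_rows, row_batch_size):
    """Number of batch requests needed to read total_rows (ceiling division)."""
    return -(-total_rows // row_batch_size)


def min_minutes_at_quota(total_rows, row_batch_size, requests_per_minute=60):
    """Lower bound on sync time if every request counted against a per-minute quota."""
    return requests_needed(total_rows, row_batch_size) / requests_per_minute


if __name__ == "__main__":
    total_rows = 1_000_000  # hypothetical sheet size
    for batch in (200, 10_000, 150_000):
        print(batch, requests_needed(total_rows, batch))
```

Under these assumptions, the default of 200 needs 5000 requests for that sheet, while 10000 needs 100 and 150000 needs 7, so a larger batch leaves far more headroom under the per-user quota.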
We've tested this value at 100, 10000 and 150000, and the connector appears to work the same way while processing a lot more data per request. Since some users may want this value lower or higher, the suggestion is to expose it as a parameter that can be set when configuring the connector. I'm happy to contribute this new feature to source-google-sheets.
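As a sketch of what the proposal could look like, the snippet below reads an optional batch size from the connector configuration and falls back to the current default. The config key name `row_batch_size` and the helper `get_row_batch_size` are assumptions for illustration, not the connector's existing spec or API.

```python
DEFAULT_ROW_BATCH_SIZE = 200  # current hardcoded value mentioned above


def get_row_batch_size(config):
    """Return the user-supplied batch size, or the default if none is set.

    `config` is assumed to be a plain dict of connector settings; the
    "row_batch_size" key is hypothetical.
    """
    value = config.get("row_batch_size", DEFAULT_ROW_BATCH_SIZE)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError(f"row_batch_size must be a positive integer, got {value!r}")
    return value
```

Keeping the field optional with the same default would leave existing setups unchanged, while larger sheets could opt into a higher value.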
Relevant log output
Contribute