GoogleCloudPlatform / cluster-toolkit

Cluster Toolkit is open-source software offered by Google Cloud that makes it easy for customers to deploy AI/ML and HPC environments on Google Cloud.
Apache License 2.0

Logging to BigQuery can fail if the number of rows to insert is too large #3008

Open fdmalone opened 1 week ago

fdmalone commented 1 week ago

Pulled from #2989. I ran into an issue where load_bq.py would fail because the number of rows was too large for insert_rows. My workaround was to batch over slices of the jobs list with a batch size of 10000 (which I think is the upper limit for insert_rows).
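
For illustration, the workaround described above could look roughly like the sketch below. This is not the actual patch: the helper names `chunked` and `insert_rows_in_batches` are hypothetical, and the default batch size of 10000 is taken from the comment above rather than from a verified quota. The only client call assumed is `insert_rows(table, rows)`, which returns a list of per-row errors (empty on success).

```python
from typing import Any, Iterator, List, Sequence


def chunked(rows: Sequence[Any], batch_size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of rows, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def insert_rows_in_batches(client, table, rows, batch_size: int = 10000) -> List[Any]:
    """Call client.insert_rows once per slice and collect the reported errors.

    `client` is assumed to behave like google.cloud.bigquery.Client;
    the 10000 default is an assumption, not a confirmed limit.
    """
    errors: List[Any] = []
    for batch in chunked(rows, batch_size):
        # insert_rows returns a list of error mappings, empty if all rows succeeded.
        errors.extend(client.insert_rows(table, batch))
    return errors
```

Collecting the errors from every batch, instead of stopping at the first failure, keeps the caller's existing error handling usable with a single list.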

I can send a PR with the patch, but I was a bit hesitant given the lack of unit tests for the script; adding those would be a bit of work (faking the client, etc.).
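
As a rough idea of how much work faking the client might be, here is a minimal sketch using `unittest.mock` from the standard library. The function `insert_in_batches` is a hypothetical stand-in for whatever the patched script would expose, and the table name and row contents are made up for the test.

```python
from unittest import mock


def insert_in_batches(client, table, rows, batch_size):
    # Hypothetical stand-in for the batched insert under test.
    for start in range(0, len(rows), batch_size):
        client.insert_rows(table, rows[start:start + batch_size])


def test_insert_is_split_into_batches():
    # A Mock records every call, so no real BigQuery client is needed.
    fake_client = mock.Mock()
    fake_client.insert_rows.return_value = []  # no per-row errors
    rows = [{"job_id": i} for i in range(7)]

    insert_in_batches(fake_client, "fake_table", rows, batch_size=3)

    # The second positional argument of each call is the batch of rows.
    sizes = [len(call.args[1]) for call in fake_client.insert_rows.call_args_list]
    assert sizes == [3, 3, 1]
```

If google-cloud-bigquery is installed in the test environment, `mock.create_autospec(bigquery.Client, instance=True)` could replace `mock.Mock()` so that calls are also checked against the real method signatures.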

mr0re1 commented 6 days ago

@fdmalone, your contribution would be very welcome, with or without tests.