dchaley / deepcell-imaging

Tools & guidance to scale DeepCell imaging on Google Cloud Batch

Recreate StringIO on each BQ upload try #303

Closed dchaley closed 1 month ago

dchaley commented 1 month ago

Bug report from Brenna:

INFO 2024-08-07T15:59:35.636198214Z Sending data to BigQuery
ERROR 2024-08-07T15:59:37.390670393Z Traceback (most recent call last):
ERROR 2024-08-07T15:59:37.390718867Z File "/deepcell-imaging/scripts/gather-benchmark.py", line 125, in <module>
ERROR 2024-08-07T15:59:37.390735551Z       main()
ERROR 2024-08-07T15:59:37.390743303Z File "/deepcell-imaging/scripts/gather-benchmark.py", line 117, in main
ERROR 2024-08-07T15:59:37.390815279Z      upload_to_bigquery(json_str, bigquery_benchmarking_table, job_config)
ERROR 2024-08-07T15:59:37.390823992Z File "/root/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 336, in wrapped_f
ERROR 2024-08-07T15:59:37.390835764Z       return copy(f, *args, **kw)
ERROR 2024-08-07T15:59:37.390846304Z File "/root/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 475, in __call__
ERROR 2024-08-07T15:59:37.390853364Z       do = self.iter(retry_state=retry_state)
ERROR 2024-08-07T15:59:37.390863200Z File "/root/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 376, in iter
ERROR 2024-08-07T15:59:37.390872460Z       result = action(retry_state)
ERROR 2024-08-07T15:59:37.390940236Z File "/root/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 398, in <lambda>
ERROR 2024-08-07T15:59:37.390947056Z       self._add_action_func(lambda rs: rs.outcome.result())
ERROR 2024-08-07T15:59:37.390956029Z File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
ERROR 2024-08-07T15:59:37.390965061Z       return self.__get_result()
ERROR 2024-08-07T15:59:37.390973284Z File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
ERROR 2024-08-07T15:59:37.390980804Z       raise self._exception
ERROR 2024-08-07T15:59:37.390988277Z File "/root/.local/lib/python3.10/site-packages/tenacity/__init__.py", line 478, in __call__
ERROR 2024-08-07T15:59:37.390995116Z       result = fn(*args, **kwargs)
ERROR 2024-08-07T15:59:37.391001070Z File "/deepcell-imaging/scripts/gather-benchmark.py", line 112, in upload_to_bigquery
ERROR 2024-08-07T15:59:37.391007190Z      load_job = bq_client.load_table_from_file(
ERROR 2024-08-07T15:59:37.391013227Z File "/root/.local/lib/python3.10/site-packages/google/cloud/bigquery/client.py", line 2589, in load_table_from_file
ERROR 2024-08-07T15:59:37.391018756Z       response = self._do_resumable_upload(
ERROR 2024-08-07T15:59:37.391024632Z File "/root/.local/lib/python3.10/site-packages/google/cloud/bigquery/client.py", line 3009, in _do_resumable_upload
ERROR 2024-08-07T15:59:37.391030516Z       upload, transport = self._initiate_resumable_upload(
ERROR 2024-08-07T15:59:37.391037397Z File "/root/.local/lib/python3.10/site-packages/google/cloud/bigquery/client.py", line 3078, in _initiate_resumable_upload
ERROR 2024-08-07T15:59:37.391044138Z       upload.initiate(
ERROR 2024-08-07T15:59:37.391050246Z File "/root/.local/lib/python3.10/site-packages/google/resumable_media/requests/upload.py", line 402, in initiate
ERROR 2024-08-07T15:59:37.391057650Z       method, url, payload, headers = self._prepare_initiate_request(
ERROR 2024-08-07T15:59:37.391073108Z File "/root/.local/lib/python3.10/site-packages/google/resumable_media/_upload.py", line 471, in _prepare_initiate_request
ERROR 2024-08-07T15:59:37.391080039Z       raise ValueError("Stream must be at beginning.")
ERROR 2024-08-07T15:59:37.391086161Z ValueError: Stream must be at beginning.

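Currently we create the StringIO object once and pass it to the upload function, so every retry reuses the same stream. It seems the stream isn't being re-seeked to the beginning after a failed attempt, so the next attempt fails with "Stream must be at beginning." We should create a new IO object from the input string on each attempt. That should solve the problem … if that is the problem 😅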
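
To illustrate the suspected failure mode, here is a minimal stdlib-only sketch. Everything in it is hypothetical: `FlakyUploader`, `retry`, `upload_shared_stream` and `upload_fresh_stream` are stand-ins invented for this example, not the code in `gather-benchmark.py`, the tenacity decorator, or the BigQuery client. It assumes the first attempt consumes the stream and then fails transiently, which is what the traceback suggests but does not prove.

```python
import io


class FlakyUploader:
    """Hypothetical stand-in for the upload client; not the real BigQuery API."""

    def __init__(self, failures_before_success):
        self.failures_left = failures_before_success

    def load_from_file(self, stream):
        # Mirrors the check seen in the traceback: refuse a stream that
        # has already been read from.
        if stream.tell() != 0:
            raise ValueError("Stream must be at beginning.")
        data = stream.read()  # reading advances the stream position
        if self.failures_left > 0:
            self.failures_left -= 1
            raise ConnectionError("simulated transient failure")
        return data


def retry(fn, attempts=3):
    """Minimal retry loop standing in for a retry decorator."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except ConnectionError as err:
            last_error = err
    raise last_error


def upload_shared_stream(uploader, json_str):
    # Suspected current behaviour: one StringIO reused by every attempt.
    stream = io.StringIO(json_str)
    return retry(lambda: uploader.load_from_file(stream))


def upload_fresh_stream(uploader, json_str):
    # Proposed fix: a new StringIO per attempt always starts at position 0.
    return retry(lambda: uploader.load_from_file(io.StringIO(json_str)))
```

With one simulated transient failure, `upload_shared_stream` raises the `ValueError` on its second attempt, while `upload_fresh_stream` succeeds. Calling `stream.seek(0)` at the start of each attempt would be an alternative that keeps a single object, but building the stream inside the retried function keeps each attempt independent.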