bespokelabsai / curator

Apache License 2.0

Write iteratively to batch_objects.jsonl for recovery and catch openai.APITimeoutError #120

Open RyanMarten opened 2 hours ago

RyanMarten commented 2 hours ago

I'm seeing a bunch of the timeouts ^ as we try to upload all the batches:

```
(_Completions pid=125780, ip=10.120.0.7) INFO:openai._base_client:Retrying request to /files in 0.839617 seconds
(_Completions pid=125780, ip=10.120.0.7) INFO:openai._base_client:Retrying request to /files in 0.884488 seconds
(_Completions pid=125780, ip=10.120.0.7) INFO:openai._base_client:Retrying request to /files in 0.821620 seconds
```

This is what killed the job:

```
openai.APITimeoutError: Request timed out.
    raise APITimeoutError(request=request) from err
  File "/tmp/ray/session_2024-11-14_19-11-03_110436_1/runtime_resources/pip/8f9a6c08a6f7b36cef5b248cf848c00d3b8e4aef/virtualenv/lib/python3.10/site-packages/openai/_base_client.py", line 1591, in _request
    return await self._retry_request(
  File "/tmp/ray/session_2024-11-14_19-11-03_110436_1/runtime_resources/pip/8f9a6c08a6f7b36cef5b248cf848c00d3b8e4aef/virtualenv/lib/python3.10/site-packages/openai/_base_client.py", line 1581, in _request
    return await self._retry_request(
  File "/tmp/ray/session_2024-11-14_19-11-03_110436_1/runtime_resources/pip/8f9a6c08a6f7b36cef5b248cf848c00d3b8e4aef/virtualenv/lib/python3.10/site-packages/openai/_base_client.py", line 1581, in _request
    return await self._request(
  File "/tmp/ray/session_2024-11-14_19-11-03_110436_1/runtime_resources/pip/8f9a6c08a6f7b36cef5b248cf848c00d3b8e4aef/virtualenv/lib/python3.10/site-packages/openai/_base_client.py", line 1533, in request
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/tmp/ray/session_2024-11-14_19-11-03_110436_1/runtime_resources/pip/8f9a6c08a6f7b36cef5b248cf848c00d3b8e4aef/virtualenv/lib/python3.10/site-packages/openai/_base_client.py", line 1839, in post
    return await self._post(
  File "/tmp/ray/session_2024-11-14_19-11-03_110436_1/runtime_resources/pip/8f9a6c08a6f7b36cef5b248cf848c00d3b8e4aef/virtualenv/lib/python3.10/site-packages/openai/resources/files.py", line 443, in create
    batch_file_upload = await async_client.files.create(
```

There are 100+ batches successfully submitted and visible in the dashboard… I wonder if we can recover and just use those.

When this happens, there is no batch_objects.jsonl in the cache. It should be written iteratively during batch creation, so that if some of the batch submissions fail we still have a partial record and can use it.
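A minimal sketch of the idea, appending one JSON line per batch immediately after each successful submission so a mid-run `openai.APITimeoutError` still leaves the earlier batch objects on disk. `submit_batches`, `submit_fn`, and the record shape are hypothetical names for illustration, not curator's actual API:

```python
import json
from pathlib import Path


def submit_batches(request_files, submit_fn, batch_objects_path="batch_objects.jsonl"):
    """Submit each batch and append its batch object to batch_objects.jsonl
    right away, instead of writing the whole file once at the end.

    submit_fn(request_file) stands in for the real upload + batch creation
    (e.g. files.create followed by batches.create) and returns a dict.
    """
    path = Path(batch_objects_path)
    submitted, failed = [], []
    for request_file in request_files:
        try:
            batch_object = submit_fn(request_file)
        except Exception as err:  # in practice: openai.APITimeoutError
            failed.append((request_file, err))
            continue
        # One JSON line per successful submission; an append keeps prior lines
        # intact even if a later submission kills the job.
        with path.open("a") as f:
            f.write(json.dumps({"request_file": request_file, **batch_object}) + "\n")
        submitted.append(request_file)
    return submitted, failed
```

With this, a crash after 100 successful submissions leaves 100 recoverable lines in the cache rather than nothing.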

RyanMarten commented 2 hours ago

Also check the logic here....

If we are iteratively writing batch_objects.jsonl, then just checking that the file exists doesn't mean we can skip batch submission. We instead need to resume by looking at the recorded metadata to determine which batches have been submitted and which haven't.
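A sketch of that resume check, assuming each line of batch_objects.jsonl records the request file it came from (the `request_file` key and the helper name are illustrative assumptions):

```python
import json
from pathlib import Path


def pending_request_files(request_files, batch_objects_path="batch_objects.jsonl"):
    """Return the request files that still need submission.

    The mere existence of batch_objects.jsonl is not enough to skip
    submission: the file may be partial, so compare its recorded entries
    against the full set of request files and resubmit only the gap.
    """
    path = Path(batch_objects_path)
    already_submitted = set()
    if path.exists():
        with path.open() as f:
            for line in f:
                if line.strip():
                    already_submitted.add(json.loads(line)["request_file"])
    return [rf for rf in request_files if rf not in already_submitted]
```

On restart, the submitter would call this first and only submit the returned files, rather than skipping submission entirely whenever the file exists.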