Open karcot1 opened 1 week ago
Hi,
Review of the customer usage showed that the customer had up to 8 separate generate-partitions executions running in parallel, which is plausible given how long writing to GCS takes (about 4 hours for 10,000 objects). Each DVT execution is single-threaded, so it is unclear how this raised 503 errors. DVT creates a client for each file, as outlined below, which could exacerbate the situation.
When writing a large number of GCS files for partitions, it takes about 1.25 seconds to write each file to GCS. This is awfully slow. The problem is in the get_gcs_bucket function, which creates an HTTP client and a connection to the bucket every time it is called. With 10,000 files, this adds up to a very long run. However, if the return value of get_gcs_bucket is cached, writes of small files to GCS improve dramatically (> 20x).
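The simplest form of the caching described above can be sketched with functools.lru_cache. This is a minimal, hedged illustration: FakeBucket is a stand-in for the real google.cloud.storage client/bucket (so the effect of caching is observable without GCS credentials), and the function name get_gcs_bucket is taken from the issue; the real fix would cache storage.Client().get_bucket(bucket_name) the same way.

```python
from functools import lru_cache

# Stand-in for google.cloud.storage: counts how many times a
# client/bucket connection is actually constructed.
construction_count = 0

class FakeBucket:
    def __init__(self, name):
        global construction_count
        construction_count += 1
        self.name = name

@lru_cache(maxsize=None)
def get_gcs_bucket(bucket_name: str) -> FakeBucket:
    # In DVT this would be roughly: storage.Client().get_bucket(bucket_name).
    # lru_cache ensures the bucket handle is built once per bucket name.
    return FakeBucket(bucket_name)

# Simulate writing 10,000 partition files to the same bucket:
for _ in range(10_000):
    bucket = get_gcs_bucket("my-bucket")

print(construction_count)  # 1
```

Because all 10,000 writes target the same bucket, the client and connection are built once instead of 10,000 times, which is where the >20x improvement comes from.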
It may make sense to restructure the gcs_helper package as a class, perhaps renamed to filsys_helper, which would allow DVT to interact with Cloud Storage or the file system as specified by the user. If the customer specifies a GCS destination (or source), we can save the storage_client.get_bucket('bucket_name') result in the class so that it can be reused. This will significantly improve performance.
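A sketch of what that class could look like, under stated assumptions: the class name FileSystemHelper, the write_file method, and the gs:// path convention are all hypothetical (the issue only proposes the restructuring); the point shown is caching the get_bucket result per bucket name and routing non-GCS paths to the local file system.

```python
import os

class FileSystemHelper:
    """Hypothetical sketch of the proposed helper class: writes go to GCS
    or the local file system depending on the path, and the GCS bucket
    handle is cached so the connection is reused across many small files."""

    def __init__(self, storage_client=None):
        # e.g. google.cloud.storage.Client(), injected so it can be stubbed.
        self._storage_client = storage_client
        self._buckets = {}  # bucket_name -> cached bucket handle

    def _get_bucket(self, bucket_name):
        # Reuse the bucket (and its underlying HTTP connection) across writes.
        if bucket_name not in self._buckets:
            self._buckets[bucket_name] = self._storage_client.get_bucket(bucket_name)
        return self._buckets[bucket_name]

    def write_file(self, path, data):
        if path.startswith("gs://"):
            bucket_name, _, blob_name = path[len("gs://"):].partition("/")
            self._get_bucket(bucket_name).blob(blob_name).upload_from_string(data)
        else:
            os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
            with open(path, "w") as f:
                f.write(data)
```

Keying the cache by bucket name keeps the class correct even if a run touches more than one bucket, while still making the common single-bucket case a single get_bucket call.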
Sundar Mudupalli
When uploading a large number of YAML files to GCS (10,000), we are encountering a 503 error. This indicates the connection to GCS was interrupted, and is commonly encountered when uploading a large number of files via Python.
The exact error message: 'Request failed with status code', 503, 'Expected one of', <HTTPStatus.OK: 200>
The recommendation for 5xx errors is to retry; there is sample code for reference: https://github.com/googleapis/python-storage/blob/main/samples/snippets/storage_configure_retries.py#L55
The request is to please modify _write_gcs_file in gcs_helper.py to use retries.
Suggestion (based on above sample code):
Ideally, the values (deadline, initial, multiplier, maximum) would be parameterized, so that end users can tune them for optimal performance.
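The retry policy the linked sample configures (via google.cloud.storage.retry.DEFAULT_RETRY.with_deadline(...).with_delay(initial=..., multiplier=..., maximum=...), passed as retry= to the upload call) can be sketched in plain Python with the same four parameters exposed, as requested above. This is an illustrative sketch, not the actual _write_gcs_file change; retry_with_backoff and the default values are assumptions for demonstration.

```python
import time

def retry_with_backoff(func, *, initial=1.0, multiplier=2.0, maximum=60.0,
                       deadline=600.0, retryable=(ConnectionError,),
                       sleep=time.sleep):
    """Retry func() on retryable errors with exponential backoff.

    Mirrors the (initial, multiplier, maximum, deadline) knobs of the
    google.api_core Retry object so end users could tune them.
    """
    delay = initial
    start = time.monotonic()
    while True:
        try:
            return func()
        except retryable:
            # Give up once the overall deadline is exhausted.
            if time.monotonic() - start >= deadline:
                raise
            sleep(delay)
            # Exponential backoff, capped at `maximum` seconds.
            delay = min(delay * multiplier, maximum)
```

In the real fix, _write_gcs_file would instead pass a configured Retry object to blob.upload_from_string(..., retry=modified_retry), as the sample shows; the sketch just makes the tunable parameters explicit.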