DataBiosphere / ssds

Simple data storage system for AWS and GCP
MIT License

ServiceUnavailable Error On Large Syncs From S3 --> GCP #200

Closed (juklucas closed this issue 3 years ago)

juklucas commented 3 years ago

While syncing nanopore data from S3, SSDS exited twice over a period of ~3 days. It seems to be exiting on large tar files, but I only have two occurrences to go on.

The error message is below:

INFO:ssds:syncing submissions/e4a08a8d-4b3f-4211-b473-bb9270891037--UCSC_HPRC_nanopore/HG02257/nanopore/HG02257_5.fast5.tar from <SSDS _S3Staging s3://human-pangenomics> to <SSDS _GSStaging gs://fc-4310e737-a388-4a10-8c9e-babe06aaf0cf>
Traceback (most recent call last):
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/cli_builder/__init__.py", line 104, in __call__
    command(args)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/ssds/cli/staging.py", line 82, in sync_command
    for _ in ssds.sync(args.submission_id, src, dst):
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/ssds/__init__.py", line 256, in sync
    writer.put_part(part)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/ssds/blobstore/__init__.py", line 94, in __exit__
    self.close()
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/ssds/blobstore/gs.py", line 161, in close
    self._part_uploader.close()
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/gs_chunked_io/writer.py", line 198, in close
    self._writer.close()
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/gs_chunked_io/writer.py", line 94, in close
    self._compose_dest_blob()
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/gs_chunked_io/writer.py", line 109, in _compose_dest_blob
    part_names = self._sorted_part_names(part_names)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/gs_chunked_io/writer.py", line 136, in _sorted_part_names
    for p in sorted([(name.rsplit(".", 1)[1], name) for name in part_names])]
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/gs_chunked_io/writer.py", line 136, in <listcomp>
    for p in sorted([(name.rsplit(".", 1)[1], name) for name in part_names])]
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/gs_chunked_io/writer.py", line 125, in _compose_parts
    self.bucket.blob(dst_part_name).compose(blobs)
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/google/cloud/storage/blob.py", line 2852, in compose
    api_response = client._connection.api_request(
  File "/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/google/cloud/_http.py", line 423, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.ServiceUnavailable: 503 POST https://storage.googleapis.com/storage/v1/b/fc-4310e737-a388-4a10-8c9e-babe06aaf0cf/o/d0a534f9-6dcd-4a12-959d-1d120b20efb3.89c464c5-1552-4f0c-b085-b7f49abc769c.gs-chunked-io-part.004856/compose: Backend Error

xbrianh commented 3 years ago

This should be addressed in v0.0.3, but please reopen this issue, or create a new one, if this error occurs again.

Please install with pip install --upgrade --no-cache-dir git+https://github.com/DataBiosphere/ssds to make sure you pick up dependency updates.
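
For anyone who hits a similar transient 503 on a long-running sync, one common mitigation is to retry the failing call with exponential backoff. The sketch below is a generic illustration only: it is not taken from ssds or gs-chunked-io and is not necessarily how v0.0.3 addresses the problem. The names `TransientError` and `retry_with_backoff` and all default values are assumptions made up for this example.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable error such as an HTTP 503."""


def retry_with_backoff(fn, retryable=(TransientError,), max_attempts=5,
                       base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(), retrying on the given exceptions with exponential backoff.

    All parameter names and defaults are illustrative assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles on each attempt and is capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps concurrent workers from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
```

With google-cloud-storage, the failing call in the traceback above could in principle be wrapped as `retry_with_backoff(lambda: blob.compose(blobs), retryable=(google.api_core.exceptions.ServiceUnavailable,))`, assuming `blob` and `blobs` are the destination blob and the list of source parts. Whether retrying is appropriate depends on the operation, so treat this as a starting point rather than a drop-in fix.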