apache / libcloud

Apache Libcloud is a Python library which hides differences between different cloud provider APIs and allows you to manage different cloud resources through a unified and easy to use API.
https://libcloud.apache.org
Apache License 2.0

Documentation: Objectstore example is broken / libcloud does seek() on stream #1424

Closed perbu closed 4 years ago

perbu commented 4 years ago

Summary

I'm running the example on https://libcloud.readthedocs.io/en/stable/storage/examples.html that creates a tarfile and uploads it via stream to an objectstore.

I'm using GCS. libcloud tries to call seek() on the supplied iterator; this fails and the program stops.

Detailed Information

apache-libcloud==2.8.0, Python 3.7.6, macOS Catalina

The example in the docs is written for Python 2; I've rewritten it for Python 3 and pointed it at GCS.

#!/usr/bin/env python
import os
import subprocess
from datetime import datetime

from libcloud.storage.types import Provider, ContainerDoesNotExistError
from libcloud.storage.providers import get_driver

from dotenv import load_dotenv
load_dotenv()

cls = get_driver(Provider.GOOGLE_STORAGE)
driver = cls(os.getenv('GOOGLE_ACCOUNT'),
             os.getenv('AUTH_TOKEN'),
             project='foo')

directory = os.getenv('FOLDER')
cmd = 'tar cvzpf - %s' % (directory)

object_name = 'backup-%s.tar.gz' % (datetime.now().strftime('%Y-%m-%d'))
container_name = os.getenv('WORKSPACE')

# Create a container if it doesn't already exist
try:
    container = driver.get_container(container_name=container_name)
except ContainerDoesNotExistError:
    container = driver.create_container(container_name=container_name)

pipe = subprocess.Popen(cmd, bufsize=0, shell=True, stdout=subprocess.PIPE)
return_code = pipe.poll()

print('Uploading object...')

while return_code is None:
    # Compress data in our directory and stream it directly to the object store
    obj = container.upload_object_via_stream(iterator=pipe.stdout,
                                             object_name=object_name)
    return_code = pipe.poll()

print('Upload complete, transferred: %s KB' % ((obj.size / 1024)))

This returns the following exception:

Traceback (most recent call last):
  File "./bug.py", line 38, in <module>
    object_name=object_name)
  File "/Users/perbu/.virtualenvs/ar1/lib/python3.7/site-packages/libcloud/storage/base.py", line 159, in upload_object_via_stream
    iterator, self, object_name, extra=extra, **kwargs)
  File "/Users/perbu/.virtualenvs/ar1/lib/python3.7/site-packages/libcloud/storage/drivers/s3.py", line 698, in upload_object_via_stream
    storage_class=ex_storage_class)
  File "/Users/perbu/.virtualenvs/ar1/lib/python3.7/site-packages/libcloud/storage/drivers/s3.py", line 842, in _put_object
    headers=headers, file_path=file_path, stream=stream)
  File "/Users/perbu/.virtualenvs/ar1/lib/python3.7/site-packages/libcloud/storage/base.py", line 627, in _upload_object
    self._get_hash_function())
  File "/Users/perbu/.virtualenvs/ar1/lib/python3.7/site-packages/libcloud/storage/base.py", line 657, in _hash_buffered_stream
    stream.seek(0)
OSError: [Errno 29] Illegal seek

The offending code in libcloud/storage/base.py looks like this:

        if hasattr(stream, '__next__') or hasattr(stream, 'next'):
            # Ensure we start from the begining of a stream in case stream is
            # not at the beginning
            if hasattr(stream, 'seek'):
                stream.seek(0)

I'm not entirely sure why seek() gets called on the iterator. I've been able to work around the issue by creating a SimpleIterator class that only supplies __next__ and wrapping the output from Popen.stdout in it.
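For anyone hitting the same problem, the workaround described above could look roughly like this. It is a minimal sketch, not the exact code I used: only the name SimpleIterator comes from the description, while the chunk_size parameter and the class body are assumptions made for illustration.

```python
import io


class SimpleIterator:
    """Wrap a binary stream and expose only the iterator protocol.

    Hypothetical helper: because it has no seek() attribute, a
    hasattr(stream, 'seek') check on it is false.
    """

    def __init__(self, stream, chunk_size=8192):
        self._stream = stream
        self._chunk_size = chunk_size  # assumed default, tune as needed

    def __iter__(self):
        return self

    def __next__(self):
        # read() returns b'' at end of stream, which ends the iteration
        chunk = self._stream.read(self._chunk_size)
        if not chunk:
            raise StopIteration
        return chunk


if __name__ == "__main__":
    # Stand-in for pipe.stdout so the sketch runs without a subprocess
    wrapped = SimpleIterator(io.BytesIO(b"example payload"), chunk_size=4)
    print(hasattr(wrapped, "seek"))
    print(b"".join(wrapped))
```

In the script above, the upload call would then take iterator=SimpleIterator(pipe.stdout) instead of iterator=pipe.stdout.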

If I just comment out the seek(0) everything seems to work.

Thanks for an excellent project. Let me know if you need anything more from me.

Cheers,

Per.

Kami commented 4 years ago

Thanks for reporting this.

Is this issue Python 3 specific?

One thing we could do is simply ignore "illegal seek" errors in that place, but I'm not sure that's the correct approach, since it may mask real issues.


EDIT: It looks like we indeed don't have a better option (https://bugs.python.org/issue12877), since Python sadly doesn't throw a more specific exception in that case. As a result, we can't distinguish between an underlying iterator that doesn't support seek at all and one that does support it but was given an incorrect seek position.

We could have some specific case for pipes, but that's probably not the most robust approach...
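For discussion, a guard along these lines could be sketched as follows. This is not the actual libcloud fix. The helper name rewind_if_seekable is hypothetical, and it assumes the stream is an io-style file object that exposes seekable(); arbitrary iterators with a custom seek() may not, which is why the errno check is kept as a fallback.

```python
import errno
import io


def rewind_if_seekable(stream):
    """Try to rewind stream to offset 0; return True only if that worked.

    Hypothetical helper for illustration, not part of libcloud.
    """
    seek = getattr(stream, "seek", None)
    if seek is None:
        return False  # plain iterator, nothing to rewind

    # io.IOBase objects report up front whether seeking can work
    seekable = getattr(stream, "seekable", None)
    if callable(seekable) and not seekable():
        return False

    try:
        seek(0)
    except io.UnsupportedOperation:
        return False
    except OSError as exc:
        # ESPIPE is the "Illegal seek" error raised for pipes and sockets
        if exc.errno != errno.ESPIPE:
            raise
        return False
    return True
```

Any other OSError is re-raised, so real I/O failures are not masked. The trade-off is that the upload then starts from wherever the stream currently is when rewinding is not possible.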