apache / libcloud

Apache Libcloud is a Python library which hides differences between different cloud provider APIs and allows you to manage different cloud resources through a unified and easy to use API.
https://libcloud.apache.org
Apache License 2.0

MinIO: ObjectHashMismatchError when trying to upload object #1607

Open Wenzel opened 3 years ago

Wenzel commented 3 years ago

Summary

The MinIO provider raises an ObjectHashMismatchError when trying to upload an object, for no clear reason. The bug is 100% reproducible, and code to reproduce it is provided.

Detailed Information

libcloud: 3.3.1 (latest stable)
Python: 3.8.10
OS: Ubuntu 20.04

Here is a repo that reproduces the bug using pytest (there are actually 2 bugs to report): https://github.com/Wenzel/libcloud_bug

$ pytest -k test_demo_ObjectHashMismatchError_with_pyfakefs
clean_minio_db = <libcloud.storage.drivers.minio.MinIOStorageDriver object at 0x7f1410f3fdf0>, fs = <pyfakefs.fake_filesystem.FakeFilesystem object at 0x7f141021fa90>

    def test_demo_ObjectHashMismatchError_with_pyfakefs(clean_minio_db, fs):
        # create test file
        test_file = "/file1.txt"
        test_file_data = b"hello"
        with open(test_file, "wb") as f:
            f.write(test_file_data)
        # create container
        driver = clean_minio_db
        container = driver.create_container('test')
        # test
>       driver.upload_object(str(test_file), container, 'test_file')

/home/wenzel/Projets/libcloud_bug/tests/test_demo_bug.py:13: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/home/wenzel/.cache/pypoetry/virtualenvs/libcloud-bug-IT-EyGKG-py3.8/lib/python3.8/site-packages/libcloud/storage/drivers/s3.py:545: in upload_object
    return self._put_object(container=container, object_name=object_name,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <libcloud.storage.drivers.minio.MinIOStorageDriver object at 0x7f1410f3fdf0>, container = <Container: name=test, provider=MinIO Storage Driver>, object_name = 'test_file', method = 'PUT'
query_args = None, extra = {}, file_path = '/file1.txt', stream = None, verify_hash = True, storage_class = None, headers = {'connection': 'close', 'content-type': 'text/plain; charset=utf-8'}

    def _put_object(self, container, object_name, method='PUT',
                    query_args=None, extra=None, file_path=None,
                    stream=None, verify_hash=True, storage_class=None,
                    headers=None):
        headers = headers or {}
        extra = extra or {}

        headers.update(self._to_storage_class_headers(storage_class))

        content_type = extra.get('content_type', None)
        meta_data = extra.get('meta_data', None)
        acl = extra.get('acl', None)

        if meta_data:
            for key, value in list(meta_data.items()):
                key = self.http_vendor_prefix + '-meta-%s' % (key)
                headers[key] = value

        if acl:
            headers[self.http_vendor_prefix + '-acl'] = acl

        request_path = self._get_object_path(container, object_name)

        if query_args:
            request_path = '?'.join((request_path, query_args))

        result_dict = self._upload_object(
            object_name=object_name, content_type=content_type,
            request_path=request_path, request_method=method,
            headers=headers, file_path=file_path, stream=stream)

        response = result_dict['response']
        bytes_transferred = result_dict['bytes_transferred']
        headers = response.headers
        response = response
        server_hash = headers.get('etag', '').replace('"', '')
        server_side_encryption = headers.get('x-amz-server-side-encryption',
                                             None)
        aws_kms_encryption = (server_side_encryption == 'aws:kms')
        hash_matches = (result_dict['data_hash'] == server_hash)

        # NOTE: If AWS KMS server side encryption is enabled, ETag won't
        # contain object MD5 digest so we skip the checksum check
        # See https://docs.aws.amazon.com/AmazonS3/latest/API
        # /RESTCommonResponseHeaders.html
        # and https://github.com/apache/libcloud/issues/1401
        # for details
        if verify_hash and not aws_kms_encryption and not hash_matches:
>           raise ObjectHashMismatchError(
                value='MD5 hash {0} checksum does not match {1}'.format(
                    server_hash, result_dict['data_hash']),
E                   libcloud.storage.types.ObjectHashMismatchError: <ObjectHashMismatchError in <libcloud.storage.drivers.minio.MinIOStorageDriver object at 0x7f1410f3fdf0>, value=MD5 hash  checksum does not match 5d41402abc4b2a76b9719d911017c592, object = test_file>

/home/wenzel/.cache/pypoetry/virtualenvs/libcloud-bug-IT-EyGKG-py3.8/lib/python3.8/site-packages/libcloud/storage/drivers/s3.py:922: ObjectHashMismatchError
------------------------------------------------------------------------------------------- Captured stdout setup -------------------------------------------------------------------------------------------
db2dfd078b96e1b0932dc354926aadb77adba2d8c19b8ccc875e7fe5163e8f46
----------------------------------------------------------------------------------------- Captured stdout teardown ------------------------------------------------------------------------------------------
libcloud_bug_objectmistmatch_miniodb
========================================================================================== short test summary info ==========================================================================================
FAILED tests/test_demo_bug.py::test_demo_ObjectHashMismatchError_with_pyfakefs - libcloud.storage.types.ObjectHashMismatchError: <ObjectHashMismatchError in <libcloud.storage.drivers.minio.MinIOStorageD..

It looks like the hash check in upload_object fails, but for no clear reason.
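One detail in the error message may help narrow this down: the server-side hash is printed as an empty string ("MD5 hash  checksum does not match 5d41402abc4b2a76b9719d911017c592"), while the local hash is the correct MD5 of b"hello". The sketch below is not libcloud code. It only mimics the comparison visible in the traceback, using a hypothetical helper `etag_matches` and plain dicts standing in for response headers, to show that a missing `etag` header would produce exactly this kind of mismatch. Whether MinIO actually omitted the header here, or something in the test setup dropped it, is an assumption that still needs checking.

```python
import hashlib


def etag_matches(local_data: bytes, response_headers: dict) -> bool:
    """Hypothetical helper mimicking the check shown in the traceback."""
    data_hash = hashlib.md5(local_data).hexdigest()
    # Same normalisation as in the traceback: a missing header becomes '',
    # and the surrounding double quotes are stripped from the ETag value.
    server_hash = response_headers.get('etag', '').replace('"', '')
    return data_hash == server_hash


payload = b"hello"
local_md5 = hashlib.md5(payload).hexdigest()
print(local_md5)  # 5d41402abc4b2a76b9719d911017c592

# Response that carries a quoted ETag equal to the payload's MD5.
ok_headers = {'etag': '"%s"' % local_md5}
# Response with no ETag at all (assumed scenario, not confirmed).
missing_headers = {}

print(etag_matches(payload, ok_headers))       # True
print(etag_matches(payload, missing_headers))  # False
```

If the header really is absent, dumping the raw response headers from the upload in the failing test would confirm it.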

Thanks for maintaining libcloud!

stale[bot] commented 2 years ago

Thanks for contributing to this issue. As it has been 90 days since the last activity, we are automatically marking it as stale. If this issue is not relevant or applicable anymore (problem has been fixed in a new version or similar), please close the issue or let us know so we can close it. On the contrary, if the issue is still relevant, there is nothing you need to do, but if you have any additional details or context which would help us when working on this issue, please include it as a comment to this issue.