gallantlab / cottoncandy

sugar for s3
http://gallantlab.github.io/cottoncandy/
BSD 2-Clause "Simplified" License

changes to handling of metadata casing in boto3 (and minio?) make download_raw_array (and probably other things) fail #70

alexhuth closed 5 years ago

alexhuth commented 5 years ago

It seems like there's been a change in the past month or two in boto3, minio (the object store that we are using), or both, that makes download_raw_array fail.

here's the error:

In [4]: story = cci.download_raw_array('AA/avatar')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-53e6dd7d17c7> in <module>()
----> 1 story = cci.download_raw_array('AA/avatar')

~/.local/lib/python3.6/site-packages/cottoncandy/utils.py in iremove_root(self, object_name, *args, **kwargs)
    264             object_name = object_name[1:]
    265 
--> 266         return input_function(self, object_name, *args, **kwargs)
    267     return iremove_root
    268 

~/.local/lib/python3.6/site-packages/cottoncandy/interfaces.py in download_raw_array(self, object_name, buffersize, **kwargs)
    631         arraystream = self.download_stream(object_name)
    632 
--> 633         shape = arraystream.metadata['shape']
    634         shape = map(int, shape.split(',')) if shape else ()
    635         dtype = np.dtype(arraystream.metadata['dtype'])

KeyError: 'shape'

the problem is that boto3 is now returning a metadata dictionary that has the first letter of each key capitalized (!). See:

In [5]: debug
> /home/lixiangxu/.local/lib/python3.6/site-packages/cottoncandy/interfaces.py(633)download_raw_array()
    631         arraystream = self.download_stream(object_name)
    632 
--> 633         shape = arraystream.metadata['shape']
    634         shape = map(int, shape.split(',')) if shape else ()
    635         dtype = np.dtype(arraystream.metadata['dtype'])

ipdb> arraystream.metadata
{'Dtype': '<f8', 'Gzip': 'True', 'Order': 'C', 'Shape': '367,95556'}

We did not get this behavior with boto3 version 1.7.33 (on python 3) or boto version 2.48.0 (on python 2), but we do have the problem on boto3 version 1.9.20 (python 3) and boto3 1.10.33 (python 3).

Other reports of the same issue: minio/minio#6471 and boto/boto3#1425

This could be fixed in a few ways, but I think all of them may be ugly. Please advise.
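
For example, one option might be a case-insensitive lookup at the point of use. This is only a minimal sketch: get_metadata_value is a hypothetical helper name (nothing like it exists in cottoncandy as far as this issue shows), and the dict is the one from the debug session above.

```python
def get_metadata_value(metadata, key):
    """Look up `key` in `metadata`, ignoring the case of the stored keys.

    Hypothetical helper for illustration; not part of cottoncandy.
    """
    # Fast path: the key is stored exactly as requested (old behavior).
    if key in metadata:
        return metadata[key]
    # Slow path: compare lowercased keys (handles 'Shape' vs 'shape').
    wanted = key.lower()
    for stored_key, value in metadata.items():
        if stored_key.lower() == wanted:
            return value
    raise KeyError(key)


# Metadata as returned in the failing session.
metadata = {'Dtype': '<f8', 'Gzip': 'True', 'Order': 'C', 'Shape': '367,95556'}
shape = get_metadata_value(metadata, 'shape')
dtype = get_metadata_value(metadata, 'dtype')
```

The downside is that every metadata access in the codebase would need to go through the helper, which is part of why this feels ugly.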

anwarnunez commented 5 years ago

oy....

https://github.com/minio/minio/issues/6471#issuecomment-421844085

seems like there is some reluctance to fix it on the minio side. from what i can gather, the metadata keys provided by the server are not required to be lowercased, but the client should convert them to lowercase.

we can have a function that grabs metadata and cleans it (eg utils.sanitize_metadata()). then we can just wrap object.metadata access with that function. this wouldn't be the ugliest thing...
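
a rough sketch of what that could look like. sanitize_metadata() doesn't exist in cottoncandy yet, so the name and behavior here are just a proposal, and the example dict is the one from the debug output above:

```python
def sanitize_metadata(metadata):
    """Return a copy of `metadata` with every key lowercased.

    Proposed helper (hypothetical); the input dict is left untouched.
    """
    return {str(key).lower(): value for key, value in metadata.items()}


# Metadata as returned by the newer boto3/minio combination.
raw = {'Dtype': '<f8', 'Gzip': 'True', 'Order': 'C', 'Shape': '367,95556'}
clean = sanitize_metadata(raw)

# The lookups that failed in download_raw_array now work with lowercase keys.
shape = clean['shape']
shape = tuple(int(n) for n in shape.split(',')) if shape else ()
dtype_string = clean['dtype']
```

already-lowercase keys pass through unchanged, so this should be safe with the older boto/boto3 versions too.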