fsspec / s3fs

S3 Filesystem
http://s3fs.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Using s3fs with no 'LastModified' key info #162

Open amangarg96 opened 5 years ago

amangarg96 commented 5 years ago

I have a Ceph-based object store with an S3 interface, and I am trying to use s3fs with it.

The object metadata does not seem to include a 'LastModified' entry, which I found using Boto3's s3.Client.head_object(). Here is a sample metadata response for one of the objects:

{'ResponseMetadata': {'RequestId': 'tx0000000000000067d7a8a-005c3dce53-489594c6-in-chennai', 'HostId': '', 'HTTPStatusCode': 200, 'HTTPHeaders': {'server': 'openresty/1.13.6.2', 'date': 'Tue, 15 Jan 2019 12:13:07 GMT', 'content-type': 'application/octet-stream', 'accept-ranges': 'bytes', 'etag': '"ed076287532e86365e841e92bfc50d8c"', 'content-length': '12', 'x-amz-request-id': 'tx0000000000000067d7a8a-005c3dce53-489594c6-in-chennai', 'expires': 'Thu, 31 Dec 2037 23:55:55 GMT', 'cache-control': 'max-age=315360000', 'x-e-id': '10.34.34.118', 'x-elb-id': '10.34.34.118', 'x-elb-app-server': '10.34.105.188'}, 'RetryAttempts': 0}, 'AcceptRanges': 'bytes', 'ContentLength': 12, 'ETag': '"ed076287532e86365e841e92bfc50d8c"', 'CacheControl': 'max-age=315360000', 'ContentType': 'application/octet-stream', 'Expires': datetime.datetime(2037, 12, 31, 23, 55, 55, tzinfo=tzutc()), 'Metadata': {}}

When I try to use s3fs operations like open() (to open a file for reading or writing), I get the following error:

Traceback (most recent call last):
  File "/Users/aman.garg/Downloads/test/s3fs-touch.py", line 11, in <module>
    with fs.open('amantestnotebook/notebook/write_file.txt', 'rb') as f:
  File "/Users/aman.garg/anaconda2/envs/s3contents/lib/python3.7/site-packages/s3fs/core.py", line 315, in open
    s3_additional_kwargs=kw)
  File "/Users/aman.garg/anaconda2/envs/s3contents/lib/python3.7/site-packages/s3fs/core.py", line 1102, in __init__
    info = self.info()
  File "/Users/aman.garg/anaconda2/envs/s3contents/lib/python3.7/site-packages/s3fs/core.py", line 1120, in info
    refresh=refresh, **kwargs)
  File "/Users/aman.garg/anaconda2/envs/s3contents/lib/python3.7/site-packages/s3fs/core.py", line 447, in info
    'LastModified': out['LastModified'],
KeyError: 'LastModified'

Operations like ls(), mkdir(), isfile(), and isdir() work as expected.

Is there a way to use S3FileSystem without the 'LastModified' metadata? What exactly is this LastModified entry used for?

martindurant commented 5 years ago

Agree that this should only be optional information. I believe the line only exists to copy the original data without mutation. You are very welcome to submit a PR in which fields such as this are only referenced if they exist (or otherwise replaced with defaults, where appropriate). I may get to doing this myself, but I cannot promise when.

btw, with #161 (depends on filesystem-spec), this issue may have gone away. That PR will not be merged for some time, but it would be interesting to see if this kind of problem is already fixed "for free".

nic-avant commented 10 months ago

I see that this issue is very old, but this problem has been plaguing me when testing with moto. I genuinely do not know whether the cause is that the mocked objects lack a LastModified attribute, but I think a simple fix would be to replace out['LastModified'] with out.get('LastModified', ...). I don't know what a reasonable default is; maybe None would be fine. Would the fsspec maintainers be open to a one-liner PR to implement this change?
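A minimal sketch of the kind of change proposed here, not the actual s3fs code: build_info is a hypothetical helper, and the keys other than 'LastModified' are assumptions chosen to match the head_object response quoted earlier in the thread.

```python
def build_info(out, path):
    """Hypothetical helper: build an info dict from a head_object-style response.

    Optional fields are read with dict.get(), so a backend that omits
    'LastModified' yields None instead of raising KeyError.
    """
    return {
        "Key": path,
        "Size": out.get("ContentLength"),
        "ETag": out.get("ETag"),
        # Assumed default: None when the store does not report a timestamp.
        "LastModified": out.get("LastModified"),
    }


# Trimmed version of the response from the original report (no 'LastModified').
response = {
    "ContentLength": 12,
    "ETag": '"ed076287532e86365e841e92bfc50d8c"',
    "Metadata": {},
}
info = build_info(response, "amantestnotebook/notebook/write_file.txt")
print(info["LastModified"])
```

Whether None is the right default is still an open question; callers that compare timestamps would need to handle it.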

martindurant commented 10 months ago

Yes, I think that's reasonable. In general, we should be more tolerant of missing details, except when explicitly performing an operation that depends on them.
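One way to read this, as a rough sketch only: keep the info dict tolerant of missing fields and fail only in the operation that truly needs one. The function name require_last_modified is hypothetical and is not part of s3fs or fsspec.

```python
import datetime


def require_last_modified(info):
    """Hypothetical accessor: return the timestamp, or fail with a clear message.

    The error is raised here, at the point of use, rather than when the
    info dict is first built.
    """
    value = info.get("LastModified")
    if value is None:
        raise KeyError(
            "LastModified is not available for %r; "
            "the object store did not report it" % info.get("Key")
        )
    return value


# An object with a timestamp works as usual.
stamp = datetime.datetime(2019, 1, 15, 12, 13, 7)
with_time = {"Key": "bucket/a.txt", "LastModified": stamp}
result = require_last_modified(with_time)

# An object without one only fails when the timestamp is actually requested.
without_time = {"Key": "bucket/b.txt", "LastModified": None}
try:
    require_last_modified(without_time)
    failed = False
except KeyError:
    failed = True
```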