AllenInstitute / abc_atlas_access

Documentation and examples demonstrating how to access data from the Allen Brain Cell Atlas
https://alleninstitute.github.io/abc_atlas_access/
Other
54 stars 22 forks source link

Create initial S3 cache objects #36

Closed morriscb closed 10 months ago

morriscb commented 10 months ago

Port base classes BasicLocalCache and CloudCacheBase from the SDK. Port S3Cloud cache object. Modify manifest comparison to not rely on hashing (file hashes not available in ABC manifests).

I'm tempted to just remove all of the "local_manifest" infrastructure due to the lack of hashes in the downloaded manifests but that's a question we can answer in the review.

The cache works and I've tested all the methods. To be clear this isn't the final object that users will interact with. Though we can chat about that as well. Basically the question is if we want to also have LocalCache and StaticLocalCache as in the SDK: https://github.com/AllenInstitute/AllenSDK/blob/a9b5c685396126d9748f1ccecf7c00f440569f69/allensdk/api/cloud_cache/cloud_cache.py#L1218

danielsf commented 10 months ago

Do you plan to port any of the other cloud cache unit tests over from the SDK?

https://github.com/AllenInstitute/AllenSDK/tree/master/allensdk/test/api/cloud_cache

We did a pretty good job testing the various cache classes using the moto3 mock boto3 API. I think I would prefer having those tests here, but I know it will be a lot of work given the difference in manifest schemas between the SDK use cases and ABC atlas. Feel free to push back.

danielsf commented 10 months ago

Regarding LocalCache and StaticLocalCache. We can probably get away with not having StaticLocalCache. I think that was for a use case where users were running in the cloud and had mounted an S3 bucket like a file system. I do think we need LocalCache, in case someone wants to work with this data in an environment where they don't have network connectivity (or maybe the world has moved on and everyone just pays for WiFi on planes now....)

morriscb commented 10 months ago

Regarding LocalCache and StaticLocalCache. We can probably get away with not having StaticLocalCache. I think that was for a use case where users were running in the cloud and had mounted an S3 bucket like a file system. I do think we need LocalCache, in case someone wants to work with this data in an environment where they don't have network connectivity (or maybe the world has moved on and everyone just pays for WiFi on planes now....)

I agree. The directory structure of ABC matches that of the bucket so we don't really need an extra class beyond LocalCache.

morriscb commented 10 months ago

Do you plan to port any of the other cloud cache unit tests over from the SDK?

https://github.com/AllenInstitute/AllenSDK/tree/master/allensdk/test/api/cloud_cache

We did a pretty good job testing the various cache classes using the moto3 mock boto3 API. I think I would prefer having those tests here, but I know it will be a lot of work given the difference in manifest schemas between the SDK use cases and ABC atlas. Feel free to push back.

Yep. I mostly wanted to get this initial commit looked over because, as you said, adding the tests will be a bit of work so I'm breaking it up in my mind to do this first and then port the tests.