Novartis / cellxgene-gateway

Cellxgene Gateway allows you to use the Cellxgene Server provided by the Chan Zuckerberg Institute (https://github.com/chanzuckerberg/cellxgene) with multiple datasets.
Apache License 2.0
55 stars 32 forks source link

Force reload of s3 file structure on every request #62

Closed arogozhnikov closed 2 years ago

arogozhnikov commented 2 years ago

This prevents from reflecting wrong structure and allows simple incorporation of newly uploaded datasets without need to restart the server

alokito commented 2 years ago

Hi Alex! What do you think about adding a S3_DISABLE_LISTINGS_CACHE environment variable to control this behavior? That way we would have backwards compatibility for people who like the old behavior.

arogozhnikov commented 2 years ago

Why would one need to have non-fresh listing? Imagine you use file browser, but it does not show files that were recently created, but shows the ones that were deleted. You click a file, and get an error...

If you have some use case for the old behavior (or maybe any concerns about new behavior), please let me know, but default s3fs behavior just doesn't seem to fit cellxgene-gateway

alokito commented 2 years ago

Why would one need to have non-fresh listing?

@arogozhnikov the reason is for performance. The page will load slightly faster if it does not need to do a new listing each time. This becomes more important as the size of the S3 bucket increases, and as the number of concurrent users grows. We probably need to keep the cache-refresh parameter for the use case where one adds/deletes files from the bucket. I'll try to get these changes into a release soon once we sort out the details.

arogozhnikov commented 2 years ago

@alokito I see your concern, added this environment variable and by default behavior is backward-compatible.

Can you please review and merge? Thanks!

alokito commented 2 years ago

@arogozhnikov After sleeping on it, I think your use case is the more common one. I'll merge this, but change the variable name to S3_ENABLE_LISTINGS_CACHE and switch the logic before making the release. Hopefully that will work for everyone.