enso-org / enso

Enso Analytics is a self-service data prep and analysis platform designed for data teams.
https://ensoanalytics.com
Apache License 2.0
7.39k stars, 323 forks

Possible improvement: cloud file caching #11439

Open radeusgd opened 3 weeks ago

radeusgd commented 3 weeks ago

Much like the caching for Data.fetch implemented in #11342, we could have a similar solution for when we are reading cloud files.

We cannot reuse the HTTP layer cache directly, because the pre-signed URLs used to download a Cloud file expire, so the URL is likely to differ between calls.

But we can actually do better - when downloading the file, we are fetching the get_details endpoint to get the current pre-signed URL for download. That same endpoint could be made to report the file version that is stored in the cloud.

Thus we could implement our own cache keyed on the asset_id and the cloud file version. Such a cache would only require re-downloading the file if a new version was uploaded, and it would be more 'correct' than a TTL-based one: as soon as the file is updated, the version changes, so any future download will correctly invalidate the cache entry.
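A minimal sketch of the idea in Python, not the actual Enso implementation. The `Details` shape, the `version` field, and the `get_details`/`download` callables are assumptions made up for illustration; the fake cloud at the bottom only simulates a pre-signed URL that changes on every call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Details:
    """Hypothetical shape of a get_details response."""
    version: str       # assumed field; the endpoint would need to report it
    download_url: str  # pre-signed URL, may differ on every call


class VersionedFileCache:
    """Caches file contents per asset_id, valid only for the stored version."""

    def __init__(self, get_details: Callable[[str], Details],
                 download: Callable[[str], bytes]) -> None:
        self._get_details = get_details
        self._download = download
        self._entries: Dict[str, Tuple[str, bytes]] = {}

    def read(self, asset_id: str) -> bytes:
        details = self._get_details(asset_id)  # metadata call on every read
        cached = self._entries.get(asset_id)
        if cached is not None and cached[0] == details.version:
            return cached[1]  # same version: skip the download
        data = self._download(details.download_url)
        self._entries[asset_id] = (details.version, data)
        return data


# Fake cloud for illustration only.
cloud = {"version": "v1", "detail_calls": 0, "downloads": 0}


def fake_get_details(asset_id: str) -> Details:
    cloud["detail_calls"] += 1
    url = f"https://example.invalid/{asset_id}?sig={cloud['detail_calls']}"
    return Details(cloud["version"], url)


def fake_download(url: str) -> bytes:
    cloud["downloads"] += 1
    return f"contents at {cloud['version']}".encode()


file_cache = VersionedFileCache(fake_get_details, fake_download)
first = file_cache.read("asset-1")
second = file_cache.read("asset-1")  # URL changed, version did not: cache hit
cloud["version"] = "v2"
third = file_cache.read("asset-1")   # new version: re-download
```

The URL never takes part in the cache key, so its expiry does not matter; the cost is one metadata request per read, which the download path already makes.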

GregoryTravis commented 3 weeks ago

I think this could be implemented using LRUCache -- if you agree, feel free to assign this to me.

GregoryTravis commented 3 weeks ago

And it might be worthwhile to implement ETag checking for Data.fetch.
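For reference, ETag checking usually means a conditional request: the client sends the stored ETag in `If-None-Match`, and the server answers `304 Not Modified` with no body if the resource is unchanged. A rough Python sketch follows, with no claim about how Data.fetch is structured; `ETagCache` and the `fetch` callable are hypothetical stand-ins for a real HTTP client.

```python
from typing import Callable, Dict, Tuple

# (status, response headers, body); a stand-in for a real HTTP client call.
Response = Tuple[int, Dict[str, str], bytes]
Fetch = Callable[[str, Dict[str, str]], Response]


class ETagCache:
    """Revalidates cached bodies with If-None-Match instead of a TTL."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._entries: Dict[str, Tuple[str, bytes]] = {}  # url -> (etag, body)

    def get(self, url: str) -> bytes:
        request_headers: Dict[str, str] = {}
        cached = self._entries.get(url)
        if cached is not None:
            request_headers["If-None-Match"] = cached[0]
        status, response_headers, body = self._fetch(url, request_headers)
        if status == 304 and cached is not None:
            return cached[1]  # unchanged on the server: reuse the stored body
        etag = response_headers.get("ETag")
        if status == 200 and etag is not None:
            self._entries[url] = (etag, body)
        return body


# Fake server for illustration only.
server = {"etag": '"abc"', "body": b"hello", "full_responses": 0}


def fake_fetch(url: str, headers: Dict[str, str]) -> Response:
    if headers.get("If-None-Match") == server["etag"]:
        return 304, {"ETag": server["etag"]}, b""
    server["full_responses"] += 1
    return 200, {"ETag": server["etag"]}, server["body"]


etag_cache = ETagCache(fake_fetch)
a = etag_cache.get("https://example.invalid/data.csv")
b = etag_cache.get("https://example.invalid/data.csv")  # answered with 304
server["etag"], server["body"] = '"def"', b"world"
c = etag_cache.get("https://example.invalid/data.csv")  # ETag changed: full body
```

Each read still costs a round trip, but an unchanged resource is not transferred again.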

radeusgd commented 3 weeks ago

> I think this could be implemented using LRUCache -- if you agree, feel free to assign this to me.

Yes, I think we should be able to reuse some of the LRUCache logic, although that may require abstracting away the TTL handling, since the Enso cloud cache should not rely on TTL but instead on the asset version reported by the cloud (we are in discussion with the cloud team about whether we can get it on the get_file_details endpoint).
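One way this abstraction could look, sketched in Python rather than as the existing LRUCache code: the store keeps opaque per-entry metadata and asks a caller-supplied predicate whether the entry is still fresh. `LRUStore`, `ttl_policy`, and `version_policy` are hypothetical names.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class LRUStore:
    """LRU eviction only; freshness is decided by the caller's predicate."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._entries: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def get(self, key: Hashable, is_fresh: Callable[[Any], bool]) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        meta, value = entry
        if not is_fresh(meta):
            del self._entries[key]  # stale: drop it so the caller re-fetches
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, meta: Any, value: Any) -> None:
        self._entries[key] = (meta, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used


def ttl_policy(now: float, ttl_seconds: float) -> Callable[[float], bool]:
    """Entry metadata is the time it was stored."""
    return lambda stored_at: now - stored_at < ttl_seconds


def version_policy(current_version: str) -> Callable[[str], bool]:
    """Entry metadata is the asset version it was downloaded at."""
    return lambda stored_version: stored_version == current_version


store = LRUStore(capacity=2)
store.put("https://example.invalid/x", 100.0, b"http body")
http_hit = store.get("https://example.invalid/x", ttl_policy(now=130.0, ttl_seconds=60))
http_miss = store.get("https://example.invalid/x", ttl_policy(now=200.0, ttl_seconds=60))

store.put("asset-1", "v1", b"file")
cloud_hit = store.get("asset-1", version_policy("v1"))
cloud_miss = store.get("asset-1", version_policy("v2"))
```

With this split, the HTTP cache and a cloud file cache could share eviction and size limits while differing only in the freshness predicate.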

radeusgd commented 3 weeks ago

The ticket adding the version number in the cloud endpoint: https://github.com/enso-org/cloud-v2/issues/1568

enso-bot[bot] commented 2 weeks ago

Adam Obuchowicz reports a new STANDUP for the last Wednesday (2024-11-06):

Progress: Tried to implement a different error message for when loading a module fails, but discovered that this case is actually handled; a Ydoc structure mismatch, however, produces no error, just a set of empty lines. The proper solution would be versioning the parser (and its AST structure), but I leave that for later. The issue should be solved by deploying a new ydoc server in the cloud. It should be finished by 2024-11-15.

farmaazon commented 2 weeks ago

The above was meant for #11493