Manage temp tensor files in memory rather than sending them to storage

activeloopai / deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

Mozilla Public License 2.0

8.08k stars, 616 forks

Manage temp tensor files in memory rather than sending them to storage #2819

Open nvoxland-al opened 6 months ago

nvoxland-al commented 6 months ago

🚀 🚀 Pull Request

Impact

[X] Bug fix (non-breaking change which fixes expected existing functionality)
[ ] Enhancement/New feature (adds functionality without impacting existing logic)
[ ] Breaking change (fix or feature that would cause existing functionality to change)

Description

With a large number of temp tensors, on-disk metadata management becomes time-consuming. This PR avoids that overhead by keeping temp tensor files in memory instead of writing them to storage.
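For reviewers who want a picture of the idea, here is a minimal sketch of routing temp keys to an in-memory dict while everything else goes to the backing store. It is not the code in this PR: `DictBackend`, `TempAwareStore`, and the `_temp_` key prefix are hypothetical names chosen for illustration, and the real deeplake storage providers have a different interface.

```python
from typing import Dict


class DictBackend:
    """Hypothetical stand-in for a persistent (disk or cloud) backend.

    It counts writes so the effect of the routing is observable.
    """

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.writes = 0

    def __setitem__(self, key: str, value: bytes) -> None:
        self.writes += 1
        self.data[key] = value

    def __getitem__(self, key: str) -> bytes:
        return self.data[key]

    def __delitem__(self, key: str) -> None:
        del self.data[key]


class TempAwareStore:
    """Keeps keys with a temp prefix in memory; forwards all others.

    The prefix is an assumption made for this sketch, not deeplake's
    actual naming scheme for temp tensors.
    """

    def __init__(self, backend: DictBackend, temp_prefix: str = "_temp_") -> None:
        self.backend = backend
        self.temp_prefix = temp_prefix
        # Unbounded on purpose: no eviction or size limit is applied.
        self._temp: Dict[str, bytes] = {}

    def _is_temp(self, key: str) -> bool:
        return key.startswith(self.temp_prefix)

    def __setitem__(self, key: str, value: bytes) -> None:
        if self._is_temp(key):
            self._temp[key] = value
        else:
            self.backend[key] = value

    def __getitem__(self, key: str) -> bytes:
        if self._is_temp(key):
            return self._temp[key]
        return self.backend[key]

    def __delitem__(self, key: str) -> None:
        if self._is_temp(key):
            del self._temp[key]
        else:
            del self.backend[key]

    def clear_temp(self) -> None:
        """Drop all temp entries, e.g. once the operation using them is done."""
        self._temp.clear()
```

In this sketch a write to `_temp_labels/meta.json` never reaches the backend, so the backend's write counter only grows for non-temp keys.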

Things to be aware of

This PR does not attempt to limit the size of the temp tensor cache. Temp tensors are currently only used for class_labels, which should not involve large amounts of data.

nvoxland-al commented 6 months ago

This currently does not work with scheduler=processed. I'm going to get feedback before looking at how to handle that case better.

codecov[bot] commented 6 months ago

Codecov Report

Attention: Patch coverage is `96.03175%` with `5 lines` in your changes missing coverage. Please review.

| Files | Patch % | Lines |
|---|---|---|
| deeplake/core/storage/provider.py | 94.44% | 3 Missing :warning: |
| deeplake/core/storage/local.py | 92.30% | 1 Missing :warning: |
| deeplake/core/storage/lru_cache.py | 90.90% | 1 Missing :warning: |

sonarcloud[bot] commented 5 months ago

Quality Gate passed

Issues
4 New issues
0 Accepted issues

Measures
0 Security Hotspots
95.4% Coverage on New Code
0.0% Duplication on New Code
