activeloopai / deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Mozilla Public License 2.0
7.87k stars, 605 forks

The process of loading the dataset via deeplake.load('hub://crossvivit/SunLake') is experiencing significant delays. #2874

Closed liujian123223 closed 6 days ago

liujian123223 commented 3 weeks ago

Hello, while running the CrossViT project I load the data with deeplake.load('hub://crossvivit/SunLake'), but this call takes up to an hour, which is far too slow. Is there a way to speed up loading the dataset? Alternatively, is it possible to download the SunLake dataset and use it locally? Looking forward to your reply.

mikayelh commented 6 days ago

Closing this as it's been addressed in the Slack community.

mikayelh commented 6 days ago

Hi. The issue is likely due to the large number of tensors in the dataset, which is 300. Connections between our servers and China are not very fast, and the loading problem gets much worse when there are many tensors. Is there a possibility for you to store the data in your own cloud (Azure or AWS) in Asia? If not, would you be able to combine some of the tensors into a single JSON tensor with multiple keys? This would likely make loading faster, but you'll still experience other slowdowns due to geographic location.
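
For readers who land here later, the two suggestions above could be sketched roughly as follows. This is only an illustration, assuming the Deep Lake 3.x Python API (deeplake.deepcopy, deeplake.load, deeplake.empty, create_tensor with htype="json"); the helper names pack_row, copy_to_own_storage and write_packed_dataset, the destination paths, and the tensor name "meta" are hypothetical and not part of the thread or of the SunLake dataset. Note that the one-time copy still goes over the same slow network path; the benefit would only show up on later loads.

```python
from typing import Any, Dict, Iterable


def pack_row(row: Dict[str, Any], keys: Iterable[str]) -> Dict[str, Any]:
    """Collect several per-tensor values into one JSON-ready dict (hypothetical helper)."""
    packed = {}
    for key in keys:
        value = row[key]
        # NumPy arrays and scalars expose tolist(); plain Python values pass through.
        packed[key] = value.tolist() if hasattr(value, "tolist") else value
    return packed


def copy_to_own_storage(src_path: str, dest_path: str):
    """Copy a dataset once to a local path or to your own bucket (assumes Deep Lake 3.x)."""
    import deeplake  # imported lazily so pack_row can be used without deeplake installed

    # dest_path is an assumption, e.g. "./SunLake_local" or an "s3://..." path in an Asian region.
    return deeplake.deepcopy(src_path, dest_path, overwrite=False)


def write_packed_dataset(src_path: str, dest_path: str, keys_to_pack, packed_name: str = "meta"):
    """Rewrite a dataset so that the tensors in keys_to_pack become one json tensor (sketch)."""
    import deeplake

    src = deeplake.load(src_path, read_only=True)
    dest = deeplake.empty(dest_path)
    keep = [name for name in src.tensors if name not in keys_to_pack]
    with dest:
        for name in keep:
            # Reuse the source tensor's htype, dtype and compression settings.
            dest.create_tensor_like(name, src[name])
        dest.create_tensor(packed_name, htype="json")
        for sample in src:
            row = {name: sample[name].numpy() for name in keep}
            values = {key: sample[key].numpy() for key in keys_to_pack}
            row[packed_name] = pack_row(values, keys_to_pack)
            dest.append(row)
    return dest
```

Packing only makes sense for small, scalar-like tensors (timestamps, station metadata and similar); large image or array tensors are probably better left as their own tensors.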