The process of loading the dataset via deeplake.load('hub://crossvivit/SunLake') is experiencing significant delays.

activeloopai / deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai


Mozilla Public License 2.0


Issue #2874

Status: Closed (liujian123223 closed this issue 6 days ago)

liujian123223 commented 3 weeks ago

Hello, while running the CrossViViT project, I load the data with deeplake.load('hub://crossvivit/SunLake'), but this takes up to an hour, which is extremely slow. Is there a way to speed up loading the dataset? Alternatively, is it possible to download the SunLake dataset and use it locally? Looking forward to your reply.
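The two options raised in this thread (streaming from the hub versus reading a copy that lives closer to you) could look roughly like the sketch below. It assumes the Deep Lake 3.x API: that `deeplake.load()` accepts an `access_method` argument whose `"download"` mode writes to the folder named by the `DEEPLAKE_DOWNLOAD_PATH` environment variable, and that `deeplake.deepcopy(src, dest, dest_creds=...)` exists. The helper names `open_sunlake` and `copy_sunlake` are hypothetical; check the documentation for the Deep Lake version you have installed before relying on any of these arguments.

```python
import os


def open_sunlake(download_dir=None, path="hub://crossvivit/SunLake", access_method="download"):
    """Hypothetical helper: stream SunLake, or fetch it to local disk first.

    Assumes Deep Lake 3.x, where deeplake.load() takes access_method
    ("stream" by default, or "download") and the download mode stores
    data under the DEEPLAKE_DOWNLOAD_PATH environment variable.
    """
    import deeplake  # imported lazily so this module can be defined without deeplake installed

    if download_dir is None:
        # Default behaviour: stream every read over the network.
        return deeplake.load(path)

    # Pay the transfer cost once, then read from local storage.
    os.makedirs(download_dir, exist_ok=True)
    os.environ["DEEPLAKE_DOWNLOAD_PATH"] = download_dir
    return deeplake.load(path, access_method=access_method)


def copy_sunlake(dest, path="hub://crossvivit/SunLake", dest_creds=None):
    """Hypothetical helper: make a full copy in storage you control.

    `dest` could be a local folder or a bucket URL in a nearby region
    (for example an s3:// path); `dest_creds` holds credentials for it.
    Assumes deeplake.deepcopy(src, dest, dest_creds=...) from Deep Lake 3.x.
    """
    import deeplake

    return deeplake.deepcopy(path, dest, dest_creds=dest_creds)
```

The download-once approach trades a single long transfer for fast repeated reads, which suits training loops that iterate over the data many times.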

mikayelh commented 6 days ago

Closing this, as it's been addressed in the Slack community.

mikayelh commented 6 days ago

Hi. The issue is likely due to the large number of tensors in the dataset (300). Communication between our servers and China is not very fast, and the loading problem gets much worse when there are many tensors. Is it possible for you to store the data in your own cloud (Azure or AWS) in an Asian region? If not, could you combine some of the tensors into a single JSON tensor with multiple keys? That would likely make loading faster, although you will still see other slowdowns due to geographic location.
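The suggestion to merge tensors can be sketched as follows: several per-key tensors, each holding one value per sample, become one JSON record per sample stored in a single tensor. The assumption here is that fewer tensors means fewer separate metadata requests when the dataset is opened. The helpers `pack_columns`, `unpack_records` and `write_json_tensor` are hypothetical. The write step assumes the Deep Lake 3.x API (`Dataset.create_tensor(name, htype="json")`, `Tensor.extend`, and `with ds:` for batched writes). It only suits small per-sample values such as scalars and metadata, not large arrays like images.

```python
from typing import Any, Dict, List, Sequence


def pack_columns(columns: Dict[str, Sequence[Any]]) -> List[Dict[str, Any]]:
    """Turn {key: [v0, v1, ...]} (one tensor per key) into [{key: v0, ...}, ...] (one record per sample)."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f"all columns must have the same length, got {sorted(lengths)}")
    n_samples = lengths.pop() if lengths else 0
    return [{key: values[i] for key, values in columns.items()} for i in range(n_samples)]


def unpack_records(records: Sequence[Dict[str, Any]], keys: Sequence[str]) -> Dict[str, List[Any]]:
    """Inverse of pack_columns: recover one list per key from the per-sample records."""
    return {key: [record[key] for record in records] for key in keys}


def write_json_tensor(ds, name: str, columns: Dict[str, Sequence[Any]]) -> None:
    """Hypothetical helper: store the packed records in one json-htype tensor of a writable dataset.

    Assumes Deep Lake 3.x (create_tensor with htype="json", Tensor.extend).
    Values must be plain JSON-serializable Python types (float, int, str, list, dict).
    """
    records = pack_columns(columns)
    with ds:  # batch the writes into a single flush
        ds.create_tensor(name, htype="json")
        ds[name].extend(records)
```

For example, `pack_columns({"lat": [1.0, 2.0], "lon": [3.0, 4.0]})` yields two records, `{"lat": 1.0, "lon": 3.0}` and `{"lat": 2.0, "lon": 4.0}`. After that, reading the field for one sample means indexing a single tensor rather than one tensor per key. NumPy scalars should be converted (for example with `.item()`) before packing so the records stay JSON-serializable.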