activeloopai / deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Mozilla Public License 2.0
7.87k stars 605 forks

move deeplake dataset to _orig_dataset to enable ld.dataset variable … #2855

Closed activesoull closed 1 month ago

activesoull commented 1 month ago

…for integration like mmdet

🚀 🚀 Pull Request

Impact

Fixes an issue where the enterprise dataloader's dataset property was being overwritten. Integrations use that property in their pipelines, so the dataloader can no longer rely on it internally.

Description

Things to be aware of

Things to worry about

Additional Context

activesoull commented 1 month ago

This was the change that broke shuffling. We reverted it to fix the shuffling. Can you please explain what this is doing?

The issue was that our mmdet integration requires the dataloader to expose a dataloader.dataset property. The training pipeline takes the dataset (an MMdetDataset) from the dataloader and reads its properties in the train/val/test loops. So we moved all the dataset-related logic into the internal _orig_dataset property and reserved the dataset property for the training frameworks.

So the workflow looked like this:

  1. We create the dataloader.
  2. We set the dataloader's dataset property to an MMdetDataset here.
  3. The dataloader then failed here, and in every other place where we accessed self.dataset, because it had been changed from a deeplake dataset to an MMdetDataset object.
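
A minimal sketch of the failure and the fix described above, not the actual deeplake code. The classes FrameworkDataset, BrokenLoader and FixedLoader are hypothetical stand-ins, and a plain list stands in for the deeplake dataset. Only the attribute names dataset and _orig_dataset come from this PR.

```python
class FrameworkDataset:
    """Hypothetical stand-in for a framework wrapper such as MMdetDataset."""
    CLASSES = ("cat", "dog")


class BrokenLoader:
    """Uses `dataset` both internally and as the framework-facing attribute."""

    def __init__(self, storage):
        self.dataset = storage  # internal data (a list stands in for a deeplake dataset)

    def __iter__(self):
        # Fails once `dataset` has been overwritten with a non-iterable object.
        return iter(self.dataset)


class FixedLoader:
    """Keeps internal data in `_orig_dataset`; `dataset` is reserved for the framework."""

    def __init__(self, storage):
        self._orig_dataset = storage
        self.dataset = None  # set later by the integration

    def __iter__(self):
        return iter(self._orig_dataset)


samples = [1, 2, 3]

# Steps 1-3 of the workflow with the old layout: iteration breaks.
broken = BrokenLoader(samples)
broken.dataset = FrameworkDataset()
try:
    list(broken)
    broken_ok = True
except TypeError:
    broken_ok = False

# Same steps with the new layout: iteration still sees the original data.
fixed = FixedLoader(samples)
fixed.dataset = FrameworkDataset()
fixed_items = list(fixed)
```

With this split, the integration can overwrite dataset freely while the loader keeps reading from _orig_dataset.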

I will add a test case in indra with a shuffling quality check, so that this bug stays tracked.
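
The indra test itself is not shown in this thread. As a rough illustration only, a shuffling quality check could look like the sketch below. The helper names and the 5% threshold are assumptions, and Python's random module stands in for the real dataloader's shuffling.

```python
import random


def fixed_point_fraction(order):
    """Fraction of indices that stayed at their original position."""
    return sum(1 for pos, idx in enumerate(order) if pos == idx) / len(order)


def shuffled_order(n, seed):
    """Stand-in for the sample order a shuffling dataloader would produce."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


n = 1000
order = shuffled_order(n, seed=0)

# Every sample must appear exactly once.
is_permutation = sorted(order) == list(range(n))

# In a well-shuffled order, very few samples keep their original position.
fraction = fixed_point_fraction(order)
well_shuffled = fraction < 0.05  # assumed threshold, chosen for illustration
```

An unshuffled order would give a fixed-point fraction of 1.0, so a check like this would have caught the regression.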

sonarcloud[bot] commented 1 month ago

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
84.6% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud