activeloopai / deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Mozilla Public License 2.0
8.05k stars, 614 forks

[FEATURE] Simple data iterator for deeplake.Dataset #2016

Open elda27 opened 1 year ago

elda27 commented 1 year ago

🚨🚨 Feature Request

Is your feature request related to a problem?

The current implementation requires TensorFlow or PyTorch to create an iterator on Windows. Of course, I could use deeplake.Dataset.dataloader to accomplish something like this question. I would like a simple method that works identically in all environments.

For example, I would use this feature to preprocess all data sequentially on the CPU.

Producing data in a form similar to what deeplake currently returns would require some conversion. I assume that all array-like (series) data is returned as NumPy arrays, and that all other data is returned as an appropriate Python type such as str, int, or list.

Description of the possible solution

deeplake.Dataset.tensorflow() already includes a generator function that yields each record as a dictionary. I guess the simple iterator could be built by customizing that implementation.
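A minimal sketch of the kind of generator described above, written without any framework dependency. The name `iter_records` and its `convert` parameter are hypothetical and are not part of deeplake. Plain Python lists stand in for dataset tensors, on the assumption that each column is sized and indexable.

```python
from typing import Any, Callable, Dict, Iterator, Mapping, Sequence


def iter_records(
    columns: Mapping[str, Sequence[Any]],
    convert: Callable[[Any], Any] = lambda value: value,
) -> Iterator[Dict[str, Any]]:
    """Yield one {column name: value} dict per row.

    `columns` maps a name to any sized, indexable column.
    `convert` is applied to every value before it is yielded.
    Both names are hypothetical, chosen for this illustration.
    """
    if not columns:
        return
    # Stop at the shortest column so every record has every key.
    length = min(len(column) for column in columns.values())
    for index in range(length):
        yield {name: convert(column[index]) for name, column in columns.items()}


# Plain lists stand in for dataset tensors here.
columns = {
    "tags": [["cat", "pet"], ["dog"]],
    "caption": ["a cat", "a dog"],
}
records = list(iter_records(columns))
for record in records:
    print(record)
```

With a real dataset, something like `iter_records(ds.tensors, convert=lambda sample: sample.numpy())` might work, assuming `ds.tensors` maps names to indexable tensors whose samples expose `.numpy()`. That is an untested assumption about the deeplake API, not a confirmed recipe.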

An alternative solution to the problem can look like

import deeplake

ds = deeplake.empty("./example")
ds.create_tensor("image", htype="image.rgb")
ds.create_tensor("tags", htype="list")
ds.create_tensor("caption", htype="text")

# ds.numpy() here is the proposed iterator: one dict per record
for dict_of_tensor in ds.numpy():
    print(dict_of_tensor)  # {"image": np.ndarray, "tags": list of str, "caption": str}

pyther-hub commented 1 year ago

Hey, I have solved this issue. Can I open a pull request?

    def dict_record(self):
        from deeplake.enterprise import dataloader
        return iter(map(lambda row: dict(row[0]), dataloader(self).numpy()))

This is the code I have added.

tatevikh commented 1 year ago

Hi @pyther-hub, absolutely! Go for it.

pyther-hub commented 1 year ago

> Hi @pyther-hub, absolutely! Go for it.

Sir, I have opened a pull request. Please review it.

gulatisukaran commented 1 year ago

Is something still left to be done?

pyther-hub commented 1 year ago

Can I work on this again, @tatevikh?