[FEATURE] Simple data iterator for deeplake.Dataset

elda27 commented 1 year ago

🚨🚨 Feature Request

[ ] Related to an existing Issue
[x] A new implementation (Improvement, Extension)

Is your feature request related to a problem?

The current implementation requires TensorFlow or PyTorch to create an iterator on Windows. Of course, I could use deeplake.Dataset.dataloader to accomplish something like this question. I would like a simple method that works identically in all environments.

For example, I have in mind using this feature to preprocess all the data sequentially on the CPU.

Producing data similar to what deeplake currently returns would require some conversion. I assume that all series data is returned as NumPy arrays, and that all other data is returned as an appropriate type such as str, int, or list.

Description of the possible solution

deeplake.Dataset.tensorflow() includes a generator function that yields a dictionary for each record. I guess this feature could be built by customizing that implementation.

An alternative solution to the problem could look like this:

import deeplake

# Create an empty local dataset with three tensors.
ds = deeplake.empty("./example")
ds.create_tensor("image", htype="image.rgb")
ds.create_tensor("tags", htype="list")
ds.create_tensor("caption", htype="text")

# Proposed API: iterate over the samples as plain dictionaries.
for dict_of_tensor in ds.numpy():
    print(dict_of_tensor)  # {"image": np.ndarray, "tags": list of str, "caption": str}
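
To make the requested behavior concrete, here is a minimal, framework-free sketch of a per-record iterator. It does not use deeplake at all: `iter_records` and the `columns` mapping are hypothetical names introduced only for illustration, and nested lists stand in for the NumPy image arrays. It only shows the shape of the records the proposal asks for, under the assumption that every tensor holds the same number of samples.

```python
from typing import Any, Dict, Iterator, Mapping, Sequence


def iter_records(columns: Mapping[str, Sequence[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per sample from equally long, column-oriented data."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of samples")
    n = lengths.pop() if lengths else 0
    for i in range(n):
        # Build the record lazily, one sample at a time.
        yield {name: values[i] for name, values in columns.items()}


# Stand-in data (assumption): nested lists take the place of image arrays.
columns = {
    "image": [[[0, 0, 0]], [[255, 255, 255]]],
    "tags": [["cat", "pet"], ["dog"]],
    "caption": ["a cat", "a dog"],
}

records = list(iter_records(columns))
for record in records:
    print(record)
```

Because the function is a generator, only one record is materialized at a time, which suits the sequential CPU preprocessing described above.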

pyther-hub commented 1 year ago

Hey, I have solved this issue. Can I open a pull request?

    def dict_record(self):
        from deeplake.enterprise import dataloader
        return iter(map(lambda row: dict(row[0]), dataloader(self).numpy()))

This is the code I have added.
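
For readers who want to try the pattern without deeplake installed, the same idea can be sketched as a plain generator. This assumes, as the `row[0]` indexing in the snippet suggests, that each item produced by the loader is a batch whose first element is a mapping. `dict_records` and `fake_loader` are hypothetical names used only for this illustration. Note that `map` already returns an iterator in Python 3, so the extra `iter(...)` call in the snippet has no effect.

```python
from typing import Any, Dict, Iterable, Iterator, Mapping, Sequence


def dict_records(
    batches: Iterable[Sequence[Mapping[str, Any]]],
) -> Iterator[Dict[str, Any]]:
    """Lazily yield the first sample of each batch as a plain dict."""
    for batch in batches:
        yield dict(batch[0])


def fake_loader() -> Iterator[Sequence[Mapping[str, Any]]]:
    """Hypothetical stand-in for a loader that yields batches of one sample."""
    yield [{"caption": "a cat", "tags": ["cat"]}]
    yield [{"caption": "a dog", "tags": ["dog"]}]


captions = [record["caption"] for record in dict_records(fake_loader())]
print(captions)
```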

tatevikh commented 1 year ago

Hi @pyther-hub, absolutely! Go for it.

pyther-hub commented 1 year ago

> Hi @pyther-hub, absolutely! Go for it.

Sir, I have opened a pull request. Please review it.

gulatisukaran commented 1 year ago

Is something still left to be done?

pyther-hub commented 1 year ago

Can I work on this again, @tatevikh?

activeloopai / deeplake