activeloopai / deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Mozilla Public License 2.0
8.1k stars 622 forks

[FEATURE] Add tensor.pytorch and tensor.tensorflow methods #1177

Open kristinagrig06 opened 3 years ago

kristinagrig06 commented 3 years ago

🚨🚨 Feature Request

If your feature will improve HUB

Currently, only Datasets can be used to create PyTorch DataLoaders and TensorFlow Datasets. We need to add functionality to convert Hub tensors to a PyTorch/TensorFlow-compatible format.

Difficulty: Medium

MoritzWillmann commented 3 years ago

Hey @kristinagrig06 @tatevikh, very cool project! Can I collaborate on this issue? I am new to open source, but I have experience in software development (including in a team). I also have experience with PyTorch and TensorFlow.

dhiganthrao commented 3 years ago

Hey @MoritzWillmann, welcome to Hub and Hacktoberfest, and thank you for your willingness to contribute! Do you have a proposed solution in mind for this issue?

MoritzWillmann commented 3 years ago

Hey @dhiganthrao, I checked out some of your code for .numpy() yesterday and I think I'll first implement it in a similar way. I read on the PyTorch forums, though, that converting a buffer to torch.Tensor directly can be slow, so I'd benchmark it against a call to torch.from_numpy(np.frombuffer(...)). Do you have any thoughts on this? I haven't looked into TensorFlow yet, but I think it'll be similar.
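A rough sketch of the kind of benchmark described here. It assumes that "directly" means something like `torch.frombuffer` (available in PyTorch 1.10 and later), which may not be what the comment had in mind. The `raw` buffer and the two helper names are hypothetical and do not come from Hub.

```python
import timeit

import numpy as np
import torch

# Hypothetical raw chunk: 1M float32 values held in a writable buffer.
raw = bytearray(np.arange(1_000_000, dtype=np.float32).tobytes())


def via_numpy(buf):
    # View the buffer as an ndarray, then wrap it as a tensor (no copy).
    return torch.from_numpy(np.frombuffer(buf, dtype=np.float32))


def direct(buf):
    # Assumed "direct" path: torch.frombuffer also shares memory with buf.
    return torch.frombuffer(buf, dtype=torch.float32)


# Both paths should yield the same values before timing them.
assert torch.equal(via_numpy(raw), direct(raw))

for fn in (via_numpy, direct):
    seconds = timeit.timeit(lambda: fn(raw), number=1000)
    print(f"{fn.__name__}: {seconds:.4f}s for 1000 calls")
```

Neither path copies the data, so any timing difference here reflects call overhead only. Results will vary by machine and library version.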

farizrahman4u commented 3 years ago

I think for now .pytorch() and .tensorflow() can call .numpy() underneath. @AbhinavTuli ?
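A minimal sketch of what this suggestion could look like. `TensorSketch` is a hypothetical stand-in, not the real Hub tensor class, and its constructor arguments are assumptions made for illustration. Only the idea that `.pytorch()` and `.tensorflow()` delegate to `.numpy()` comes from the thread.

```python
import numpy as np


class TensorSketch:
    """Hypothetical stand-in for a Hub tensor; not the real Hub class."""

    def __init__(self, buffer: bytes, dtype: str, shape: tuple):
        self._buffer = buffer
        self._dtype = dtype
        self._shape = shape

    def numpy(self) -> np.ndarray:
        # Copy so the result is writable and independent of the raw bytes.
        flat = np.frombuffer(self._buffer, dtype=self._dtype)
        return flat.reshape(self._shape).copy()

    def pytorch(self):
        # Imported lazily so PyTorch stays an optional dependency.
        import torch

        return torch.from_numpy(self.numpy())

    def tensorflow(self):
        # Imported lazily so TensorFlow stays an optional dependency.
        import tensorflow as tf

        return tf.convert_to_tensor(self.numpy())


data = np.arange(6, dtype=np.float32)
t = TensorSketch(data.tobytes(), "float32", (2, 3))
print(t.numpy().shape)
```

Delegating to `.numpy()` leaves dtype handling to NumPy, which keeps both wrappers to a single conversion call each.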

MoritzWillmann commented 3 years ago

Hey @farizrahman4u, I was just about to suggest that. I implemented it both ways yesterday and there's no noticeable performance difference. The "no numpy" version needs much more implementation work, though, because of datatype handling. Also, it seems a major change in how PyTorch handles storage for different datatypes is coming up soon. It should get easier then...

farizrahman4u commented 3 years ago

@MoritzWillmann sounds good.

Anshika91 commented 1 year ago

Hey, I want to solve this issue. Please assign it to me.