activeloopai / deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
Mozilla Public License 2.0

[FEATURE] Tutorial notebooks for popular datasets #1172

Open: kristinagrig06 opened this issue 2 years ago

kristinagrig06 commented 2 years ago

🚨🚨 Feature Request

If your feature will improve Hub

Create notebooks with training pipelines using popular datasets available in Hub. A list of all Activeloop datasets can be found by running: activeloop list-datasets --workspace activeloop

Difficulty: Easy

Note: If you have a solution to this issue, please make a Pull Request to our Examples Repository and not to this repository!

Eeshaan-Dutt commented 2 years ago

Can we use open-source libraries/packages for this? I think I have an approach in mind.

dhiganthrao commented 2 years ago

Hey @Eeshaan-Dutt, you can use them, but please keep in mind that Hub should be the star of these tutorials: whatever can be done with Hub should be done with Hub!

Anaxagoras7 commented 2 years ago

@dhiganthrao, is this issue closed, or is it still available for contribution?

dhiganthrao commented 2 years ago

@Anaxagoras7, there's a PR open for the same, but it hasn't been updated for some time. If you think you have a good solution to this, go for it!

dhiganthrao commented 2 years ago

General update: If you have a solution to this issue, please make a Pull Request to our Examples Repository and not to this repository!

Anaxagoras7 commented 2 years ago

Sure @dhiganthrao

dhiganthrao commented 2 years ago

Hey @Eeshaan-Dutt and @Anaxagoras7! Any updates/questions you want to share?

Anaxagoras7 commented 2 years ago

@dhiganthrao, apologies for the delay, I got caught up in something. I wanted to know whether the pipeline is just a demonstration of the various available datasets using ML algorithms, or whether I am missing something. Could you please elaborate on this issue a little, as I am a bit of a newbie in the open-source world? Also, I'm not able to access the dataset list using the command listed above; is there a way to fix that?

dhiganthrao commented 2 years ago

@Anaxagoras7, you can create a Jupyter notebook that explains how to build an ML pipeline using Hub. An ML pipeline involves loading the data, preprocessing it, loading an ML/DL model, and training that model on the data; Hub is used as the data source instead of local files. You can refer to this example to see what it looks like. You don't need to write code for uploading a Hub dataset, but it would be helpful if you do!

Regarding your not being able to access the list of datasets, can you please elaborate? For debugging purposes, it would help if you could share the error traceback you get when you run the command 😄

Feel free to ping me again if you have any questions, and please consider joining our Slack Community for all updates on everything Hub!
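The four pipeline stages described above (load, preprocess, model, train) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook the issue asks for: the hypothetical `load_toy_dataset` generates synthetic (features, label) pairs standing in for a Hub dataset, and a hand-written logistic regression stands in for a PyTorch or TensorFlow model, so the snippet runs without Hub or a deep-learning framework installed.

```python
import math
import random


# 1. Load: hypothetical stand-in for a Hub dataset (synthetic 2-D points).
# A real tutorial would read these samples from Hub instead.
def load_toy_dataset(n=200, seed=0):
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x1, x2 = rng.uniform(0, 10), rng.uniform(0, 10)
        label = 1 if x1 + x2 > 10 else 0  # linearly separable toy rule
        samples.append(([x1, x2], label))
    return samples


# 2. Preprocess: scale every feature to zero mean and unit variance.
def normalize(samples):
    dims = len(samples[0][0])
    n = len(samples)
    means = [sum(x[d] for x, _ in samples) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((x[d] - means[d]) ** 2 for x, _ in samples) / n) or 1.0
        for d in range(dims)
    ]
    return [([(x[d] - means[d]) / stds[d] for d in range(dims)], y) for x, y in samples]


# 3. Model: logistic regression with weights w and bias b.
class LogisticRegression:
    def __init__(self, dims):
        self.w = [0.0] * dims
        self.b = 0.0

    def predict_proba(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        # Numerically stable sigmoid: never calls exp() on a large positive number.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)


# 4. Train: plain stochastic gradient descent on the log-loss.
def train(model, samples, epochs=50, lr=0.1, seed=0):
    rng = random.Random(seed)
    samples = list(samples)  # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            err = model.predict_proba(x) - y  # d(log-loss)/dz
            model.w = [wi - lr * err * xi for wi, xi in zip(model.w, x)]
            model.b -= lr * err
    return model


def accuracy(model, samples):
    hits = sum((model.predict_proba(x) >= 0.5) == (y == 1) for x, y in samples)
    return hits / len(samples)


data = normalize(load_toy_dataset())
model = train(LogisticRegression(dims=2), data)
print(f"training accuracy: {accuracy(model, data):.2f}")
```

In an actual tutorial only step 1 would change to read from Hub, and steps 3 and 4 would use a framework model; keeping each stage in its own function keeps that swap local.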

Anaxagoras7 commented 2 years ago

Thank you for the help, @dhiganthrao. Also, I got the problem resolved! Will send a PR shortly.

mikayelh commented 2 years ago

@Anaxagoras7 did you send a PR for this? I had a hard time tracking this down. If not, this issue is still up for grabs in case anyone is interested!

jaivanti commented 2 years ago

Hi! I am thinking of picking up this issue, but I want to clarify what I understood first, so please feel free to correct me: basically, I have to make a Jupyter notebook that takes different datasets, applies an ML algorithm, and builds a pipeline using Hub. Will I also have to use different ML algorithms?

mikayelh commented 2 years ago

It's OK to stick to one model, but ideally the training should happen with both PyTorch and TensorFlow!
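One way to train with both frameworks without duplicating the data-loading code is to produce framework-neutral batches and convert them to tensors only at the last step. For the real tutorials, Hub datasets provide `ds.pytorch()` and `ds.tensorflow()` integrations; the helper below, `iterate_batches`, is hypothetical and uses only the standard library, so it illustrates the idea without either framework installed.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Sample = Tuple[List[float], int]
Batch = Tuple[List[List[float]], List[int]]


def iterate_batches(
    samples: Sequence[Sample], batch_size: int, shuffle: bool = True, seed: int = 0
) -> Iterator[Batch]:
    """Yield (features, labels) batches as plain Python lists.

    The lists can be converted at the very end, e.g. with torch.tensor(features)
    for PyTorch or tf.convert_to_tensor(features) for TensorFlow, so loading
    and preprocessing are written once and shared by both training loops.
    """
    order = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(order)  # same seed gives the same batch order
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [list(samples[i][0]) for i in idx], [samples[i][1] for i in idx]


# Usage with a tiny synthetic dataset (10 samples, batches of 4, 4 and 2).
toy = [([float(i), 2.0 * i], i % 2) for i in range(10)]
for features, labels in iterate_batches(toy, batch_size=4, shuffle=False):
    print(len(features), labels)
```

Keeping the shuffle seed explicit means the PyTorch and TensorFlow runs can see the batches in the same order, which makes their results somewhat easier to compare.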

jaivanti commented 2 years ago

Alright! I will give it a try then

jaivanti commented 2 years ago

https://colab.research.google.com/drive/13rkYj5qfAn8YdoomNV8fLcH7--gb_vBQ#scrollTo=iKEAxW7FENld This is a mock notebook of an ML pipeline I have prepared using CIFAR-10 and Hub for image classification with TensorFlow. Is this fine to proceed with? Please let me know so I can make more changes.

jaivanti commented 2 years ago

This is the PyTorch implementation using Hub: https://colab.research.google.com/drive/1K1zTX0Xmh8DNKkDhDERK-uX8pf-aLp_5 Do let me know if any updates are needed.

jaivanti commented 2 years ago

@mikayelh, should I raise a pull request with this work, or do you want me to make any changes to it first?

mikayelh commented 2 years ago

@jaivanti hi! Thanks for following up. @farizrahman4u will review this and get back to you ASAP (@tatevikh FYI). Thanks a lot for the contribution (at a quick glance it looks OK, but @farizrahman4u will definitely have more tips).

Maybe you can add a screenshot to the colab from app.activeloop.ai and say "you can also visualize the dataset at [dataset link]".

jaivanti commented 2 years ago

Thanks @mikayelh for the response! I have added whatever changes you mentioned.

ubergeekNZ commented 2 years ago

I created an example combining Docker, Hub, TensorBoard, and a Jupyter notebook, based on the PyTorch MNIST example. I'm wondering if it is of any use.

https://github.com/ubergeekNZ/pytorch_and_hub

farizrahman4u commented 2 years ago

@jaivanti The notebooks look good; maybe format the cells with black? Also, in some places it's more appropriate to use text cells instead of comments.

mikayelh commented 2 years ago

@ubergeekNZ just make sure to call the example "Using Activeloop Hub as a dataloader with Tensorboard & Docker to train a model in PyTorch".

"Load mnist data from activeloop.ai hub": this is Fashion MNIST, not MNIST. We also refer to Hub as either Hub or Activeloop Hub (not activeloop.ai hub). Please fix this before we merge it into activeloopai/examples!

jaivanti commented 2 years ago

@farizrahman4u, I have formatted the cells with black and replaced the comments with text cells.

jaivanti commented 2 years ago

Should I open a PR for this? I have made most of the changes you suggested. Thanks!

mikayelh commented 2 years ago

Thanks for the ping, @jaivanti! Adding @tatevikh to the thread.

farizrahman4u commented 2 years ago

@jaivanti Sure, go ahead.

brlrb commented 2 years ago

Can multiple people contribute notebook tutorials? If so, I can try this one.

mikayelh commented 2 years ago

Yes, @brlrb, absolutely. Do you have a tutorial in mind?

brlrb commented 2 years ago

@mikayelh, what I had in mind is to pick any dataset from https://docs.activeloop.ai/datasets/ that does not have a tutorial, or whose tutorial could be improved, and write one. For example, a dataset might have a PyTorch tutorial, and I could write a TensorFlow version. A couple of questions for you:

I am interested in the NLP datasets and would like to work with the Hugging Face APIs, but I am open to any other dataset that is a higher priority.

mikayelh commented 2 years ago

@brlrb, I'm tagging @istranic, who has some ideas about which tutorials would be more interesting or a higher priority for Hub. Thanks for your ideas!

brlrb commented 2 years ago

@mikayelh @istranic I am still interested in this issue. When I go to https://docs.activeloop.ai/datasets and check a few datasets at random, I see that each already has something to help us get started. I am unsure which popular datasets lack tutorials, since I do not see a way to filter them.

aryatalathi commented 1 year ago

Hey there! I would like to give it a try. May I, or has the issue already been solved?

tatevikh commented 1 year ago

Hi @aryatalathi . Of course! Do you have a tutorial in mind?

aryatalathi commented 1 year ago

Yes, I have some thoughts. Could you give me approval to work on this issue so that I can get started?

Anshika91 commented 1 year ago

Hey, please assign this issue to me; I want to solve it.