Open kristinagrig06 opened 3 years ago
Can we use open source libraries/packages for this? I do have an approach in mind.
Hey @Eeshaan-Dutt, you can use them, but please keep in mind that Hub should be the star of these tutorials - whatever can be done with Hub, should be done!
@dhiganthrao , is this issue closed or available for contribution?
@Anaxagoras7, there's a PR open for the same, but it hasn't been updated for some time. If you think you have a good solution to this, go for it!
General update: If you have a solution to this issue, please make a Pull Request to our Examples Repository and not to this repository!
Sure @dhiganthrao
Hey @Eeshaan-Dutt and @Anaxagoras7! Any updates/questions you want to share?
@dhiganthrao, apologies for the delay, I got caught up in something. I wanted to know whether the pipeline is just a demonstration of the various available datasets using ML algorithms, or am I missing something? Could you please elaborate a little on this issue if possible, as I am a bit of a newbie in the open source world? Also, I'm not able to access the list of datasets using the command listed above; is there a way to fix that?
@Anaxagoras7, you can create a Jupyter notebook containing details on how to build an ML pipeline using Hub. An ML pipeline would involve loading the data, preprocessing it, loading an ML/DL model, and training that model on your data. So instead of local data, Hub can be used for the same. You can refer to this example on what it looks like. You don't need to write code for uploading a Hub dataset, but it would be helpful if you do!
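A minimal sketch of those four steps (load, preprocess, model, train), assuming the Hub 2.x Python API (`hub.load`, `ds.pytorch`) and a public dataset at `hub://activeloop/cifar10-train` with `images` and `labels` tensors. None of these names appear in this thread, so treat them as assumptions and check them against the Hub version you have installed. `scale_pixels` and `train_on_hub` are hypothetical helper names used only for illustration.

```python
def scale_pixels(values, max_value=255.0):
    """Preprocessing step: map raw 0-255 pixel values into [0, 1]."""
    return values / max_value


def train_on_hub(path="hub://activeloop/cifar10-train", epochs=1):
    # Heavy imports stay local so this file can be imported without them.
    import hub  # assumption: Hub 2.x
    import torch
    from torch import nn

    # 1. Load the data from Hub instead of from local files.
    ds = hub.load(path)
    loader = ds.pytorch(batch_size=32, shuffle=True, num_workers=0)

    # 2. Load a model (a deliberately tiny linear classifier for 32x32 RGB images).
    model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # 3. Preprocess each batch, then 4. run a training step on it.
    for _ in range(epochs):
        for batch in loader:
            # Assumption: batches are dicts keyed by tensor name.
            images = scale_pixels(batch["images"].float())
            labels = batch["labels"].long().flatten()
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```

Calling `train_on_hub()` would stream the dataset over the network, so a notebook would normally run it in its own cell after the install step.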
Regarding your not being able to access the list of datasets, can you please elaborate? It would be helpful if you could upload the error traceback you get when you run the command, for debugging purposes 😄
Feel free to ping me again if you have any questions, and please consider joining our Slack Community for all updates on everything Hub!
Thank you for the help @dhiganthrao . Also I got the problem resolved! Will send a PR shortly.
@Anaxagoras7 did you send a PR for this? I had a hard time tracking this down. If not, this issue is still up for grabs in case anyone is interested!
Hi! I am thinking of grabbing this issue, but I want to recap and clarify what I understood. Please feel free to correct me: basically, I have to make a Jupyter notebook that takes different datasets with any ML algorithm and builds a pipeline using Hub. So, will I have to consider several different ML algorithms as well?
it's ok to stick to one model, but the training ideally should happen both with PyTorch and Tensorflow!
Alright! I will give it a try then
https://colab.research.google.com/drive/13rkYj5qfAn8YdoomNV8fLcH7--gb_vBQ#scrollTo=iKEAxW7FENld This is a mock notebook of an ML pipeline I have prepared using CIFAR-10 and Hub for image classification with TensorFlow. Is this fine to proceed with? Please let me know, and I will make more changes.
This is the PyTorch implementation using Hub: https://colab.research.google.com/drive/1K1zTX0Xmh8DNKkDhDERK-uX8pf-aLp_5 Do let me know if you have any feedback.
@mikayelh Should I raise the pull request with this work? Do you all want me to make any changes to it?
@jaivanti hi! thanks for following up. @farizrahman4u will review this and get back to you asap (@tatevikh FYI). Thanks a lot for the contribution (upon quick glance looks ok, but @farizrahman4u definitely will have more tips).
Maybe you can add a screenshot to the colab from app.activeloop.ai and say "you can also visualize the dataset at [dataset link]".
Thanks @mikayelh for the response! I have added whatever changes you mentioned.
I created a docker, hub, tensorboard, jupyter notebook example based on pytorch MNIST example. Wondering if that is of any use.
https://github.com/ubergeekNZ/pytorch_and_hub
@jaivanti The notebooks look good, maybe format the cells with black? Also, in some places it's more appropriate to use text cells instead of comments.
@ubergeekNZ just make sure to call the example "Using Activeloop Hub as a dataloader with Tensorboard & Docker to train a model in PyTorch".
Load mnist data from activeloop.ai hub -> this is Fashion MNIST, and not MNIST. We also refer to hub as either hub or Activeloop Hub (not activeloop.ai hub). Please fix this before we merge it into activeloopai/examples!
@farizrahman4u I have added the black extension to the cells and also provided text instead of comments
Should I open a PR for this? I have made most of the changes as suggested. Thanks
Thanks for the ping @jaivanti ! adding @tatevikh to the thread.
@jaivanti Sure, go ahead.
Can multiple people contribute to creating Notebook Tutorials? If yes then I can try this one.
yes @brlrb , absolutely. Do you have a tutorial in mind?
@mikayelh what I had in mind is that I can pick up any dataset that does not have a tutorial or can be improved from https://docs.activeloop.ai/datasets/ and then write a tutorial. One example could be that a dataset can have a tutorial with PyTorch but I can write them in TensorFlow. A couple of questions for you:
I am interested in the NLP dataset and want to work with Hugging Face APIs but I am open to any other that is a priority.
@brlrb i'm tagging @istranic who has some ideas re: which tutorials would be more interesting/priority for Hub. Thanks for your ideas!
@mikayelh @istranic I am still interested in this issue. When I go to https://docs.activeloop.ai/datasets and randomly check a few of the datasets, I see that each has something that helps us get started. I am unsure which popular datasets do not have tutorials, since I do not see a way to filter them.
Hey there! I would like to give it a try. May I? Or is the issue solved already?
Hi @aryatalathi . Of course! Do you have a tutorial in mind?
Yes, have some thoughts. Can you just give me the approval to work on this issue, so that I can start working on the same?
Hey, please assign this issue to me; I want to solve it.
🚨🚨 Feature Request
If your feature will improve HUB
Create notebooks with training pipelines using popular datasets available in Hub. A list of all datasets from activeloop can be found by running:
activeloop list-datasets --workspace activeloop
Difficulty: Easy
Note: If you have a solution to this issue, please make a Pull Request to our Examples Repository and not to this repository!