Closed MoritzLaurer closed 1 year ago
@frascuchon Since I have frequently been getting this question, I have been thinking about creating a tutorial for using a local deployment in combination with ngrok to integrate with Colab and the like.
Not sure if you have a more structural direction in mind that I could consider?
The ngrok proxy could be a bottleneck for the user experience.
An easy way to deploy Argilla, such as Helm charts, could simplify the process.
Refs #1899
Update: I've now created a google colab that can run argilla with an active learning loop purely hosted on colab in the browser with a GPU: https://colab.research.google.com/drive/11oTWno3hzgJnip11EcgqEhdpbW1IX-lP?usp=sharing
It's a combination of ngrok and your other tutorials on active learning. There are still some improvements that could be made, but it's working.
Happy to help contribute something like this to your documentation if you find it useful.
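For readers wondering what the ngrok part might look like, here is a minimal sketch, not taken from the notebook above. It assumes the `pyngrok` package and assumes the Argilla server listens on port 6900; the `connect` parameter is a hypothetical hook added only so the function can be tried without opening a real tunnel.

```python
from typing import Callable, Optional


def expose_local_server(port: int = 6900, connect: Optional[Callable] = None) -> str:
    """Open a tunnel to a local port and return its public URL.

    Assumptions: port 6900 is where the local Argilla server runs, and
    `connect` is a hypothetical injection point that defaults to
    pyngrok's `ngrok.connect`.
    """
    if connect is None:
        # Imported lazily so the sketch can be exercised without pyngrok installed.
        from pyngrok import ngrok
        connect = ngrok.connect
    # pyngrok returns a tunnel object that exposes a `public_url` attribute.
    tunnel = connect(str(port))
    return tunnel.public_url
```

In a Colab cell, calling `expose_local_server()` after starting the server would return a public URL that could be opened in the browser to reach the UI, provided ngrok is set up for the session.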
Wow!! We would definitely love to have this contribution in our docs! I think the ipynb version can be added here: https://github.com/argilla-io/argilla/tree/develop/docs/_source/tutorials/notebooks
following the same filename structure as the small-text tutorial. The tutorial can then include the link directly in the first section, inviting users to run it in Colab.
Let me or @davidberenstein1957 know if you need help with writing/editing the docs!
Happy you like it, will make some updates and then contribute it to the notebooks folder
Created a pull request to add the colab / tutorial for running argilla with a colab GPU here: https://github.com/argilla-io/argilla/pull/2020. Any feedback is welcome.
@MoritzLaurer Awesome, thanks for the contribution. I will take a look later.
Is your feature request related to a problem? Please describe.
Active learning is an important feature for data annotation and argilla has a great tutorial for locally running it with small-text: https://docs.argilla.io/en/latest/tutorials/notebooks/training-textclassification-smalltext-activelearning.html The problem: If one wants to use a base- or large-sized transformer in an active learning loop, it would be very slow on a typical CPU, and the main way for most people to access GPUs is via Google Colab. It would be great if it were possible to use an active learning loop with argilla via Google Colab.
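To make "active learning loop" concrete, the query step can be sketched as least-confidence sampling: the model scores the unlabeled pool and the examples it is least sure about are sent for annotation. This is a generic illustration in plain Python, not necessarily the strategy the small-text tutorial uses; the function name and inputs are hypothetical.

```python
from typing import List, Sequence


def least_confidence_query(
    probabilities: Sequence[Sequence[float]], batch_size: int
) -> List[int]:
    """Return indices of the examples whose top class probability is lowest.

    `probabilities` holds one row of predicted class probabilities per
    unlabeled example (hypothetical input for this sketch).
    """
    # Confidence of each example is the probability of its most likely class.
    confidences = [max(row) for row in probabilities]
    # Rank examples from least to most confident and keep the first batch.
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return ranked[:batch_size]
```

In each round of the loop, the selected examples would be annotated, added to the training set, and the model retrained before querying again. The retraining and re-scoring of the pool is the step that benefits from a GPU when the model is a transformer.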
Describe the solution you'd like
Ideally, one could copy a colab notebook from the argilla docs, change only a few lines of code to input one's own data, and run an active learning loop with a colab GPU in the browser.
I'm not sure how difficult this is for argilla and I understand that the elasticsearch dependency makes this more complicated.
Potential options:
Describe alternatives you've considered
I don't see other cheap ways for people to make an efficient active learning loop with a GPU than using google colab, since colab is the most established way for people to cheaply access GPUs
Additional context
here are examples from other libraries that enable using a colab GPU as the backend in the browser: the EasyNMT library provides a google colab that creates a FastAPI REST-API that’s hosted via a Colab notebook and you can then run translations via the Colab GPU. https://colab.research.google.com/drive/1kAh_Vt1ipA5-BuoaPX39rCIHFrhpcRpW?usp=sharing ; Or here is a gradio app that runs in the browser via colab: https://colab.research.google.com/drive/18ODkJvyxHutTN0P5APWyGFO_xwNcgHDZ?usp=sharing#scrollTo=e200MmBU2aLT