argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
https://docs.argilla.io
Apache License 2.0

feat: better integration of Argilla with Google Colab #1901

Closed MoritzLaurer closed 1 year ago

MoritzLaurer commented 1 year ago

Is your feature request related to a problem? Please describe.

Active learning is an important feature for data annotation and argilla has a great tutorial for locally running it with small-text: https://docs.argilla.io/en/latest/tutorials/notebooks/training-textclassification-smalltext-activelearning.html The problem: If one wants to use a base- or large-sized transformer in an active learning loop, it would be very slow on a typical CPU, and the main way for most people to access GPUs is via google colab. It would be great if it were possible to use an active learning loop with argilla via google colab.

Describe the solution you'd like

Ideally, one could copy a colab notebook from the argilla docs, change only a few lines of code to input one's own data, and run an active learning loop with a colab GPU in the browser.

I'm not sure how difficult this is for argilla and I understand that the elasticsearch dependency makes this more complicated.

Potential options:

  1. Pure google colab option: Log in to google drive in colab; install elasticsearch in a personal google drive directory; somehow create a local host on colab for argilla.
  2. Local option with colab as backend: Run the local host with elasticsearch on one's own device; use ngrok / fastapi or something similar to expose it through a public URL/API; make the colab backend interact with the local host via that public API (and/or use argilla on google colab via rg.init with the argument api_url=PUBLIC_API)
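
The client side of option 2 can be sketched roughly as follows. This is a minimal illustration, not code from the Argilla docs or from any notebook in this thread: it assumes an Argilla server is already reachable through some public tunnel URL, and that the installed client exposes `rg.init` with `api_url` and `api_key` arguments. The helper names `normalize_api_url` and `connect_to_argilla` are hypothetical.

```python
from urllib.parse import urlparse


def normalize_api_url(public_url: str) -> str:
    """Validate a public tunnel URL and strip surrounding spaces and any trailing slash."""
    parsed = urlparse(public_url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Expected a full http(s) URL, got: {public_url!r}")
    return f"{parsed.scheme}://{parsed.netloc}{parsed.path.rstrip('/')}"


def connect_to_argilla(public_url: str, api_key: str):
    """Point the Argilla client running in the Colab runtime at the public URL.

    Assumption: `rg.init(api_url=..., api_key=...)` is available in the
    installed Argilla client; check the docs for your version.
    """
    import argilla as rg  # imported lazily; requires `pip install argilla`

    rg.init(api_url=normalize_api_url(public_url), api_key=api_key)
    return rg
```

In a Colab cell, one would call `connect_to_argilla(PUBLIC_API, api_key)` with the URL printed by the tunnel and the API key configured on the server; both values are placeholders here.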

Describe alternatives you've considered

I don't see other cheap ways for people to run an efficient active learning loop with a GPU than using google colab, since colab is the most established way for people to cheaply access GPUs.

Additional context

Here are examples from other libraries that enable using a colab GPU as the backend in the browser: the EasyNMT library provides a google colab notebook that hosts a FastAPI REST API, so you can run translations on the Colab GPU: https://colab.research.google.com/drive/1kAh_Vt1ipA5-BuoaPX39rCIHFrhpcRpW?usp=sharing ; and here is a gradio app that runs in the browser via colab: https://colab.research.google.com/drive/18ODkJvyxHutTN0P5APWyGFO_xwNcgHDZ?usp=sharing#scrollTo=e200MmBU2aLT

davidberenstein1957 commented 1 year ago

@frascuchon Since I have frequently been getting this question, I have been thinking about creating a tutorial for using a local deployment in combination with Ngrok to integrate with Colab and the like.

Do you have a more structural direction in mind that I could consider?

frascuchon commented 1 year ago

An Ngrok proxy could be a bottleneck for the user experience.

An easy way to deploy Argilla, such as helm charts, could simplify the process.

frascuchon commented 1 year ago

Refs #1899

MoritzLaurer commented 1 year ago

Update: I've now created a google colab that can run argilla with an active learning loop purely hosted on colab in the browser with a GPU: https://colab.research.google.com/drive/11oTWno3hzgJnip11EcgqEhdpbW1IX-lP?usp=sharing

It's a combination of ngrok and your other tutorials on active learning. There are still some improvements that could be made, but it's working.

Happy to help contribute something like this to your documentation if you find it useful.

dvsrepo commented 1 year ago

Wow!! We would definitely love to have this contribution in our docs! I think the ipynb version can be added here https://github.com/argilla-io/argilla/tree/develop/docs/_source/tutorials/notebooks following the same filename structure as the small-text tutorial, and the tutorial can then include the link directly in its first section, inviting users to run it in Colab.

Let me or @davidberenstein1957 know if you need help with writing/editing the docs!

MoritzLaurer commented 1 year ago

Happy you like it, will make some updates and then contribute it to the notebooks folder.

MoritzLaurer commented 1 year ago

Created a pull request to add the colab / tutorial for running argilla with a colab GPU here: https://github.com/argilla-io/argilla/pull/2020

Any feedback is welcome.

davidberenstein1957 commented 1 year ago

@MoritzLaurer Awesome, thanks for the contribution. I will take a look later.