argilla-io / argilla

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
https://docs.argilla.io
Apache License 2.0

Add Flair training example to Cookbook #416

Closed dvsrepo closed 3 years ago

dvsrepo commented 3 years ago

Similar to the training example for Hugging Face: https://rubrix.readthedocs.io/en/stable/guides/cookbook.html#Training

sakares commented 3 years ago

Hi @dvsrepo ,

To participate in Hacktoberfest 2021, can I take this one? Thanks

dvsrepo commented 3 years ago

Of course, feel free to ask questions! I've just assigned it to you

dvsrepo commented 3 years ago

Hi @sakares ,

In this issue, we can focus on TextClassification. The main idea would be:

import rubrix as rb

# 1. Load the dataset from Rubrix (for testing purposes, you can log a dataset from Hugging Face; see the example below)
train_dataset = rb.load("my_dataset")
# we might do some train-test splitting first to create validation and test sets

# 2. Transform the dataset(s) and save as CSV
# train_dataset is a pandas DataFrame: we might need to transform it into something readable by flair's CSVClassificationCorpus and then save it to CSV

# transformed_train_dataset = apply some post-processing
transformed_train_dataset.to_csv('train.csv')

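# 3. Read the CSV with CSVClassificationCorpus
from flair.datasets import CSVClassificationCorpus
# data_folder and column_name_map depend on how train.csv was written; recent flair versions also expect a label_type
corpus = CSVClassificationCorpus(data_folder, column_name_map)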

# 4. from here you should be able to follow https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_7_TRAINING_A_MODEL.md#training-a-text-classification-model

For testing purposes, you can log a text classification dataset from Hugging Face. See some examples here: https://rubrix.readthedocs.io/en/stable/tutorials/01-huggingface.html#Text-classification-with-the-tweet_eval-dataset-(Emoji-classification)
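
Steps 2 and 3 of the outline could look roughly like the sketch below. It assumes the Rubrix records have already been flattened into dictionaries with hypothetical `text` and `label` keys; the DataFrame returned by `rb.load` may need its own post-processing to get there. The helper names `write_flair_csvs` and `load_corpus` are illustrative and not part of Rubrix or Flair, and the `label_type` keyword is assumed to be accepted by the installed flair version.

```python
import csv
import random
from pathlib import Path


def write_flair_csvs(records, out_dir, dev_fraction=0.1, test_fraction=0.1, seed=42):
    """Shuffle records and write train/dev/test CSV files with a text,label header.

    `records` is assumed to be an iterable of dicts with "text" and "label" keys
    (a hypothetical, already flattened view of the Rubrix dataset).
    Returns the number of data rows written per file.
    """
    rows = [(r["text"], r["label"]) for r in records]
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible

    n_dev = int(len(rows) * dev_fraction)
    n_test = int(len(rows) * test_fraction)
    splits = {
        "dev.csv": rows[:n_dev],
        "test.csv": rows[n_dev:n_dev + n_test],
        "train.csv": rows[n_dev + n_test:],
    }

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, split in splits.items():
        with open(out_dir / name, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["text", "label"])  # header row, skipped when loading
            writer.writerows(split)
    return {name: len(split) for name, split in splits.items()}


def load_corpus(out_dir):
    """Read the CSV files written above into a Flair corpus."""
    # Imported lazily so the CSV export can be used without flair installed.
    from flair.datasets import CSVClassificationCorpus

    return CSVClassificationCorpus(
        out_dir,
        column_name_map={0: "text", 1: "label"},  # column index -> role
        label_type="class",  # assumption: keyword supported by the installed flair
        train_file="train.csv",
        dev_file="dev.csv",
        test_file="test.csv",
        skip_header=True,  # the first row is the text,label header
        delimiter=",",
    )
```

Calling `write_flair_csvs(records, "flair_data")` followed by `load_corpus("flair_data")` should give a corpus that can be passed to the trainer from the Flair tutorial linked in step 4.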

sakares commented 3 years ago

Hi @dvsrepo

Thanks for pointing out what I was looking for.

I have just successfully reproduced the Flair tutorial at https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_7_TRAINING_A_MODEL.md#training-a-text-classification-model using GloVe DocumentPoolEmbeddings (just for a quick experiment on my local machine without a GPU),

and I am about to look into how to convert a Rubrix dataset into the flair.datasets format. Also, I can already run docker-compose up for the Rubrix annotation system and play around with it.

I hope I can open a PR in a few days.

dvsrepo commented 3 years ago

Thanks so much @sakares, don't hesitate to ask here or open issues to report what you might find along the way.

About the conversion to the flair.datasets format: what I mentioned above (export to CSV) is kind of a hack. Ideally, it would be cool to create a Flair dataset directly from Python (pandas or dictionaries), but I could not find anything with a quick exploration of Flair's code.
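
One possible way to skip the CSV step is to build Flair sentences in memory and wrap them in a corpus. The sketch below is only an assumption about how that could work, not something confirmed in this thread: it relies on `Sentence.add_label`, `SentenceDataset` (renamed `FlairDatapointDataset` in newer flair releases) and the `Corpus` constructor, and the `to_pairs` and `build_corpus` helpers plus the `text`/`label` record keys are hypothetical.

```python
from typing import Iterable, List, Tuple


def to_pairs(records: Iterable[dict]) -> List[Tuple[str, str]]:
    """Keep only annotated records and return them as (text, label) pairs.

    Records are assumed to be dicts with hypothetical "text" and "label" keys.
    """
    return [(r["text"], r["label"]) for r in records if r.get("label")]


def build_corpus(pairs: List[Tuple[str, str]], label_type: str = "class"):
    """Wrap (text, label) pairs in an in-memory Flair corpus."""
    # Imported lazily so to_pairs() can be used without flair installed.
    from flair.data import Corpus, Sentence
    from flair.datasets import SentenceDataset  # assumption: name used by the installed flair

    sentences = []
    for text, label in pairs:
        sentence = Sentence(text)
        sentence.add_label(label_type, label)  # attach the class label to the sentence
        sentences.append(sentence)

    # Only a train split is passed; Corpus is left to handle the missing dev/test splits.
    return Corpus(train=SentenceDataset(sentences))
```

Whether this is preferable to the CSV route probably depends on dataset size, since all sentences are held in memory.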

dvsrepo commented 3 years ago

Also, I forgot to mention: if you are working in a Jupyter notebook and would like to share the results as a tutorial, that's also cool. I could open a new issue so you could contribute that as a by-product too. We are going to credit the authors of each tutorial, so we could include your name, links, etc.

sakares commented 3 years ago

Hi @dvsrepo

I am new to Sphinx, and after running the command cd docs; make html, it threw the following errors:

Exception occurred:
  File "/opt/homebrew/lib/python3.9/site-packages/nbconvert/exporters/templateexporter.py", line 607, in get_template_names
    raise ValueError('No template sub-directory with name %r found in the following paths:\n\t%s' % (base_template, paths))
ValueError: No template sub-directory with name 'rst' found in the following paths:
    /Users/sakares/Library/Jupyter
    /opt/homebrew/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/share/jupyter
    /usr/local/share/jupyter
    /usr/share/jupyter
The full traceback has been saved in /var/folders/hq/4yhdf6b93rq908q9tt9w07fh0000gn/T/sphinx-err-67_4f1bz.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at <https://github.com/sphinx-doc/sphinx/issues>. Thanks!
make: *** [html] Error 2

Do you have any ideas about this? Thanks!

dvsrepo commented 3 years ago

Hi @sakares ,

Are you running this from the terminal inside the rubrix folder?

sakares commented 3 years ago

Yes, in the rubrix/docs path

dvsrepo commented 3 years ago

Maybe @dcfidalgo can help

dcfidalgo commented 3 years ago

Hm, maybe a long shot, but could you try running pip install nbconvert==5.6.1 to see if this fixes your issue? This follows https://stackoverflow.com/questions/62431121/nbconvert-valueerror-no-template-sub-directory-with-name-rst-found-in-the-fo

sakares commented 3 years ago

Thanks, it works! 🙂 I think I will probably open a PR by this weekend.

Current work:

To do:

Ongoing PR: #442

dvsrepo commented 3 years ago

This is awesome @sakares !! Thank you! Have a nice weekend

sakares commented 3 years ago

Just finished #442. Please feel free to review or comment if I missed something, @dvsrepo. Thanks!

dcfidalgo commented 3 years ago

Implemented in #442