Lightning-Universe / lightning-transformers

Flexible components pairing 🤗 Transformers with ⚡ PyTorch Lightning
https://lightning-transformers.readthedocs.io
Apache License 2.0
610 stars, 77 forks

Class encode labels column #176

Closed: ghomasHudson closed this 3 years ago

ghomasHudson commented 3 years ago

I think this is the fix needed for #154. Hugging Face datasets now has a class_encode_column function that converts a column to ClassLabel, which allows us to find the num_classes.
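As a rough illustration of what class-encoding a label column amounts to, here is a minimal plain-Python sketch. It is not the datasets implementation: the helper `class_encode`, the sample rows, and the choice to order class names by sorted string value are all assumptions made for this example.

```python
def class_encode(rows, column):
    """Return (encoded_rows, class_names) with string labels mapped to integer ids.

    Hypothetical helper for illustration only; it mimics the idea of turning a
    free-form label column into a fixed set of classes.
    """
    # Assumption: class names are ordered by their sorted string value.
    class_names = sorted({str(row[column]) for row in rows})
    name_to_id = {name: idx for idx, name in enumerate(class_names)}
    # Build new row dicts so the input rows are left unmodified.
    encoded_rows = [{**row, column: name_to_id[str(row[column])]} for row in rows]
    return encoded_rows, class_names


# Made-up rows standing in for a custom classification file.
rows = [
    {"text": "great film", "label": "pos"},
    {"text": "dull plot", "label": "neg"},
    {"text": "loved it", "label": "pos"},
]

encoded_rows, class_names = class_encode(rows, "label")
# Once the label set is fixed, the number of classes can be read off directly.
num_classes = len(class_names)
```

With the real library, the comment above suggests the corresponding step would be calling class_encode_column on the label column and then reading num_classes from the resulting ClassLabel feature; the exact column name depends on the dataset.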

codecov[bot] commented 3 years ago

Codecov Report

Merging #176 (57785b4) into master (d075905) will increase coverage by 20.92%. The diff coverage is 66.66%.

@@             Coverage Diff             @@
##           master     #176       +/-   ##
===========================================
+ Coverage   69.18%   90.11%   +20.92%     
===========================================
  Files          70       71        +1     
  Lines        1467     1537       +70     
===========================================
+ Hits         1015     1385      +370     
+ Misses        452      152      -300     
Flag        Coverage Δ
unittests   90.11% <66.66%> (+20.92%) ⬆️

Impacted Files                                           Coverage Δ
..._transformers/task/nlp/text_classification/data.py    93.54% <66.66%> (-3.01%) ⬇️
...sk/nlp/question_answering/datasets/squad/metric.py    100.00% <0.00%> (ø)
lightning_transformers/core/nlp/seq2seq/model.py         94.73% <0.00%> (+2.63%) ⬆️
lightning_transformers/core/data.py                      100.00% <0.00%> (+3.44%) ⬆️
lightning_transformers/core/nlp/seq2seq/data.py          96.29% <0.00%> (+7.40%) ⬆️
...g_transformers/task/nlp/language_modeling/model.py    100.00% <0.00%> (+8.33%) ⬆️
...formers/task/nlp/masked_language_modeling/model.py    100.00% <0.00%> (+8.33%) ⬆️
...ng_transformers/task/nlp/language_modeling/data.py    92.85% <0.00%> (+16.66%) ⬆️
...ansformers/task/nlp/summarization/datasets/xsum.py    100.00% <0.00%> (+16.66%) ⬆️
...ning_transformers/task/nlp/multiple_choice/data.py    94.11% <0.00%> (+17.64%) ⬆️
... and 14 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update d075905...57785b4.

ghomasHudson commented 3 years ago

This is working now and should finally allow custom files to be used for classification, which is badly needed for this project to serve the purposes described in the documentation.

Something to think about long term is matching our tasks with the Hugging Face dataset task templates (i.e. nlp/text_classification corresponds to Hugging Face's text-classification). That would let you call dataset.prepare_for_task("text-classification") and have the columns converted automatically to the correct types (assuming they're given the standard names "text" and "labels").

SeanNaren commented 3 years ago

Interesting, I didn't know about the prepare_for_task function! Thanks for highlighting this :)

Borda commented 3 years ago

> Interesting, I didn't know about the prepare_for_task function! Thanks for highlighting this :)

Shall we merge it then? cc: @carmocca