Lightning-Universe / lightning-transformers

Flexible components pairing 🤗 Transformers with ⚡ PyTorch Lightning
https://lightning-transformers.readthedocs.io
Apache License 2.0

multilabel/multiclass #141

Closed vvssttkk closed 3 years ago

vvssttkk commented 3 years ago

Does the text classification task support multiclass and/or multilabel classification?

SeanNaren commented 3 years ago

Thanks for your issue!

Yes it does! Some of the pre-made HF datasets are multi-class: https://huggingface.co/datasets/emotion#default-1

I'll make the docs clearer to reflect this

vvssttkk commented 3 years ago

Good, so in the label field I can specify several labels separated by commas?

SeanNaren commented 3 years ago

> Good, so in the label field I can specify several labels separated by commas?

The label field actually refers to the label for that specific sample, e.g. cat, dog, etc. So for each sample, the model has to identify exactly one label from a fixed selection of X labels, if that makes sense!

If the choices change per sample, have a look at our Multiple Choice Task: https://lightning-transformers.readthedocs.io/en/latest/tasks/nlp/multiple_choice.html
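
To make the "one label from a selection of X" idea concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the label set, the scores, and the helper names `softmax` and `predict_label` are all made up for illustration.

```python
import math

# Hypothetical label set; in practice it comes from the dataset itself.
LABELS = ["cat", "dog", "bird"]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits):
    """Multi-class: pick exactly one label, the one with the highest probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best]

print(predict_label([0.2, 2.5, -1.0]))
```

Because the probabilities compete with each other and only the top one is kept, each sample ends up with a single label.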

vvssttkk commented 3 years ago

I would like to see more examples of your library.

For now, I have text and labels like "mask", "tesla", "spacex". Can I use the multiple choice task for this or not? If yes, how can I enter the fields I need manually?

SeanNaren commented 3 years ago

So for each sample of text, the data would look like so:

{
    "text": "Hi I am elon musk",
    "label": "tesla"
},
{
    "text": "Send a rocket into space",
    "label": "spacex"
}

This is related to #155. I'll clear up the data format in the docs so it's clearer to people!
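
As a rough sketch of how records like the two above could be prepared, the snippet below maps each string label to an integer id and serializes one JSON object per line. The helper names and the JSON Lines layout are assumptions for illustration only; the thread does not confirm the exact file format the library expects.

```python
import json

# The two samples from the comment above, wrapped in a list so they form valid JSON.
samples = [
    {"text": "Hi I am elon musk", "label": "tesla"},
    {"text": "Send a rocket into space", "label": "spacex"},
]

def build_label_map(records):
    """Assign each distinct label a stable integer id (sorted for determinism)."""
    distinct = sorted({r["label"] for r in records})
    return {label: idx for idx, label in enumerate(distinct)}

def encode(records, label2id):
    """Replace each string label with its integer id."""
    return [{"text": r["text"], "label": label2id[r["label"]]} for r in records]

label2id = build_label_map(samples)
encoded = encode(samples, label2id)

# One JSON object per line; this layout is an assumption, not a documented requirement.
jsonl = "\n".join(json.dumps(r) for r in encoded)
print(label2id)
print(jsonl)
```

Each sample still carries exactly one label, which is what makes this a multi-class setup.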

RaedShabbir commented 3 years ago

Hello,

To confirm, you currently support multi-class but not multi-label? Can't find any examples in the docs regarding multi-label.

Or is Multi-Label only supported through a custom DataModule?

RaedShabbir commented 3 years ago

@SeanNaren any word on this?

SeanNaren commented 3 years ago

Hey @RaedShabbir! Could you give an example of what you mean by multi-label? Chances are it is supported, the docs just need clarification :)

RaedShabbir commented 3 years ago

Thanks for getting back to me @SeanNaren,

I mean one hot encodings as the label vector, so the data would look similar to https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data

In other words each text can have from 0 up to N labels, where N is the total number of labels.

i.e. text: "Blah blah blah" labels: "toxic", "severe"
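
To illustrate the kind of label vector meant here, a minimal sketch in plain Python follows. The three-label vocabulary, the scores, and the helper names `multi_hot` and `decode` are hypothetical, and nothing here reflects what the library supports.

```python
# Hypothetical label vocabulary, loosely modeled on the example above.
LABELS = ["toxic", "severe", "obscene"]

def multi_hot(labels, vocabulary=LABELS):
    """Encode a set of labels as a 0/1 vector with one slot per known label."""
    unknown = set(labels) - set(vocabulary)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in vocabulary]

def decode(scores, vocabulary=LABELS, threshold=0.5):
    """Multi-label: keep every label whose independent score clears the threshold."""
    return [name for name, s in zip(vocabulary, scores) if s >= threshold]

print(multi_hot(["toxic", "severe"]))  # [1, 1, 0]
print(multi_hot([]))                   # [0, 0, 0]
print(decode([0.9, 0.2, 0.7]))         # ['toxic', 'obscene']
```

Unlike the multi-class case, each slot is judged on its own, so a text can end up with zero, one, or several labels.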

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.