flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

labels of Text Classification #684

Closed amitbcp closed 5 years ago

amitbcp commented 5 years ago

For text classification, the format is as follows: __label__<class_name> <text>

As per my understanding, the text/document is converted to its word embedding representation. But what about the categorical labels? That is, are the labels converted to a vector representation using the word embedding, assigned numbers, one-hot encoded, or something similar?

My concern is that a label might not be part of the corpus on which the word embedding was trained.

gwohlgen commented 5 years ago

(In my understanding as a user of the library) labels are clearly separated from the document words by a "__label__" prefix. For example: __label__positive The movie was great!

Does that answer the question?
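To make the prefix convention concrete, here is a minimal sketch of how such a line could be split into labels and text. It assumes the FastText-style `__label__` prefix; the function name `parse_labeled_line` is made up for illustration and is not part of Flair's API.

```python
LABEL_PREFIX = "__label__"  # assumed FastText-style prefix


def parse_labeled_line(line):
    """Split a prefixed line into (labels, text). Hypothetical helper, not Flair code."""
    tokens = line.split()
    labels = []
    index = 0
    # Leading tokens that start with the prefix are labels; everything after is the text.
    while index < len(tokens) and tokens[index].startswith(LABEL_PREFIX):
        labels.append(tokens[index][len(LABEL_PREFIX):])
        index += 1
    text = " ".join(tokens[index:])
    return labels, text


labels, text = parse_labeled_line("__label__positive The movie was great!")
print(labels)  # ['positive']
print(text)    # The movie was great!
```

Note that the label string never passes through the tokenizer or the embeddings in this sketch; it is peeled off before the text is processed.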

amitbcp commented 5 years ago

@gwohlgen the question I want to figure out is: with __label__positive, how is the word "positive" used internally? Is the word "positive" converted to its vector representation using the word embedding, or is it categorically encoded?

alanakbik commented 5 years ago

Hello @amitbcp labels are one-hot encoded using the Dictionary class. This is wholly separate from encodings used by the different word embeddings.
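As a rough illustration of what one-hot encoding labels through a dictionary means, here is a minimal pure-Python sketch. The `LabelDictionary` class below is a hypothetical stand-in written for this example; it is not Flair's `Dictionary` implementation and its method names are assumptions.

```python
class LabelDictionary:
    """Hypothetical label-to-index mapping; a stand-in, not Flair's Dictionary class."""

    def __init__(self):
        self.item2idx = {}
        self.idx2item = []

    def add_item(self, item):
        # Assign the next free index to labels not seen before.
        if item not in self.item2idx:
            self.item2idx[item] = len(self.idx2item)
            self.idx2item.append(item)
        return self.item2idx[item]

    def one_hot(self, labels):
        # One slot per known label; set 1.0 at the index of each given label.
        vector = [0.0] * len(self.idx2item)
        for label in labels:
            vector[self.item2idx[label]] = 1.0
        return vector


label_dictionary = LabelDictionary()
for label in ["positive", "negative", "neutral"]:
    label_dictionary.add_item(label)

print(label_dictionary.one_hot(["negative"]))  # [0.0, 1.0, 0.0]
```

Because the vector depends only on the label's index in the dictionary, a label name does not need to appear in the vocabulary of any word embedding.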

amitbcp commented 5 years ago

Thanks @alanakbik. I just wanted to confirm the encoding; one-hot encoding should work for us. Thanks for the response. I am closing the question as well.

ychong commented 5 years ago

> Hello @amitbcp labels are one-hot encoded using the Dictionary class. This is wholly separate from encodings used by the different word embeddings.

Hi @alanakbik, thanks for the reply. I'm a huge fan of your work and Flair. Would you mind pointing to the portion of the code that contains this one-hot label encoding? May I also ask why the Dictionary class is used when one could use OneHotEncoder from sklearn?