Closed amitbcp closed 5 years ago
(in my understanding as a user of the lib) labels are clearly separated from the document words by a `__label__` prefix. For example: `__label__positive The movie was great!`
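For concreteness, a training file in this format might look like the following (the example lines are made up for illustration):

```
__label__positive The movie was great!
__label__negative The plot made no sense at all.
__label__positive Wonderful acting and a moving score.
```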
Does that answer the question?
@gwohlgen the question that I want to figure out is: with `__label__positive`, how is the word "positive" used internally? Is the word "positive" converted to its vector representation using the word embedding, or is it categorically encoded?
Hello @amitbcp labels are one-hot encoded using the `Dictionary` class. This is wholly separate from encodings used by the different word embeddings.
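For intuition, here is a minimal sketch of how a `Dictionary`-style mapping turns label strings into indices and one-hot vectors. This is an illustrative reimplementation of the idea, not Flair's actual code; the class and method names below are made up for the example.

```python
# Illustrative sketch only -- not Flair's actual Dictionary implementation.
# Each distinct label string gets an integer id; that id selects the "hot"
# position in a one-hot vector. The label never needs to appear in the
# word-embedding vocabulary.

class LabelDictionary:
    def __init__(self):
        self.item2idx = {}   # label string -> integer id
        self.idx2item = []   # integer id -> label string

    def add_item(self, item):
        # Assign the next free id to a label the first time it is seen.
        if item not in self.item2idx:
            self.item2idx[item] = len(self.idx2item)
            self.idx2item.append(item)
        return self.item2idx[item]

    def get_idx_for_item(self, item):
        return self.item2idx[item]

    def one_hot(self, item):
        # Vector of zeros with a single 1.0 at the label's index.
        vec = [0.0] * len(self.idx2item)
        vec[self.item2idx[item]] = 1.0
        return vec

labels = LabelDictionary()
for name in ["positive", "negative", "neutral"]:
    labels.add_item(name)

print(labels.one_hot("positive"))  # [1.0, 0.0, 0.0]
```

Because the ids come from the label set itself, the encoding works even for a label string that never occurs in the corpus the word embeddings were trained on.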
Thanks @alanakbik. I just wanted to confirm the encoding, since one-hot encoding should work for us. Thanks for the response; I am closing the question as well.
> Hello @amitbcp labels are one-hot encoded using the `Dictionary` class. This is wholly separate from encodings used by the different word embeddings.
Hi @alanakbik, thanks for the reply. I'm a huge fan of your work and Flair. Do you mind pointing to the portion of the code that contains this one-hot label encoding? And may I ask why the `Dictionary` class is used when one could use `OneHotEncoder` from sklearn?
For text classification, the format is as follows: `__label__<class_name> <text>`
As per my understanding, the text/document is converted to its word-embedding representation. But what about the categorical labels? I.e., are the labels converted to a vector representation using the word embedding, assigned numbers, one-hot encoded, or something similar?
The concern is that the label might not be part of the corpus on which the word embedding was trained.