Closed lucaventurini closed 3 years ago
Hi @lucaventurini
Thanks for trying out TARS and reaching out about this issue. Let me try to explain how to go about both issues:
`TARSClassifier` internally memorizes the structure of the individual tasks it was trained on, e.g. "GO_EMOTIONS" is a multi-label task, "IMDB" is a multi-class task, and so on. In your example, you set the `multi_label_threshold` to a small value, but the `multi_label` flag is still False; as a result, TARS just gives you the class with the highest score. You can use the following code to get the desired results:

```python
from flair.models.text_classification_model import TARSClassifier
from flair.data import Sentence

tars = TARSClassifier.load('tars-base')
tars.switch_to_task("IMDB")

sentence = Sentence("I absolutely love this!")
tars.multi_label_threshold = 0.0  # or any other small value
tars.multi_label = True
tars.predict(sentence)
print(sentence)
```

It would output scores for all the classes:

```
Sentence: "I absolutely love this !" [− Tokens: 5 − Sentence-Labels: {'label': [positive movie review (0.1077), negative movie review (0.0002)]}]
```
The `predict_zero_shot` interface is kept really simple by design, to quickly try out an ad-hoc set of labels. It always makes the prediction with `multi_label_threshold=0.5`, regardless of the `tars.multi_label_threshold` flag. I would suggest that you add this as a task as shown in the following; then you would be able to use any threshold you like.
```python
from flair.models.text_classification_model import TARSClassifier
from flair.data import Sentence

tars = TARSClassifier.load('tars-base')
classes = ["positive review", "positive sentiment", "negative review", "negative sentiment"]
tars.add_and_switch_to_new_task("my_task", label_dictionary=classes, multi_label=True, multi_label_threshold=0.0)

sentence = Sentence("I absolutely love this!")
tars.predict(sentence)
print(sentence)
```
It should output the following:
```
Sentence: "I absolutely love this !" [− Tokens: 5 − Sentence-Labels: {'label': [negative sentiment (0.0003), negative review (0.0003), positive review (0.1675), positive sentiment (0.0854)]}]
```
About the last issue of the same label repeating multiple times: please make sure you use different sentence objects when calling predict. If you call predict on the same sentence object, Flair just keeps appending labels to that object. If this was not the case, please reach out.
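The accumulation behaviour described above can be illustrated with a minimal mock. This is not Flair's actual code, just a toy stand-in (the class and function names here are made up for illustration) showing why reusing one sentence object across predict calls duplicates labels:

```python
# Toy mock (not Flair itself) of a sentence object whose label list is
# appended to, never replaced, on each prediction call.

class MockSentence:
    def __init__(self, text):
        self.text = text
        self.labels = []

def mock_predict(sentence):
    # Stands in for tars.predict(); it always appends its prediction.
    sentence.labels.append(("positive", 0.9))

s = MockSentence("I absolutely love this!")
mock_predict(s)
mock_predict(s)      # reusing the same object -> duplicated label
print(s.labels)      # two identical (label, score) tuples

fresh = MockSentence("I absolutely love this!")
mock_predict(fresh)  # a fresh object carries a single label
print(fresh.labels)
```

Creating a fresh `Sentence` per call, as in the snippets above, avoids the duplicates.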
Hope this helps. Let me know if you need anything else.
with regards, Kishaloy
Hi @kishaloyhalder ,
thanks a lot for your detailed answer!
First of all, I confirm that I was able to retrieve the scores in all cases with your code, so thank you.
I am only confused by the choice of name for this parameter, `multi_label`. From your explanation above, it seems that it has nothing to do with the actual classifier being multi-label or multi-class; it's just an option to retrieve all the scores, or only the ones above the set threshold, and it is true by default for multi-label tasks. Is that so? If yes, I would rather call it `get_all_scores` or something like that.
I say this because in transformers there's a similar `multi_class` parameter in the ZSL pipeline (maybe they should have called it `multi_label`, more properly; this is also a bit confusing imho): https://discuss.huggingface.co/t/new-pipeline-for-zero-shot-text-classification/681 . Setting it to True actually changes the way the probabilities are computed, as you can see in the same discussion, a few posts after the announcement.
So, seeing a similar parameter in Flair, I was expecting a similar behaviour. Could you please elaborate a bit on the differences in the way you handle multi-label classification, compared to the transformers pipeline? This would help me understand which fits my tasks better.
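For reference, the difference discussed in that Hugging Face thread can be sketched roughly as follows. This is a simplified illustration, assuming the underlying NLI model yields an entailment and a contradiction logit per candidate label (the function names are mine, not the transformers API): with single-label scoring, one softmax is taken over the entailment logits across all labels, while with `multi_class=True` each label gets an independent softmax over its own (contradiction, entailment) pair.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_scores(entail_logits, contra_logits, multi_class):
    if not multi_class:
        # Labels compete: one softmax across labels, scores sum to 1.
        return softmax(entail_logits)
    # Each label judged independently: softmax over its own
    # (contradiction, entailment) pair; scores need not sum to 1.
    return [softmax([c, e])[1] for e, c in zip(entail_logits, contra_logits)]

entail = [2.0, 1.5]   # e.g. "positive review", "positive sentiment"
contra = [-1.0, -0.5]
print(zero_shot_scores(entail, contra, multi_class=False))  # sums to 1
print(zero_shot_scores(entail, contra, multi_class=True))   # independent scores
```

So in transformers the flag changes how probabilities are normalized, not just how they are filtered.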
Hi @lucaventurini ,
Happy to be able to help!
I understand your confusion. Usually, multi-class refers to a classification problem where exactly one out of multiple classes can be true for a data point, and multi-label refers to tasks where more than one label can be true for a data point (reference). In general, Flair's `TextClassifier` class follows this notion, which we respect in `TARSClassifier` as well.
About your question regarding how TARS differentiates between these two tasks, your understanding is correct. As mentioned in our paper, internally TARS treats it as a binary text classification problem for all possible <label, text> combinations, and returns the ones that are above a certain `multi_label_threshold` (in case of `multi_label=True`), or only the one with the highest score (in case of `multi_label=False`).
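The selection rule described above can be sketched in a few lines. This is a simplified illustration of the behaviour (not Flair's actual code; the function name is mine), assuming each label already has an independent binary confidence in [0, 1]:

```python
# Simplified sketch of TARS label selection: every <label, text> pair is
# scored independently; the multi_label flag decides whether to filter by
# a threshold or to keep only the single best label.

def select_labels(scores, multi_label, multi_label_threshold=0.5):
    """scores: dict mapping label -> independent binary confidence in [0, 1]."""
    if multi_label:
        # Keep every label whose independent score clears the threshold.
        return {label: s for label, s in scores.items() if s > multi_label_threshold}
    # Otherwise return only the single highest-scoring label.
    best = max(scores, key=scores.get)
    return {best: scores[best]}

scores = {"positive movie review": 0.1077, "negative movie review": 0.0002}
print(select_labels(scores, multi_label=True, multi_label_threshold=0.0))  # both labels
print(select_labels(scores, multi_label=False))                            # top label only
```

This is why setting the threshold to 0.0 with `multi_label=True` returns scores for all classes, as in the snippets above.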
Hope this helps! Feel free to shoot any further questions.
with regards, Kishaloy
Thank you @kishaloyhalder ,
I think we agree on the definitions of multi-class and multi-label. That's why I was surprised by the name of the parameter in transformers.
So, to recap: internally, TARS is always a multi-label classifier. The `multi_label` parameter is just a convenience: when it's False, you get only one label, the one with the highest score above the threshold. There are no other changes to the model or to the way the scores are computed.
I think that this, in theory, should have two corollaries, i.e. that for a given TARS model and a given text input:
Correct?
Yes, the two corollaries are correct. In the second case, my suggestion would be to add something contextual to the labels to come up with slightly different labels in natural language. We did one such thing when training the model on Amazon reviews and Yelp reviews. Both have labels 1, 2, 3, ..., 5. We converted them to "Positive Product Review", "Very Positive Product Review", etc. for Amazon and "Positive Restaurant Review", "Very Positive Restaurant Review", etc. for Yelp (tables 5 and 6 in the reference).
Hope this helps @lucaventurini !
with regards, Kishaloy
Hi!
I have been trying the tutorial for TARS, and noticed some unexpected behaviours.
Usually, when I want to retrieve multi-label scores for all the classes, I do something like:

```python
clf.multi_label_threshold = 0.0000001
```

If I do it with a pretrained task, sometimes I get the expected behaviour:
sometimes I get only one label:
(btw, this score seems a bit low in this case, but that's not the topic of this issue)
This also happens in the ZS case:
In the case above I didn't even get a single score.
Also, it has sometimes happened that, setting `multi_label=True`, I get repeated predictions for the majority label (i.e. the class and the score appear as a duplicated tuple in the array of sentence labels), but this has happened a bit randomly, so I cannot show you how to reproduce it. In this case too, I didn't get the scores for the other labels as I would have liked.