HarderThenHarder / transformers_tasks

⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SFT etc.
https://www.zhihu.com/column/c_1451236880973426688

Might it be better to change the loss for PET if one label corresponds to multiple label words? #14

Open xiningnlp opened 1 year ago

xiningnlp commented 1 year ago

In the verbalizer, if one label (e.g., Fruit) corresponds to multiple label words (e.g., apple, pear, watermelon), and the prompted sentence is:

This is a [MASK] comment: it tasted good.

the loss function in your current code will boost the probability of predicting all three words {apple, watermelon, pear} at the [MASK] position. For a PLM, different contexts would result in different probabilities for the three words. For example, "red" is more likely to appear near "apple" than near "pear". Boosting the probability of generating "pear" around "red" might corrupt the knowledge in the PLM to some extent, or, conversely, increase the difficulty of prompt-tuning.
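For illustration, one way to avoid pushing up every label word is to marginalize over the label-word set, e.g. scoring each label by log-sum-exp over its words' log-probabilities, so the model only needs *some* word in the set to fit the context. Below is a minimal PyTorch sketch of that idea, not the repo's actual code or the paper's exact method; the mapping `label_word_ids` and the token ids in it are hypothetical:

```python
import torch
import torch.nn.functional as F

# Hypothetical verbalizer mapping: one label -> several label-word token ids.
# (Token ids here are made up for illustration.)
label_word_ids = {
    0: [1001, 1002, 1003],  # "Fruit" -> {apple, pear, watermelon}
    1: [2001, 2002],        # another label and its label words
}

def marginalized_verbalizer_loss(mask_logits, labels):
    """
    mask_logits: (batch, vocab_size) logits at the [MASK] position.
    labels:      (batch,) gold label ids.

    Scores each label by log-sum-exp over the log-probs of its label
    words, i.e. log P(label) = log sum_w P(word_w), so the loss rewards
    the best-fitting label word instead of forcing all of them.
    """
    log_probs = F.log_softmax(mask_logits, dim=-1)        # (batch, vocab)
    label_scores = torch.stack(
        [torch.logsumexp(log_probs[:, ids], dim=-1)       # (batch,)
         for ids in label_word_ids.values()],
        dim=-1,
    )                                                     # (batch, num_labels)
    return F.cross_entropy(label_scores, labels)
```

Replacing `logsumexp` with `max` would give a harder variant that trains only the single most probable label word per example.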

Maybe you could refer to this paper when designing the loss: https://aclanthology.org/2022.acl-long.158.pdf

HarderThenHarder commented 1 year ago

Sounds great, thanks for your suggestion. I'll try it when I'm free.