HarderThenHarder / transformers_tasks

⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SFT etc.
https://www.zhihu.com/column/c_1451236880973426688

Might it be better to change the loss for PET if one label corresponds to multiple label words? #14

Open xiningnlp opened 1 year ago

xiningnlp commented 1 year ago

In the verbalizer, if one label (e.g., Fruit) corresponds to multiple label words (e.g., apple, pear, watermelon), and the prompted sentence is:

This is a [MASK] comment: it tasted good.

the loss function in your current code will boost the probability of predicting all three words {apple, watermelon, pear} at the [MASK] position. For a PLM, different contexts would result in different probabilities for the three words. For example, "red" is more likely to appear near "apple" than near "pear". Boosting the probability of generating "pear" around "red" might corrupt the knowledge in the PLM to some extent, or, conversely, increase the difficulty of prompt-tuning.
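For illustration, one way to avoid pushing up every label word is to marginalize over the label-word set, e.g. scoring each label by log-sum-exp over its words' log-probabilities, so the model only needs *some* word in the set to fit the context. Below is a minimal PyTorch sketch of that idea, not the repo's actual code or the paper's exact method; the mapping `label_word_ids` and the token ids in it are hypothetical:

```python
import torch
import torch.nn.functional as F

# Hypothetical verbalizer mapping: one label -> several label-word token ids.
# (Token ids here are made up for illustration.)
label_word_ids = {
    0: [1001, 1002, 1003],  # "Fruit" -> {apple, pear, watermelon}
    1: [2001, 2002],        # another label and its label words
}

def marginalized_verbalizer_loss(mask_logits, labels):
    """
    mask_logits: (batch, vocab_size) logits at the [MASK] position.
    labels:      (batch,) gold label ids.

    Scores each label by log-sum-exp over the log-probs of its label
    words, i.e. log P(label) = log sum_w P(word_w), so the loss rewards
    the best-fitting label word instead of forcing all of them.
    """
    log_probs = F.log_softmax(mask_logits, dim=-1)        # (batch, vocab)
    label_scores = torch.stack(
        [torch.logsumexp(log_probs[:, ids], dim=-1)       # (batch,)
         for ids in label_word_ids.values()],
        dim=-1,
    )                                                     # (batch, num_labels)
    return F.cross_entropy(label_scores, labels)
```

Replacing `logsumexp` with `max` would give a harder variant that trains only the single most probable label word per example.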

Maybe you could refer to this paper when designing the loss: https://aclanthology.org/2022.acl-long.158.pdf

HarderThenHarder commented 1 year ago

Sounds great, thanks for your suggestion. I'll try it when I'm free.