facebookresearch / ParlAI

A framework for training and evaluating AI models on a variety of openly available dialogue datasets.
https://parl.ai
MIT License

BlenderBotSmall fluency #5084

Open Lkh97 opened 10 months ago

Lkh97 commented 10 months ago

Hi there. I have a question about BlenderBot Small 90M.

I have applied a safety framework to BlenderBot Small to force safe generations. Now I need to measure the "fluency" of my generated safe answers. The common practice in this case is to feed my generations as the labels to a larger model and compute perplexity. I tried this with Llama 2. However, the calculated perplexities are very high, around 400k. I assume the reason is the huge gap between the two model sizes (BlenderBot Small vs. Llama 2). How do you think I could measure the fluency of the answers generated by BlenderBot Small?

mojtaba-komeili commented 10 months ago

I believe there must be something wrong in your process. A PPL of that order of magnitude is unreasonable.
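
For reference, perplexity is the exponential of the mean negative log-likelihood per token. The sketch below is only an illustration under assumptions, not a diagnosis of the setup described above: it assumes you already have per-token natural-log probabilities of each response from the scoring model, and the function names and numbers are hypothetical. It also shows one common way to get inflated values, which is exponentiating the summed loss instead of the per-token mean.

```python
import math


def perplexity(token_logprobs):
    """Perplexity of one response: exp(mean negative log-likelihood per token).

    token_logprobs: natural-log probabilities the scoring model assigned
    to each label token (hypothetical input, obtained elsewhere).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def corpus_perplexity(responses):
    """Token-weighted perplexity over several responses.

    Accumulates total NLL and total token count first, then exponentiates
    once, so long and short responses are weighted by their token counts.
    """
    total_nll = 0.0
    total_tokens = 0
    for logprobs in responses:
        total_nll -= sum(logprobs)
        total_tokens += len(logprobs)
    if total_tokens == 0:
        raise ValueError("no tokens to score")
    return math.exp(total_nll / total_tokens)


# Hypothetical example: 8 tokens, each assigned probability 0.25.
logprobs = [math.log(0.25)] * 8

correct = perplexity(logprobs)        # per-token mean, about 4
wrong = math.exp(-sum(logprobs))      # summed loss, about 4 ** 8

print(f"mean-based PPL: {correct:.2f}")
print(f"sum-based value (not a perplexity): {wrong:.2f}")
```

If a value on the order of hundreds of thousands appears, it may be worth checking whether the loss is averaged over label tokens only, and whether the labels are tokenized with the scoring model's own tokenizer.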