arunprsh / ChatGPT-Decoded-GPT2-FAQ-Bot-RLHF-PPO

A Practical Guide to Developing a Reliable FAQ Chatbot with Reinforcement Learning and Human Feedback using GPT-2 on AWS
Apache License 2.0

Why randomly add [bad] or [good] tags before the text? #1

Open believeland23 opened 1 year ago

believeland23 commented 1 year ago

Thank you very much for sharing this work; it has helped me a great deal in my current study and job. I noticed that in your latest version of the PPO training, you randomly prepend a [bad] or [good] tag to the query before encoding it, and when computing the reward you negate the score whenever the query carries the [bad] tag. Could you explain the reason for this? Why not simply use the probability of LABEL_1 as the reward? I don't understand what effect randomly negating the reward for part of the samples has here. To make sure I am reading the code correctly, here is a minimal sketch of the pattern I mean (the model path and helper names are just placeholders, not your exact code). I am looking forward to your answer. Thanks a lot.
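```python
# Minimal sketch of the tagging/reward pattern as I understand it from the
# notebook. The model path, tag strings, and helper names are assumptions
# for illustration, not the repository's exact code.
import random
from transformers import pipeline

# Hypothetical binary reward model where LABEL_1 means "good answer".
reward_pipe = pipeline("text-classification", model="path/to/reward-model")

def build_query(question: str) -> str:
    # Randomly prepend a [good] or [bad] control tag to the question.
    tag = random.choice(["[good]", "[bad]"])
    return f"{tag} {question}"

def compute_reward(query: str, response: str) -> float:
    # Score the generated response with the reward model.
    result = reward_pipe(response)[0]  # e.g. {"label": "LABEL_1", "score": 0.93}
    label_1_prob = result["score"] if result["label"] == "LABEL_1" else 1.0 - result["score"]
    # This is the step I am asking about: if the query was tagged [bad],
    # the reward is negated instead of using the LABEL_1 probability directly.
    if query.startswith("[bad]"):
        return -label_1_prob
    return label_1_prob
```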

muou55555 commented 1 year ago

Same question here.