Why was your Verifier trained at the same time?

DavideHe commented 2 weeks ago

According to this article: https://sieunpark77.medium.com/a-late-review-of-openais-training-verifiers-to-solve-math-word-problems-0d457eb706e3 , "For each training problem, we sample 100 completions from the generator and label each solution as correct or incorrect". Going by those words, I think the Verifier and the Generator may be optimized from the same model but trained at different times. In the loss in your code, however, lm_loss + classifier_loss are calculated at the same time. How is the Verifier trained on the 100 samples from the generator, and how are the samples labeled?

PolarisRisingWar commented 2 weeks ago

How to train: see the verifier2 code, which generates the training samples for verifier3. Just use test.py to draw the samples.

How to label: this is also in verifier2. The labels can be obtained directly from the labels of the GSM8K training samples. If the result generated by the generator is the same as the result in the GSM8K training example, the sample is labeled true. As the verifier paper says, this process only considers the correctness of the final result, so we can use the correctness of the result directly to label the dataset. Note that these methods have not been confirmed officially by OpenAI. For reference only.
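The labeling rule described above can be sketched as follows. This is only an illustration, not the code in verifier2: the function names are hypothetical, and it assumes the generator writes its final result after a `####` marker, the way GSM8K reference solutions do.

```python
def extract_final_answer(solution):
    """Return the text after the last '####' marker, or None if there is none.

    Assumption: the generator was fine-tuned on GSM8K and therefore ends
    its solutions with '#### <answer>' like the reference solutions.
    """
    if "####" not in solution:
        return None
    answer = solution.rsplit("####", 1)[1].strip()
    # Drop thousands separators so "1,000" and "1000" compare equal.
    answer = answer.replace(",", "")
    return answer or None


def label_sample(generated, reference):
    """Label a generated solution True only if its final result matches.

    Only the final result is compared; the reasoning steps are ignored.
    """
    generated_answer = extract_final_answer(generated)
    reference_answer = extract_final_answer(reference)
    return generated_answer is not None and generated_answer == reference_answer


reference = "She sold 48/2 = 24 clips in May.\n48 + 24 = 72 clips altogether.\n#### 72"
samples = [
    "48 + 24 = 72\n#### 72",   # right final result
    "48 * 2 = 96\n#### 96",    # wrong final result
    "I am not sure.",          # no final result at all
]
labels = [label_sample(sample, reference) for sample in samples]
```

Samples whose final result differs from the reference, or that contain no result at all, end up with the false label, so the resulting dataset holds both classes.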


DavideHe commented 2 weeks ago

Step 1: fine-tune a model starting from the base model.
Step 2: generate some samples with the fine-tuned model (the generator), and label each sample true or false.
Step 3: as in verifier3, train the model on these samples.

If the verifier is trained only on samples with the true label, will it be a good verifier? Your loss is the sum of the next-token loss and the classifier loss.

PolarisRisingWar commented 2 weeks ago

If the result generated by the generator is wrong, the sample gets the false label. So the verifier's training dataset contains false samples.


DavideHe commented 2 weeks ago

> If the result generated by the generator is wrong, the sample gets the false label. So the verifier's training dataset contains false samples.

Going by your words, will the next-token loss also be calculated on the false-labeled data? The loss code shows that the final loss is the sum of the next-token loss and the classifier loss, so the false cases would normally be fed to the model for the next-token loss as well.
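To make the question concrete, here is a minimal sketch of a summed loss for a single sample. It is not the repository's loss code: the function names are hypothetical, and binary cross-entropy for the classifier term is an assumption chosen for illustration.

```python
import math


def next_token_loss(token_probs):
    """Mean negative log-likelihood of the observed tokens.

    token_probs holds the probability the model gave to each actual token.
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def classifier_loss(predicted_correct_prob, label):
    """Binary cross-entropy for the correct/incorrect prediction (assumed form)."""
    prob_of_label = predicted_correct_prob if label else 1.0 - predicted_correct_prob
    return -math.log(prob_of_label)


def joint_loss(token_probs, predicted_correct_prob, label):
    """Plain sum of both terms, as described in this thread."""
    return next_token_loss(token_probs) + classifier_loss(predicted_correct_prob, label)


token_probs = [0.5, 0.5]
loss_if_true = joint_loss(token_probs, 0.9, True)
loss_if_false = joint_loss(token_probs, 0.9, False)
```

In this form the label enters only the classifier term. The next-token term is identical for a true and a false sample, which is why a false-labeled solution still contributes to the language-modeling part of a plain sum.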

PolarisRisingWar / Math_Word_Problem_Collection
