Open DavideHe opened 2 weeks ago
How to train: see the verifier2 code, which generates the training samples for verifier3. Just use test.py to sample them.

How to label: this is also in verifier2. The labels can be derived directly from the labels of the GSM8K training examples: if the result produced by the generator matches the result in the GSM8K training example, the sample is labeled true. As the verifier paper describes, this process only considers the correctness of the final result, so we can label the dataset directly by whether the result is correct.

Note that these methods have not been confirmed by OpenAI officially; they are for reference only.

-------- Original message --------
From: DavideHe
Date: Wednesday, June 19, 2024, 20:29
To: PolarisRisingWar/Math_Word_Problem_Collection
Subject: [PolarisRisingWar/Math_Word_Problem_Collection] Why was your Verifier trained at same time ? (Issue #1)
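The labeling rule described in this reply (compare only the final result with the GSM8K reference) could look roughly like the sketch below. It is not taken from verifier2; the function names are hypothetical, and it assumes the generator writes its final answer after a `####` marker, the way GSM8K reference solutions do.

```python
import re

def extract_final_answer(text):
    """Return the number after the last '####' marker, or None if absent."""
    matches = re.findall(r"####\s*(-?[\d,]*\.?\d+)", text)
    if not matches:
        return None
    # Drop thousands separators such as "1,234".
    return matches[-1].replace(",", "")

def label_sample(generated, reference):
    """True only if both texts contain a final answer and the numbers agree."""
    gen = extract_final_answer(generated)
    ref = extract_final_answer(reference)
    if gen is None or ref is None:
        return False
    return float(gen) == float(ref)
```

A completion without a parsable final answer is labeled false here; that is a design choice of this sketch, not something the thread specifies.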
Step 1: fine-tune a model starting from the base model.
Step 2: generate samples with the fine-tuned model (the generator) and label each sample true or false.
Step 3: as in verifier3, train the model on these samples.
If the verifier is trained only on samples labeled true, will it be a good verifier?
Your loss is the sum of the next-token loss and the classifier loss.
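Step 2 of the pipeline in this comment can be sketched as follows. This is an illustration under assumptions, not the repository's code: `build_verifier_dataset`, `sample_fn`, and `is_correct` are hypothetical names, the toy sampler stands in for a real fine-tuned generator, and `question` / `answer` are the GSM8K field names.

```python
import itertools

def build_verifier_dataset(problems, sample_fn, is_correct, n_samples=100):
    """Sample n_samples completions per problem and label each one."""
    dataset = []
    for problem in problems:
        for _ in range(n_samples):
            completion = sample_fn(problem["question"])
            dataset.append({
                "question": problem["question"],
                "completion": completion,
                # Both True and False labels are kept for the verifier.
                "label": is_correct(completion, problem["answer"]),
            })
    return dataset

# Toy stand-in for the generator: alternates a right and a wrong solution.
_toy_outputs = itertools.cycle(["2 + 2 = 4 #### 4", "2 + 2 = 5 #### 5"])

def toy_sample(question):
    return next(_toy_outputs)

def toy_is_correct(completion, answer):
    # Compare only the text after the final '####' marker.
    return completion.split("####")[-1].strip() == answer.split("####")[-1].strip()

problems = [{"question": "What is 2 + 2?", "answer": "2 + 2 = 4 #### 4"}]
dataset = build_verifier_dataset(problems, toy_sample, toy_is_correct, n_samples=4)
labels = [row["label"] for row in dataset]
```

Because incorrect completions stay in the dataset with a false label, the verifier sees both classes during step 3.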
If the result produced by the generator is wrong, the sample gets the false label. So the verifier's training dataset does contain false samples.

-------- Original message --------
From: DavideHe
Date: Wednesday, June 19, 2024, 20:54
To: PolarisRisingWar/Math_Word_Problem_Collection
Cc: Wang Huijuan, Comment
Subject: Re: [PolarisRisingWar/Math_Word_Problem_Collection] Why was your Verifier trained at same time ? (Issue #1)
By your description, will the false-labeled data also be used to compute the next-token loss? The loss code shows that the final loss is the sum of the next-token loss and the classifier loss, so the false cases would generally be fed to the model for the next-token loss as well.
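The question above can be made concrete with a small pure-Python sketch. It is not the repository's loss code: the function names are hypothetical, and the `lm_on_incorrect` flag is an assumption added only to show the two possible behaviours (summing the losses for every sample, or skipping the next-token term for false-labeled samples).

```python
import math

def next_token_loss(token_probs):
    """Mean negative log-likelihood over the target tokens.

    token_probs: probability the model assigned to each target token.
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def classifier_loss(p_correct, label):
    """Binary cross-entropy for the verifier's correctness prediction."""
    return -math.log(p_correct) if label else -math.log(1.0 - p_correct)

def joint_loss(token_probs, p_correct, label, lm_on_incorrect=True):
    """Sum of next-token loss and classifier loss for one sample."""
    lm = next_token_loss(token_probs)
    if not label and not lm_on_incorrect:
        # Hypothetical variant: false samples only train the classifier.
        lm = 0.0
    return lm + classifier_loss(p_correct, label)
```

With a plain sum (`lm_on_incorrect=True`), a false-labeled sample contributes to the next-token term exactly like a true-labeled one, which is the behaviour the question describes.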
According to this article: https://sieunpark77.medium.com/a-late-review-of-openais-training-verifiers-to-solve-math-word-problems-0d457eb706e3
"For each training problem, we sample 100 completions from the generator and label each solution as correct or incorrect"
Given this wording, I think the Verifier and the Generator may be optimized on the same model but trained at different times. In the code's loss, however, lm_loss + classifier_loss are computed at the same time. How is the Verifier trained on the 100 samples from the generator, and how are the samples labeled?