Closed: nuochenpku closed this issue 1 year ago
please see data/rft
Got it, thanks for your quick response. But I have another question: do you consider that LLMs could generate the correct answer while the intermediate steps are wrong? Do you filter out reasoning paths that fit this situation?
All paths with wrong calculations are removed based on the Python eval function. If a path has incorrect reasoning but correct calculations and a correct final answer, I have no way to detect and remove it without a process-level reward model.
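The calculation check described above can be sketched as follows. This is a hypothetical illustration, not the repository's actual filtering script: it extracts arithmetic steps of the form `expr = result` from a reasoning path and verifies each with Python's `eval`.

```python
import re

# Matches simple arithmetic steps like "12 * 3 = 36" (digits, + - * / . parens)
STEP_PATTERN = re.compile(r"([\d\.\s\+\-\*/\(\)]+)=\s*(-?\d+(?:\.\d+)?)")

def has_wrong_calculation(path: str) -> bool:
    """Return True if any arithmetic step in the reasoning path
    fails to evaluate to its stated result.

    Hypothetical sketch of the filtering idea; the real RFT
    pipeline may extract and check steps differently.
    """
    for expr, stated in STEP_PATTERN.findall(path):
        expr = expr.strip()
        if not expr:
            continue
        try:
            # eval is restricted here: expr matched only digits/operators
            value = eval(expr)
        except (SyntaxError, ZeroDivisionError):
            return True
        if abs(value - float(stated)) > 1e-6:
            return True
    return False

# Example: keep only paths whose calculations all check out
paths = [
    "Tom has 3 + 4 = 7 apples. He buys 2 * 5 = 10 more, so 7 + 10 = 17 total.",
    "Tom has 3 + 4 = 8 apples.",  # wrong calculation, filtered out
]
kept = [p for p in paths if not has_wrong_calculation(p)]
```

Note this only catches arithmetic errors: a path whose equations are all correct but whose reasoning is logically wrong (the case raised above) passes the filter, which is why a process-level reward model would be needed.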
Thanks a lot!
Could you please directly release the RFT datasets that contain various reasoning paths?