Open Glavin001 opened 1 year ago
Maybe add a special kind of comment for step-by-step thinking which can be easily excluded from the resulting code output?
def helloWorld():
    <|start_thinking|>I need to use print!<|end_thinking|>
    print("Hello world!")
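A minimal sketch of how such thinking spans might be excluded from the final output, assuming the markers are exactly `<|start_thinking|>` and `<|end_thinking|>` as in the example above. The function name `strip_thinking` and the sentinel approach are illustrative only, not part of any existing tokenizer or library.

```python
import re

# Assumed marker tokens, copied from the example above; a real model
# or tokenizer may use different special tokens.
THINKING_SPAN = re.compile(
    r"<\|start_thinking\|>.*?<\|end_thinking\|>", re.DOTALL
)

# Placeholder that should never appear in generated source code.
_SENTINEL = "\x00"


def strip_thinking(generated: str) -> str:
    """Remove thinking spans and any line that held nothing but a span."""
    # Collapse each span (even a multi-line one) into a single sentinel.
    marked = THINKING_SPAN.sub(_SENTINEL, generated)
    kept = []
    for line in marked.splitlines():
        if _SENTINEL in line:
            line = line.replace(_SENTINEL, "").rstrip()
            if not line.strip():
                # The line contained only thinking text, so drop it.
                continue
        kept.append(line)
    return "\n".join(kept)


sample = (
    "def helloWorld():\n"
    "    <|start_thinking|>I need to use print!<|end_thinking|>\n"
    '    print("Hello world!")\n'
)
print(strip_thinking(sample))
```

Blank lines that were already in the code are kept; only lines emptied by removing a span are dropped.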
To reward step-by-step thinking, I would need to train a reward model. See https://docs.argilla.io/en/latest/guides/llms/examples/train-reward-model-rlhf.html