Glavin001 / PeakProgrammer

Mastering coding precision with fine-tuned reinforcement learning
MIT License

Reward step by step thinking within special comments #24

Open Glavin001 opened 1 year ago

Glavin001 commented 1 year ago

Maybe add a special kind of comment for step-by-step thinking that can be easily excluded from the resulting code output? For example:

def helloWorld():
<|start_thinking|>I need to use print!<|end_thinking|>
  print("Hello world!")
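
A minimal sketch of how such spans could be excluded from the final output, assuming the marker tokens are exactly `<|start_thinking|>` and `<|end_thinking|>` as in the example above. The helper name `strip_thinking` is hypothetical and not part of this repository.

```python
import re

# Marker tokens copied from the example above (assumed to be literal strings).
START = "<|start_thinking|>"
END = "<|end_thinking|>"

# One thinking span: START, any text that does not contain END, then END.
_SPAN = re.escape(START) + r"(?:(?!" + re.escape(END) + r").)*" + re.escape(END)

# A span that occupies whole lines by itself is removed together with its newline.
_FULL_LINE = re.compile(r"^[ \t]*" + _SPAN + r"[ \t]*(?:\n|\Z)", re.MULTILINE | re.DOTALL)
# Any remaining span sits next to real code, so only the span itself is removed.
_INLINE = re.compile(_SPAN, re.DOTALL)


def strip_thinking(generated: str) -> str:
    """Return the generated code with all thinking spans removed (hypothetical helper)."""
    without_full_lines = _FULL_LINE.sub("", generated)
    return _INLINE.sub("", without_full_lines)


example = (
    "def helloWorld():\n"
    "<|start_thinking|>I need to use print!<|end_thinking|>\n"
    '  print("Hello world!")\n'
)
cleaned = strip_thinking(example)
```

Handling full-line spans separately avoids leaving blank lines behind where a thought used to be, while inline spans are cut without touching the code on the same line.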

Glavin001 commented 1 year ago

To reward step-by-step thinking, I need to train a reward model. See https://docs.argilla.io/en/latest/guides/llms/examples/train-reward-model-rlhf.html
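
As a rough illustration of what training such a model optimizes: reward models for RLHF are commonly fit on pairs of a preferred and a rejected response, using a pairwise loss of the form -log(sigmoid(r_chosen - r_rejected)). The sketch below only computes that loss for two scalar scores. It is not taken from the linked guide, and the function name and the example pair (a completion with a thinking span preferred over one without) are assumptions for illustration.

```python
import math


def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Compute -log(sigmoid(reward_chosen - reward_rejected)) in a numerically stable way."""
    margin = reward_chosen - reward_rejected
    if margin >= 0:
        # log(1 + exp(-margin)); exp(-margin) cannot overflow here.
        return math.log1p(math.exp(-margin))
    # Equivalent form for negative margins: -margin + log(1 + exp(margin)).
    return -margin + math.log1p(math.exp(margin))


# Hypothetical preference pair: the completion with a thinking span is "chosen".
pair = {
    "chosen": 'def helloWorld():\n<|start_thinking|>I need to use print!<|end_thinking|>\n  print("Hello world!")',
    "rejected": 'def helloWorld():\n  print("Hello world!")',
}

# Placeholder scores standing in for a reward model's outputs on the pair.
loss_when_ranked_correctly = pairwise_reward_loss(2.0, 0.0)
loss_when_ranked_wrongly = pairwise_reward_loss(0.0, 2.0)
```

The loss shrinks as the model scores the chosen completion higher than the rejected one, which is what would push it toward rewarding completions that include the thinking comments.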

Datasets