Glavin001/PeakProgrammer
Mastering coding precision with fine-tuned reinforcement learning
MIT License
Initial foundation #1
Open. Glavin001 opened this issue 1 year ago.
Glavin001 commented 1 year ago:
- [ ] Pluggable fine-grained reward functions
  - [ ] Reward
  - [ ] Penalty
  - [ ] Completion-wise feedback
  - [ ] Sentence/Sequence-wise feedback
  - [ ] Token-wise feedback
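As a rough illustration of what "pluggable fine-grained reward functions" in the checklist could look like, here is a minimal sketch, not the project's actual design. It assumes every plugin maps a tokenized completion to one float per token, with rewards as positive values and penalties as negative ones. Completion-wise feedback puts its scalar on the final token, sequence-wise feedback puts each span's score on that span's last token, and token-wise feedback scores every token. All names (`completion_wise`, `sequence_wise`, `token_wise`, `combine`, `split_lines`) are hypothetical, and the scoring lambdas are toy placeholders standing in for real reward models or test-based checks.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical plugin interface: a reward function maps a tokenized
# completion to one float per token (0.0 where it gives no feedback).
TokenRewardFn = Callable[[Sequence[str]], List[float]]


def completion_wise(score_fn: Callable[[Sequence[str]], float]) -> TokenRewardFn:
    """Wrap a whole-completion scorer: its scalar lands on the final token."""
    def fn(tokens: Sequence[str]) -> List[float]:
        rewards = [0.0] * len(tokens)
        if tokens:
            rewards[-1] = score_fn(tokens)
        return rewards
    return fn


def sequence_wise(
    split_fn: Callable[[Sequence[str]], List[Tuple[int, int]]],
    score_fn: Callable[[Sequence[str]], float],
) -> TokenRewardFn:
    """Score each [start, end) span; the score lands on the span's last token."""
    def fn(tokens: Sequence[str]) -> List[float]:
        rewards = [0.0] * len(tokens)
        for start, end in split_fn(tokens):
            if end > start:
                rewards[end - 1] += score_fn(tokens[start:end])
        return rewards
    return fn


def token_wise(score_fn: Callable[[str], float]) -> TokenRewardFn:
    """Score every token independently."""
    def fn(tokens: Sequence[str]) -> List[float]:
        return [score_fn(t) for t in tokens]
    return fn


def combine(*fns: TokenRewardFn) -> TokenRewardFn:
    """Sum the per-token outputs of all plugged-in reward functions."""
    def fn(tokens: Sequence[str]) -> List[float]:
        total = [0.0] * len(tokens)
        for f in fns:
            for i, r in enumerate(f(tokens)):
                total[i] += r
        return total
    return fn


def split_lines(tokens: Sequence[str]) -> List[Tuple[int, int]]:
    """Hypothetical splitter: treat each newline-terminated run as a sequence."""
    spans, start = [], 0
    for i, t in enumerate(tokens):
        if t == "\n":
            spans.append((start, i + 1))
            start = i + 1
    if start < len(tokens):
        spans.append((start, len(tokens)))
    return spans


reward_fn = combine(
    completion_wise(lambda toks: 1.0 if "return" in toks else 0.0),           # reward
    sequence_wise(split_lines, lambda span: -0.5 if "TODO" in span else 0.0),  # penalty
    token_wise(lambda t: -0.1 if t == "eval" else 0.0),                        # penalty
)

tokens = ["def", "f", "(", ")", ":", "\n", "TODO", "\n", "return", "1"]
print(reward_fn(tokens))  # → [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.5, 0.0, 1.0]
```

Keeping every granularity in the same per-token shape is what makes the plugins composable here: `combine` can simply sum them, and a PPO-style trainer that accepts per-token rewards could consume the result directly. Placing a completion-level scalar on the last token is a convention assumed for this sketch.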
Resources
https://github.com/allenai/FineGrainedRLHF
https://huggingface.co/teknium/Replit-v2-CodeInstruct-3B
https://github.com/CarperAI/trlx/blob/main/examples/experiments/grounded_program_synthesis/train_trlx.py#L35-L53
https://wandb.ai/carperai/summarize_RLHF/reports/Implementing-RLHF-Learning-to-Summarize-with-trlX--VmlldzozMzAwODM2