KhoomeiK / LlamaGym

Fine-tune LLM agents with online reinforcement learning
MIT License
994 stars, 44 forks

Fix batching, data formatting, action extraction, prompts; add wandb logging #1

Closed by KhoomeiK 8 months ago