KhoomeiK / LlamaGym

Fine-tune LLM agents with online reinforcement learning
MIT License
994 stars, 44 forks