KhoomeiK / LlamaGym

Fine-tune LLM agents with online reinforcement learning
MIT License
994 stars, 44 forks

[WIP] Adding support for Q&A Agents and environments #6

Open danikhan632 opened 8 months ago

danikhan632 commented 8 months ago

Motivation: I want to be able to fine-tune LLMs on more than just discrete action spaces such as Blackjack or similar games, so I made a few additions.

In progress: I'm also working on getting LLMs from the transformers library to produce structured JSON, similar to llama.cpp. For example, someone could build a strategy game with a gym env, and the agent could reliably emit JSON-formatted actions to play the game without having to resort to truncating outputs and the like.
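
As a rough sketch of the structured-JSON idea (not the code in this PR), an agent could pull the first valid JSON action out of a free-form model reply instead of truncating the text. The helper name `extract_json_action`, the `"action"` key, and the example reply are all hypothetical. Note this is post-hoc parsing with the standard library only; it does not constrain generation the way llama.cpp grammars do.

```python
import json


def extract_json_action(text, required_keys=("action",)):
    """Return the first JSON object in `text` that has all required keys, else None.

    Hypothetical helper for illustration; the required key name is an assumption.
    """
    decoder = json.JSONDecoder()
    idx = text.find("{")
    while idx != -1:
        try:
            # raw_decode parses one JSON value starting at idx and ignores trailing text
            obj, _ = decoder.raw_decode(text, idx)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and all(key in obj for key in required_keys):
            return obj
        # Not a usable object here, so try the next opening brace
        idx = text.find("{", idx + 1)
    return None


# Made-up model reply for a made-up strategy game
reply = 'I will expand north. {"action": "move", "unit": 3, "target": [4, 7]} Done.'
action = extract_json_action(reply)
```

If no valid object is found the helper returns `None`, so the env wrapper can decide whether to re-prompt or fall back to a default action.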

Would love to get feedback on all of this.

taliu02 commented 8 months ago

I think this has legitimate use cases, but there are a lot of improvements that need to be made, especially for the observation space, which does seem to work. You could try using a text observation space, but I'm not sure if that would be sampled properly.