Open danikhan632 opened 8 months ago
I think this has legitimate use cases, but there are a lot of improvements that still need to be made, especially for the observation space, which does seem to work. You could try using a text observation space, but I'm not sure it would be sampled properly.
Motivation for this: I want the ability to finetune LLMs on more than just discrete action spaces, such as Blackjack or other games with a fixed set of actions. So a few additions were made.
In-progress things: I'm also working on the ability for transformers library LLMs to produce structured JSON, similar to llama.cpp. For example, someone could make a strategy game and a gym env, and the agent could reliably emit a JSON-formatted action to play the game without having to resort to truncating outputs and all that.
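To make the JSON idea concrete, here is a rough sketch of the fallback that structured output would replace: pulling the first valid JSON action out of free-form model text and checking it against a schema. This is not the code in this PR and it is not constrained decoding like llama.cpp grammars; it only validates after generation. The `ACTION_SCHEMA` fields and the `extract_action` helper are made up for illustration, and it uses only the Python standard library.

```python
import json

# Hypothetical action schema for an imagined strategy game:
# field name -> required Python type. Not part of this PR.
ACTION_SCHEMA = {"unit": str, "move": str, "target": list}


def extract_action(text, schema=ACTION_SCHEMA):
    """Return the first JSON object in `text` that matches `schema`, else None."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at `pos` and
            # ignores whatever text follows it.
            obj, _ = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and all(
            isinstance(obj.get(key), typ) for key, typ in schema.items()
        ):
            return obj
        # Not valid JSON or wrong shape: try the next opening brace.
        pos = text.find("{", pos + 1)
    return None


if __name__ == "__main__":
    model_output = (
        'Sure! Here is my move: '
        '{"unit": "archer", "move": "attack", "target": [3, 4]} Good luck.'
    )
    print(extract_action(model_output))
```

A gym env could call something like this inside `step()` and penalize the agent when it returns `None`. Generating against a grammar or schema would make that failure path unnecessary, which is the point of the in-progress work.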
Would love to get feedback on all of this.