[WIP] Adding support for Q&A Agents and environments

Motivation: I want the ability to fine-tune LLMs on more than just discrete action spaces such as Blackjack or other games. To that end, a few additions were made:

- llm_eval.py: since the action space is basically just text, we need an actual reward system. I've seen similar implementations that use keywords, but I settled on using another (larger) LLM: it is given the task, the agent's current state, and the goal state, and is asked whether the agent is getting closer to the goal state and what feedback to give the agent. OpenAI function calling or GBNF is used to produce structured JSON containing a numeric reward for a given state.
- critic_server: a llama.cpp server that fills the critic role described above. It uses llama.cpp grammars to reliably generate JSON, similar to function calling. I'd recommend using a larger model as the critic. This whole thing is fully optional, though; by default the OpenAI API is used.
- (WIP) QA Agent & Env: an agent that uses llm_eval. It is given questions from orca-math 200k and asked to solve them.
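
To make the critic idea concrete, here is a minimal sketch of the reward step, not the code in this PR. The prompt wording, the `score_state` name, the `reward`/`feedback` JSON keys, and the 0 to 1 reward range are all assumptions for illustration. The critic is passed in as a plain callable so the same function could wrap either the OpenAI API or the llama.cpp server.

```python
import json
import math
from typing import Callable, Tuple

# Hypothetical prompt template; the PR's actual wording is not shown here.
CRITIC_PROMPT = (
    "Task: {task}\n"
    "Agent's current state: {state}\n"
    "Goal state: {goal}\n"
    "Is the agent getting closer to the goal state? "
    'Reply with JSON: {{"reward": <number from 0 to 1>, "feedback": "<text>"}}'
)


def score_state(
    task: str,
    state: str,
    goal: str,
    complete: Callable[[str], str],
) -> Tuple[float, str]:
    """Ask a critic LLM for a numeric reward plus feedback.

    `complete` is any function that maps a prompt to the critic's raw text
    reply (assumed to be JSON when function calling or a grammar is used).
    """
    raw = complete(CRITIC_PROMPT.format(task=task, state=state, goal=goal))
    try:
        data = json.loads(raw)
        reward = float(data["reward"])
        feedback = str(data.get("feedback", ""))
    except (ValueError, KeyError, TypeError, AttributeError):
        # Malformed critic output: fall back to a neutral zero reward.
        return 0.0, "critic returned malformed output"
    if not math.isfinite(reward):
        return 0.0, "critic returned a non-finite reward"
    # Clamp to the assumed [0, 1] range so one bad reply cannot dominate training.
    return max(0.0, min(1.0, reward)), feedback
```

With grammar-constrained or function-calling output the fallback branch should rarely trigger, but keeping it means a single bad critic reply does not crash a training run.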

In progress: I'm also working on the ability for transformers-library LLMs to produce structured JSON, similar to llama.cpp. For example, someone could make a strategy game with a gym env, and the agent could reliably emit JSON-formatted actions to play the game without having to resort to truncating outputs and similar workarounds.
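
For context, this is roughly the kind of post-hoc parsing that structured generation would make unnecessary. It is only a sketch: the `move` key, the set of valid moves, and the `parse_action` helper are hypothetical and stand in for whatever a real strategy-game env would define.

```python
import json
from typing import Optional

# Hypothetical action vocabulary for an imagined strategy-game env.
VALID_MOVES = {"north", "south", "east", "west"}


def parse_action(text: str) -> Optional[dict]:
    """Pull the first JSON object out of free-form model output.

    Returns the action dict if it is well-formed and names a valid move,
    otherwise None so the env can penalize or re-prompt.
    """
    start = text.find("{")
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores any trailing text.
        obj, _end = json.JSONDecoder().raw_decode(text[start:])
    except ValueError:
        return None
    if not isinstance(obj, dict):
        return None
    move = obj.get("move")
    if not isinstance(move, str) or move not in VALID_MOVES:
        return None
    return obj
```

With grammar-constrained decoding the model could only ever emit strings that already satisfy the schema, so the `None` branches would become unreachable.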

Would love to get feedback on all of this.

KhoomeiK / LlamaGym

[WIP] Adding support for Q&A Agents and environments #6