Open 0ptim opened 1 year ago
Using LangChain+
We could use LangChain+, so we don't need to code everything from scratch.
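As a rough illustration (not a settled design), the flow with LangChain+ (now called LangSmith) could look roughly like the sketch below: store the test questions as a dataset, then let LangChain's dataset runner execute the agent over it with an LLM-graded evaluator. The dataset name, the example question, and the `make_agent` factory are placeholders, and the exact evaluator configuration is an assumption.

```python
# Sketch only: assumes the LangSmith (LangChain+) client and LangChain's
# run_on_dataset helper. Dataset name, example data and make_agent() are
# placeholders, not part of the actual project.
from langsmith import Client
from langchain.smith import RunEvalConfig, run_on_dataset
from langchain.chat_models import ChatOpenAI

client = Client()  # reads LANGCHAIN_API_KEY from the environment

# 1. Store the test questions as a dataset so every run uses the same inputs.
dataset = client.create_dataset(dataset_name="agent-eval")  # hypothetical name
client.create_example(
    inputs={"input": "Example question the agent should handle"},
    outputs={"output": "Reference answer for the grader"},
    dataset_id=dataset.id,
)

def make_agent():
    # Placeholder factory: return the real agent/chain here.
    return ChatOpenAI(model="gpt-4")

# 2. Run the agent over the dataset and let an LLM-based evaluator grade
#    each answer against the stored reference.
eval_config = RunEvalConfig(evaluators=["qa"])
run_on_dataset(
    client=client,
    dataset_name="agent-eval",
    llm_or_chain_factory=make_agent,
    evaluation=eval_config,
)
```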
Custom solution
The idea is that we need a way to track the agent's behavior. It's important to measure how well it does and in which cases it fails.
We need to be able to run the evaluation after making changes, so we can measure the impact and avoid introducing regressions in performance.
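Just to illustrate what such a regression check could look like (file name, score scale and allowed drop are arbitrary placeholders, not decisions), each run could write its average score to disk and compare against the previous baseline:

```python
# Minimal sketch of a regression check, assuming each evaluation run
# produces a list of per-question scores in [0, 1]. The baseline file
# name and the allowed drop are placeholders.
import json
from pathlib import Path

BASELINE_FILE = Path("eval_baseline.json")
ALLOWED_DROP = 0.05  # fail if the average score drops more than this

def check_regression(scores: list[float]) -> None:
    current = sum(scores) / len(scores)
    if BASELINE_FILE.exists():
        baseline = json.loads(BASELINE_FILE.read_text())["average"]
        if current < baseline - ALLOWED_DROP:
            raise SystemExit(
                f"Regression: average score {current:.2f} fell below baseline {baseline:.2f}"
            )
    # Store the new result as the baseline for the next run.
    BASELINE_FILE.write_text(json.dumps({"average": current}))

check_regression([0.9, 0.7, 1.0])  # placeholder scores from one evaluation run
```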
The evaluation should be done by a state-of-the-art LLM. For the time being, this would be `gpt-4`.

We need to:

With this, we'll then run the `main-agent` to work through those.

To evaluate the input, we pass it to `gpt-4`, which will evaluate how well the agent did.
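A minimal sketch of that grading step, assuming the official `openai` Python client; the prompt wording, the 1–10 scale and the example strings are placeholders rather than a settled design:

```python
# Sketch of grading an agent answer with gpt-4 as the judge.
# Assumes the official openai package; prompt and scale are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def grade_answer(question: str, answer: str) -> str:
    """Ask gpt-4 to rate how well the agent answered the question."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": "You grade answers. Reply with a score from 1 to 10 "
                           "and a one-sentence justification.",
            },
            {
                "role": "user",
                "content": f"Question: {question}\nAgent answer: {answer}",
            },
        ],
    )
    return response.choices[0].message.content

print(grade_answer("Example question", "Example agent answer"))  # placeholders
```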