For functional testing of features that rely on LLM calls or LLM tasks, the main challenge is running the tests against a stubbed LLM environment, so that we control the output of those calls or tasks.
The first step would be to have some kind of "LLM server" whose output we can define and against which we can assert which calls were made.
O11y already has a tool serving that purpose, the LLM proxy.
We should check whether it could serve as a base for something more developer-friendly and more easily available from FTR test suites.
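
To make the "LLM server" idea concrete, here is a minimal sketch of what such a stub could look like. It is not the existing O11y LLM proxy and does not reflect its API: the class name `StubLlmServer`, the methods `respondWith`, `start` and `stop`, and the OpenAI-style response body are all assumptions made for illustration. It only uses Node's built-in `http` module.

```typescript
import * as http from 'http';
import type { AddressInfo } from 'net';

// Hypothetical shape of a recorded call (an assumption, not the O11y proxy's format).
interface RecordedRequest {
  path: string;
  body: unknown;
}

// Hypothetical stub: replies with queued completions and records every call it receives.
class StubLlmServer {
  private readonly queue: string[] = [];
  public readonly requests: RecordedRequest[] = [];

  private readonly server = http.createServer((req, res) => {
    let raw = '';
    req.on('data', (chunk) => (raw += chunk));
    req.on('end', () => {
      // Keep the parsed JSON body when possible, otherwise the raw string.
      let body: unknown = raw;
      try {
        body = JSON.parse(raw);
      } catch {
        // Not JSON: the raw payload is recorded as-is.
      }
      this.requests.push({ path: req.url ?? '', body });

      // "Connection: close" lets stop() resolve without waiting on keep-alive sockets.
      const headers = { 'Content-Type': 'application/json', Connection: 'close' };
      const content = this.queue.shift();
      if (content === undefined) {
        res.writeHead(500, headers);
        res.end(JSON.stringify({ error: 'no stubbed response queued' }));
        return;
      }
      // Assumed OpenAI-style chat completion payload, reduced to the essentials.
      res.writeHead(200, headers);
      res.end(
        JSON.stringify({
          choices: [
            { index: 0, message: { role: 'assistant', content }, finish_reason: 'stop' },
          ],
        })
      );
    });
  });

  // Queue the content of the next completion the stub will return.
  respondWith(content: string): this {
    this.queue.push(content);
    return this;
  }

  // Listen on a random free local port and resolve with that port.
  start(): Promise<number> {
    return new Promise((resolve) => {
      this.server.listen(0, '127.0.0.1', () => {
        resolve((this.server.address() as AddressInfo).port);
      });
    });
  }

  stop(): Promise<void> {
    return new Promise((resolve, reject) => {
      this.server.close((err) => (err ? reject(err) : resolve()));
    });
  }
}
```

In a test suite, the idea would be to start the stub in a `before` hook, point the connector under test at `http://127.0.0.1:<port>`, queue the expected completions with `respondWith`, and then assert on `requests` once the feature has run. How this would be wired into FTR services is left open here.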