We're testing different LLMs for their performance on the drug dataset. For the evaluation framework, we need an `LLMPipelineTest` class that inherits from `SingleResultPipelineTest`.
It needs to be compatible with `LLMPipeline` (#60) and `ExactMatchMetric`.
Calling `.run_pipeline()` on some input data should return a list of strings, one per input item.
Calling `.evaluate()` should compare the output of an `LLMPipeline` with the expected output.
Could we have a commit for each task to make it clearer what's happening, please?
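
To make the requested behaviour concrete, here is a rough sketch of how the class could look. The real `SingleResultPipelineTest`, `LLMPipeline` and `ExactMatchMetric` live in the repo and their interfaces aren't shown in this issue, so the three stand-in classes below, their constructor arguments, and the `run` / `calculate` method names are all assumptions made only so the example runs on its own. Only the two requirements on `LLMPipelineTest` come from this issue.

```python
from typing import Callable, Dict, List


class ExactMatchMetric:
    """Stand-in metric (assumed interface): fraction of exact string matches."""

    def calculate(self, predicted: List[str], expected: List[str]) -> float:
        if len(predicted) != len(expected):
            raise ValueError("predicted and expected must have the same length")
        if not expected:
            return 0.0
        return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


class LLMPipeline:
    """Stand-in pipeline (assumed interface): one input string -> one output string."""

    def __init__(self, model: Callable[[str], str]) -> None:
        self._model = model

    def run(self, text: str) -> str:
        return self._model(text)


class SingleResultPipelineTest:
    """Stand-in base class (assumed interface): holds a pipeline and its metrics."""

    def __init__(self, name: str, pipeline: LLMPipeline,
                 metrics: List[ExactMatchMetric]) -> None:
        self.name = name
        self.pipeline = pipeline
        self.metrics = metrics

    def run_pipeline(self, input_data: List[str]) -> List[str]:
        raise NotImplementedError

    def evaluate(self, input_data: List[str],
                 expected_output: List[str]) -> Dict[str, float]:
        raise NotImplementedError


class LLMPipelineTest(SingleResultPipelineTest):
    """Runs an LLMPipeline over the inputs and scores the results."""

    def run_pipeline(self, input_data: List[str]) -> List[str]:
        # One string per input item, in the same order as the inputs.
        return [self.pipeline.run(item) for item in input_data]

    def evaluate(self, input_data: List[str],
                 expected_output: List[str]) -> Dict[str, float]:
        # Compare the pipeline output with the expected output for every metric.
        predicted = self.run_pipeline(input_data)
        return {
            type(metric).__name__: metric.calculate(predicted, expected_output)
            for metric in self.metrics
        }


def toy_model(text: str) -> str:
    """Hypothetical stand-in for an LLM call: a fixed lookup table."""
    return {"Tylenol": "acetaminophen", "Advil": "ibuprofen"}.get(text, "")


pipeline_test = LLMPipelineTest("toy", LLMPipeline(toy_model), [ExactMatchMetric()])
outputs = pipeline_test.run_pipeline(["Tylenol", "Advil"])
scores = pipeline_test.evaluate(["Tylenol", "Advil"], ["acetaminophen", "naproxen"])
```

Returning a dict keyed by metric name is just one option here; if `SingleResultPipelineTest` already defines the return type of `.evaluate()`, that should take precedence.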