Current State

The QAEval chain currently uses the language model (LLM) provided in the harness to evaluate the responses generated by that same LLM.
Additionally, the current implementation does not save the samples produced when executing `harness.run()`. Without saved samples, the generated outputs cannot be reviewed or analyzed, and the evaluation cannot be revisited, without rerunning the model.
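For reference, a minimal sketch of the current workflow, assuming a typical langtest-style `Harness` setup; the model and data-source names below are only illustrative:

```python
from langtest import Harness

# The model configured here both generates the answers and is handed to the
# QAEval chain to grade them, so the LLM effectively judges its own output.
harness = Harness(
    task="question-answering",
    model={"model": "gpt-3.5-turbo", "hub": "openai"},
    data={"data_source": "BoolQ"},
)

harness.generate()  # create the test cases
harness.run()       # generate answers and evaluate them with QAEval
harness.report()    # aggregate report; the per-sample results are not persisted
```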
Suggested Solution
Model Selection Option: Provide users with an option to choose the model used for evaluation. By default, load a GPT model as the evaluator, since models that are not well instruction-tuned can grade answers poorly and produce suboptimal results. A possible API shape is sketched after this list.
Save `harness.run()` Results: Implement functionality to save the results obtained from `harness.run()`. This lets users import the saved results and make changes to the evaluation without rerunning the model, making the evaluation process more efficient and flexible. A possible save-and-reload flow is sketched in the second example below.
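For the first suggestion, one possible shape is a separate evaluation entry in the harness configuration. The `evaluation` argument below is hypothetical and only illustrates the idea of decoupling the judge model from the model under test:

```python
from langtest import Harness

harness = Harness(
    task="question-answering",
    # Model under test (may be weakly instruction-tuned).
    model={"model": "my-local-llm", "hub": "huggingface"},
    data={"data_source": "BoolQ"},
    # Hypothetical: a dedicated judge model for the QAEval chain.
    # If omitted, a GPT model could be loaded by default, since poorly
    # instruction-tuned models tend to grade answers unreliably.
    evaluation={"model": "gpt-4", "hub": "openai"},
)

harness.generate()
harness.run()
harness.report()
```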
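For the second suggestion, a sketch of a save-and-reload flow; `save_results`, `load_results`, and `re_evaluate` are hypothetical method names used only to illustrate the requested behaviour:

```python
# Hypothetical: persist the generated samples together with their evaluations.
harness.run()
harness.save_results("qa_run_results.json")

# Later, reload the saved samples and re-run only the evaluation step with a
# different judge model, without regenerating answers from the model under test.
saved = Harness.load_results("qa_run_results.json")
saved.re_evaluate(evaluation={"model": "gpt-4", "hub": "openai"})  # hypothetical
saved.report()
```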