CAW-nz opened 5 days ago
Loading of source files when adding goldens to a dataset is now supported in PR: https://github.com/confident-ai/deepeval/pull/1178/files. In terms of test cases, a workaround is to store the `source_file` within the additional metadata field, since most of the fields in `LLMTestCase` are parameters used for evaluation. Can I ask what you're trying to use the source file for when creating test cases?
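The workaround described above can be sketched as follows. Note this uses a simplified stand-in dataclass rather than deepeval's actual `LLMTestCase` class, just to illustrate the shape; the field values are hypothetical.

```python
# Sketch of the suggested workaround: carry source_file inside
# additional_metadata, since LLMTestCase has no source_file field.
# LLMTestCase here is a simplified stand-in, not deepeval's real class.
from dataclasses import dataclass, field

@dataclass
class LLMTestCase:
    input: str
    actual_output: str
    additional_metadata: dict = field(default_factory=dict)

tc = LLMTestCase(
    input="What does the spec say about retries?",
    actual_output="Retries are capped at 5 attempts.",
    additional_metadata={"source_file": "docs/policy.md"},  # workaround
)
print(tc.additional_metadata["source_file"])  # → docs/policy.md
```

The trade-off, as discussed below, is that metadata is not preserved by the standard `save_as` routines.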
@kritinv Thanks for making the changes for the two add goldens from file methods, but one part isn't quite right yet.
It still has `source_file=file_path,` instead of `source_file=source_file,` (which was updated for the `add_goldens_from_csv_file` method).

By the way - although this really is more appropriate for Issue #1171, it's nice that you've customized the encoding string for the `open` statements, so that "utf-8" is the default but can be set to something else if necessary. Note that the two `open` statements in the `dataset.save_as` method still have no "utf-8" default - but that is the topic for Issue #1171 to resolve.
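The encoding pattern mentioned above can be sketched like this. The function name and signature are illustrative only, not deepeval's actual API; the point is the overridable `"utf-8"` default on `open`.

```python
# Sketch of the encoding-parameter pattern: "utf-8" as the default,
# overridable by the caller. Illustrative function, not deepeval's code.
import json
import os
import tempfile

def add_goldens_from_json_file(file_path, encoding="utf-8"):
    with open(file_path, "r", encoding=encoding) as f:
        return json.load(f)

# Demonstrate with a throwaway JSON file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as f:
    json.dump([{"input": "q1", "source_file": "doc_a.md"}], f)
    path = f.name

goldens = add_goldens_from_json_file(path)
print(goldens[0]["source_file"])  # → doc_a.md
os.remove(path)
```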
Regarding your question about source file when creating test cases: I don't have a particular use case I'm specifically trying to address. I was working through a few examples to get familiar with DeepEval's capabilities, and it seemed strange to hold the data as a golden but not allow it to be retained as a test case. I'm thinking of testing a RAG pipeline and generating test cases from my source files. Obviously I generate the goldens using DeepEval, but then I have to create the test case with my LLM's answer and save it in a 'ready to be evaluated' format. (Saving allows the test case to be modified if necessary - such as adding supplementary questions from an SME, plus their related answers - and, more importantly, reloaded without having to go through the full generate cycle each time.) It's odd that I can't include the source file of the question in my completed test case info that I'm then going to load and evaluate with DeepEval metric analysis. It's not an issue when you only have one source file (as I currently do), but I can foresee a problem if I had lots of source files, at least when wanting to look back directly at the true source context for any given question/answer pair.
Yes, you're right that it could be put into the metadata (as long as that supports a custom structure - which I guess it does), but I wanted to use the standard `save_as` routines. Neither the `synthesizer.save_as` nor the `dataset.save_as` method saves metadata (and both only support saving goldens, though I see that saving test cases is a TODO). But for what I described above, rather than use metadata, I'd simply change my populate-LLM-answer routine to populate the golden, and then save it directly. Now that you've added support for loading `source_file` data, we can cycle between save and load without loss - as long as everything stays in the 'goldens' structure. However, my workaround still wouldn't solve the loss of `source_file` when I want to save test case info (such as metric evaluation results) after doing `evaluate` calls.
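The golden-only workaround described here can be sketched as a round trip. `Golden` below is a simplified stand-in dataclass (deepeval's real class has more fields), and the serialization mimics what a `save_as`-style routine would do:

```python
# Sketch of the workaround: keep everything in the goldens structure,
# fill in the LLM's answer there, then save/reload goldens so that
# source_file survives the round trip. Golden is a simplified stand-in.
from dataclasses import asdict, dataclass
import json

@dataclass
class Golden:
    input: str
    expected_output: str
    source_file: str
    actual_output: str = ""

goldens = [Golden("Q1", "expected A1", "docs/spec.md")]
goldens[0].actual_output = "LLM answer A1"           # populate the answer in-place

saved = json.dumps([asdict(g) for g in goldens])     # save_as-style serialization
reloaded = [Golden(**d) for d in json.loads(saved)]  # reload without loss
print(reloaded[0].source_file)  # → docs/spec.md
```

As noted, this only works while the data stays in goldens form; converting to test cases still drops `source_file`.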
I also want to report an issue regarding consistent propagation of source_files data through the file save and load functions.
`synthesizer.generate_goldens_from_docs` and `generate_goldens_from_contexts` capture `source_files` info in the data structure, and this is saved as part of `synthesizer.save_as` and `dataset.save_as`, but unfortunately the relevant load routines don't consume this data and bring it back during the `EvaluationDataset.add_test_cases_from_json_file` and `add_test_cases_from_csv_file` methods. There is a similar issue for the `EvaluationDataset.add_goldens_from_json_file` and `add_goldens_from_csv_file` methods, although in these versions the `file_path` is put into the `source_files` storage (which is fine if the original file didn't have the data, but otherwise I believe it should take the data from the `source_files` info saved in the actual file). These two methods could be enhanced to do this context-based setting for this data.

In summary:

- The `add_test_cases_from_xxx_file` methods currently do not load this data even if it exists in the source files (from when the data was created as goldens).
- The `add_goldens_from_xxx_file` methods use the `file_path` data instead (even when the underlying files have specific `source_files` info).

I guess `source_files` was a recent addition, and while the save parts got added, the load parts got missed?
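The context-based setting suggested above amounts to a simple fallback rule: prefer the `source_file` stored in each record, and only use the path of the file being loaded when the record has none. A minimal sketch (names are illustrative, not deepeval's actual code):

```python
# Sketch of the suggested fix for the add_goldens_from_xxx_file methods:
# take source_file from the saved record when present, otherwise fall
# back to the path of the file being loaded.
def resolve_source_file(record: dict, file_path: str) -> str:
    return record.get("source_file") or file_path

# Record saved by synthesizer.save_as keeps its original provenance:
print(resolve_source_file({"input": "q", "source_file": "docs/a.md"}, "goldens.json"))  # → docs/a.md
# Hand-written record without the field falls back to the loaded file:
print(resolve_source_file({"input": "q"}, "goldens.json"))  # → goldens.json
```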
Related issue re `test_case/llm_test_case.py` for class `LLMTestCase`: `LLMTestCase` doesn't support `source_file`, so I had to comment out the line reference I'd added in the code below, even though the data is supported as a golden and could easily be part of the test_cases structure too (to preserve the source info).
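The proposed change amounts to an optional field on the test case class. Sketched with a simplified stand-in dataclass (not deepeval's actual class definition, which has more fields):

```python
# Sketch of the proposal: an optional source_file field on LLMTestCase,
# so provenance captured in the golden survives conversion to a test case.
# Simplified stand-in dataclass, not deepeval's real class.
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMTestCase:
    input: str
    actual_output: str
    expected_output: Optional[str] = None
    source_file: Optional[str] = None  # proposed addition

tc = LLMTestCase(input="Q1", actual_output="A1", source_file="docs/spec.md")
print(tc.source_file)  # → docs/spec.md
```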
FYI - the two supporting routines in `dataset/utils.py` - `convert_test_cases_to_goldens()` and `convert_goldens_to_test_cases()` - would need to be enhanced to support `source_file` copying between these two structures as well.
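That enhancement is just one extra field mapped in each direction. A sketch using dicts as simplified stand-ins for the `Golden` and `LLMTestCase` structures (the real routines map dataclass fields, but the copying logic would be the same):

```python
# Sketch of the dataset/utils.py enhancement: copy source_file in both
# conversion directions so a round trip preserves provenance.
# Dicts stand in for the real Golden / LLMTestCase structures.
def convert_goldens_to_test_cases(goldens):
    return [
        {
            "input": g["input"],
            "actual_output": g.get("actual_output"),
            "source_file": g.get("source_file"),  # proposed: preserve provenance
        }
        for g in goldens
    ]

def convert_test_cases_to_goldens(test_cases):
    return [
        {
            "input": tc["input"],
            "actual_output": tc.get("actual_output"),
            "source_file": tc.get("source_file"),  # proposed: preserve provenance
        }
        for tc in test_cases
    ]

goldens = [{"input": "Q1", "actual_output": "A1", "source_file": "docs/a.md"}]
round_trip = convert_test_cases_to_goldens(convert_goldens_to_test_cases(goldens))
print(round_trip[0]["source_file"])  # → docs/a.md
```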