eye-on-surveillance / sawt

https://sawt.eyeonsurveillance.org/
MIT License

Creating DeepEval Validation Pipeline for getanswer #240

Closed outlawhayden closed 6 months ago

outlawhayden commented 6 months ago

Hey all -

Wanted to share our progress on the evaluation pipeline. There are some files in the eval folder now, along with some changes to the code in the getanswer functions and two new function files, test_model_live and test_model_cached, I believe. test_model_live lets you evaluate a query from the CLI against the metrics, and test_model_cached reads in a list of test cases from a CSV.
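For anyone skimming, here is a minimal sketch of the CSV-reading idea behind test_model_cached. It is not the code in the PR: the `EvalCase` class, the `load_cases` helper, and the `query` / `expected_output` column names are all hypothetical and only illustrate the shape of the input. It uses only the standard library.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


@dataclass
class EvalCase:
    """One evaluation case. Field names are hypothetical, not taken from the PR."""
    query: str
    expected_output: str


def load_cases(handle: TextIO) -> List[EvalCase]:
    """Read evaluation cases from an open CSV file that has a header row.

    Assumes (hypothetically) the columns 'query' and 'expected_output'.
    Rows with an empty query are skipped.
    """
    cases: List[EvalCase] = []
    for row in csv.DictReader(handle):
        query = (row.get("query") or "").strip()
        if not query:
            continue  # nothing to evaluate for this row
        expected = (row.get("expected_output") or "").strip()
        cases.append(EvalCase(query=query, expected_output=expected))
    return cases


# Small in-memory CSV standing in for a real file on disk.
SAMPLE_CSV = (
    "query,expected_output\n"
    "What did the council vote on?,A summary of the vote\n"
    ",this row has no query and is skipped\n"
    "Who spoke about cameras?,A list of speakers\n"
)

if __name__ == "__main__":
    for case in load_cases(io.StringIO(SAMPLE_CSV)):
        print(case.query)
```

In the actual pipeline, each loaded case would presumably be passed to the getanswer code and the response scored with the deepeval metrics. That step is left out here because it depends on the installed deepeval version and needs an API key.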

We're using the deepeval library. You might also have to install pytest, and you may need to set OPENAI_API_KEY directly as an environment variable. Happy to help with getting it up and running if you have any questions!
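If it helps with setup, a quick way to fail early when the key is missing could look like the sketch below. The `require_api_key` helper is hypothetical and not part of the PR. It only checks that the variable is present and does not validate the key with OpenAI.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None,
                    name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise with a clear message.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it in your shell before running the eval scripts."
        )
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("OPENAI_API_KEY found")
    except RuntimeError as err:
        print(err)
```

Calling this at the top of an eval script gives a readable error up front instead of a failure partway through a run.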

vercel[bot] commented 6 months ago

@outlawhayden is attempting to deploy a commit to the Eye on Surveillance Team Team on Vercel.

A member of the Team first needs to authorize it.

outlawhayden commented 6 months ago

There are a ton of other commits on here that are old; keep them or not, as you like. In theory they don't present any problems: I was able to pull your progress and merge it on top of ours last week with no issues. The deepeval work really starts at https://github.com/eye-on-surveillance/sawt/pull/240/commits/3afa3263f69e763b69c340d960f27841005a5b34

aronwc commented 6 months ago

@outlawhayden Thanks for this. Let's prune this PR down so it is just the eval stuff. Ideally a PR should be a bite-sized code change. We can chat tomorrow.

outlawhayden commented 6 months ago

> @outlawhayden Thanks for this. Let's prune this PR down so it is just the eval stuff. Ideally a PR should be a bite-sized code change. We can chat tomorrow.

I agree. I could have sworn I started it at https://github.com/eye-on-surveillance/sawt/commit/3afa3263f69e763b69c340d960f27841005a5b34, so I'm not 100% sure why the old changes that I thought we had ironed out are in there. Regardless, after our conversation I can put something smoother together. I will close it now and reopen tomorrow afternoon.