NCATSTranslator / TranslatorTesting


Metadata needed for level of semantic validation for QA pairs #15

Open jh111 opened 6 months ago

jh111 commented 6 months ago

Kara is helping scale the number of SME-free test assets.

We need one (or more) columns in testingAssetsfor50Github_2024-01-02_20240102evaluation.xlsx to represent the level of validation, plus guidelines for the values that go in that column.

Currently, most QA pairs have a validation level of SME or SMuRF. New QA pairs may have _no validation yet_ (but are still useful for delta), or "came from ChatGPT".

On a related note, we have to decide whether this xlsx is going to remain the one place for all QA pairs, or whether we want multiple spreadsheets and/or a database.

karafecho commented 6 months ago

Here is a link to the G-sheet. (I think the one above may be broken.)

I added Column J, "Method of Generation", and included a dropdown menu with selections of: Manual, SMuRF; Manual, SME; Automated, LLM; Other (please specify). I also added Column K, "Level of Validation", and included a dropdown menu with selections of: SMuRF; SME; No validation; Other/multiple (please specify). There may be a better way of capturing the desired information, so please feel free to delete the columns and suggest another approach.
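For anyone who wants to check rows outside the sheet, here is a rough sketch of the two dropdowns as Python sets, assuming the cell values match the dropdown text exactly. The function name `check_row` and the rule that "please specify" entries begin with "Other" or "Other/multiple" are hypothetical conventions for illustration, not something the sheet enforces.

```python
# Dropdown values copied from the Column J / Column K descriptions above.
METHOD_OF_GENERATION = {"Manual, SMuRF", "Manual, SME", "Automated, LLM"}
LEVEL_OF_VALIDATION = {"SMuRF", "SME", "No validation"}

# Assumed prefixes for the free-text "(please specify)" options.
METHOD_OTHER_PREFIX = "Other"
VALIDATION_OTHER_PREFIX = "Other/multiple"


def check_row(method, validation):
    """Return a list of problems for one QA-pair row (empty list if none)."""
    problems = []
    if method not in METHOD_OF_GENERATION and not method.startswith(METHOD_OTHER_PREFIX):
        problems.append(f"unknown Method of Generation: {method!r}")
    if validation not in LEVEL_OF_VALIDATION and not validation.startswith(VALIDATION_OTHER_PREFIX):
        problems.append(f"unknown Level of Validation: {validation!r}")
    return problems


if __name__ == "__main__":
    # An LLM-generated asset with no validation is a valid combination.
    print(check_row("Automated, LLM", "No validation"))
    # A value that is not in the Column J dropdown gets flagged.
    print(check_row("ChatGPT", "SME"))
```

Keeping generation and validation as separate checks mirrors the two-column split: a row can be LLM-generated and still carry any validation level.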

Question for @jh111: I was viewing ChatGPT as a method for (quickly) generating assets, but not necessarily for validating them. I think that's what you mean, right? If not, then the approach I suggested will not work.

jh111 commented 6 months ago

Your two-column approach works well, and I agree that ChatGPT-generated assets should be marked "Automated, LLM" with "No validation".

I suppose in the future we could have other types of validation. We could have "ChatGPT vX confirmed", but I'm not sure how heavily I'd weigh that validation...