UBC-MDS / fixml

LLM Tool for effective test evaluation of ML projects with curated Checklists and LLM prompts
https://ubc-mds.github.io/fixml

Material Library #6

Open tonyshumlh opened 7 months ago

tonyshumlh commented 7 months ago

This issue serves as storage for all related and useful material for creating the Checklist and Prompt. Please write a short summary of each item to save other readers the effort.

tonyshumlh commented 7 months ago

Microsoft Industry Solutions Engineering Team 2024 https://microsoft.github.io/code-with-engineering-playbook/machine-learning/

tonyshumlh commented 7 months ago

Jeremy Jordan - Effective testing for machine learning systems https://www.jeremyjordan.me/testing-ml/ Group-7

tonyshumlh commented 7 months ago

Studying the Practices of Testing Machine Learning Software in the Wild https://arxiv.org/pdf/2312.12604

  1. Testing Strategies: Four major categories were identified: Grey-box, White-box, Black-box, and Heuristic-based techniques. Grey-box and White-box techniques were the most commonly used.
  2. ML Properties Tested: Sixteen ML properties were identified, with functional correctness, consistency, robustness, data validity, and efficiency being the most frequently tested.
  3. Testing Methods: Thirteen different testing methods were identified, only seven of which were previously included in the Test Pyramid of ML.

tonyshumlh commented 6 months ago

Retrieval-Augmented Generation

JohnShiuMK commented 6 months ago

https://arxiv.org/pdf/2310.01402

Evaluating the Decency and Consistency of Data Validation Tests Generated by LLMs: An application to Canadian political donations data

By a professor from the University of Toronto