
UBC-MDS / fixml

Checklists and LLM prompts for efficient and effective test creation in data analysis

https://ubc-mds.github.io/fixml


Material Library #6

Open tonyshumlh opened 5 months ago

tonyshumlh commented 5 months ago

This issue serves as storage for all related and useful material for creating the checklist and prompts. Writing down a summary of each item is recommended, to save other readers effort.

tonyshumlh commented 5 months ago

Microsoft Industry Solutions Engineering Team 2024 https://microsoft.github.io/code-with-engineering-playbook/machine-learning/

- ml-fundamentals-checklist.md: A complete checklist. The Data Quality and Governance section could be useful.
- ml-testing.md: Provides ideas and examples of what code should be tested, mainly focused on data.
- ml-model-checklist.md: A checklist for ML models in production. Not necessarily useful for us.
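The ml-testing page centres on unit-testing the code that prepares data. As a minimal sketch of that idea (not code from the playbook), here is a pytest-style test suite for a hypothetical preprocessing helper `min_max_scale`:

```python
import math


def min_max_scale(values):
    """Hypothetical preprocessing step: rescale a list of numbers to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: avoid division by zero and map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def test_output_range():
    scaled = min_max_scale([3.0, 7.0, 5.0])
    assert min(scaled) == 0.0 and max(scaled) == 1.0


def test_preserves_length_and_order():
    values = [10.0, -2.0, 4.0]
    scaled = min_max_scale(values)
    assert len(scaled) == len(values)
    # Scaling is monotonic, so the ranking of elements must not change
    order = list(range(len(values)))
    assert sorted(order, key=scaled.__getitem__) == sorted(order, key=values.__getitem__)


def test_constant_column_does_not_divide_by_zero():
    assert min_max_scale([2.0, 2.0]) == [0.0, 0.0]


def test_known_values():
    expected = [0.0, 0.5, 1.0]
    assert all(math.isclose(a, b) for a, b in zip(min_max_scale([0.0, 5.0, 10.0]), expected))
```

Running `pytest` on this file executes each `test_` function; each one pins down a single behaviour (range, shape/order, edge case, known values) of the data-processing code.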

tonyshumlh commented 5 months ago

Jeremy Jordan - Effective testing for machine learning systems https://www.jeremyjordan.me/testing-ml/ Group-7

Proposes a workflow for incorporating tests (mainly ML pipeline tests) into ML development.
Introduces the ideas of pre-train and post-train tests:
Pre-train tests are conducted before the model is trained, aiming to identify bugs early on and potentially save time by avoiding wasted training jobs.
Post-train tests utilize the trained model artifact to inspect behaviors for various scenarios defined by the testing process. These tests aim to understand the logic learned during training and provide a behavioral report of model performance.
- Invariance Tests: Apply deliberate changes to the input that should not affect the model's output, and check that the output stays the same.
- Directional Expectation Tests: Apply deliberate changes to the input that have a predictable effect, and check that the model's output moves in the expected direction.
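To make the pre-train/post-train split concrete, here is a minimal pytest-style sketch. The 1-D linear model, `gradient_step`, and the house-price stand-in `trained_price_model` (which ignores `listing_id` by construction) are hypothetical illustrations, not code from Jordan's post.

```python
import math


def predict(w, b, xs):
    """Hypothetical 1-D linear model y = w*x + b."""
    return [w * x + b for x in xs]


def mse(preds, ys):
    """Mean squared error."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def gradient_step(w, b, xs, ys, lr=0.01):
    """One batch gradient-descent step on the mean squared error."""
    n = len(xs)
    errors = [p - y for p, y in zip(predict(w, b, xs), ys)]
    grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
    grad_b = 2 / n * sum(errors)
    return w - lr * grad_w, b - lr * grad_b


# Pre-train tests: cheap checks run before committing to a long training job.
def test_output_shape():
    xs = [1.0, 2.0, 3.0]
    assert len(predict(0.0, 0.0, xs)) == len(xs)


def test_one_step_reduces_loss():
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    before = mse(predict(0.0, 0.0, xs), ys)
    w, b = gradient_step(0.0, 0.0, xs, ys)
    assert mse(predict(w, b, xs), ys) < before


# Post-train tests: probe the behaviour of a trained artifact.
def trained_price_model(house):
    """Stand-in for a loaded, trained house-price model (hypothetical)."""
    return 50_000 * house["bathrooms"] + 150 * house["sqft"]


BASE_HOUSE = {"bathrooms": 2, "sqft": 1200, "listing_id": 17}


def test_invariance_to_listing_id():
    # A change that should be irrelevant must leave the prediction (nearly) unchanged.
    changed = {**BASE_HOUSE, "listing_id": 9001}
    assert math.isclose(
        trained_price_model(changed), trained_price_model(BASE_HOUSE), abs_tol=1e-6
    )


def test_directional_more_bathrooms():
    # Adding a bathroom, all else equal, should not lower the predicted price.
    more = {**BASE_HOUSE, "bathrooms": BASE_HOUSE["bathrooms"] + 1}
    assert trained_price_model(more) >= trained_price_model(BASE_HOUSE)
```

With a real model, the stand-in would be replaced by the loaded artifact. Exact equality is usually too strict for invariance checks on learned models, which is why the sketch compares with a tolerance.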

tonyshumlh commented 5 months ago

Studying the Practices of Testing Machine Learning Software in the Wild https://arxiv.org/pdf/2312.12604

Studies the testing practices of 10 benchmark machine learning projects. The paper includes test examples.

Testing Strategies: Four major categories were identified: Grey-box, White-box, Black-box, and Heuristic-based techniques. Grey-box and White-box techniques were the most commonly used.

ML Properties Tested: 16 ML properties were identified, with functional correctness, consistency, robustness, data validity, and efficiency being the most frequently tested.

Testing Methods: Thirteen different testing methods were identified, with only seven previously included in the Test Pyramid of ML.
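Two of the frequently tested properties, consistency and robustness, can be illustrated with a small sketch. `train` below is a hypothetical SGD routine, not code from the paper; routing all randomness through one seeded `random.Random` is what makes the consistency check possible.

```python
import random


def train(xs, ys, seed, epochs=100, lr=0.01):
    """Hypothetical training routine: SGD on y = w*x + b with a seeded shuffle."""
    rng = random.Random(seed)  # all randomness flows through this generator
    w, b = rng.uniform(-1, 1), 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * 2 * err * x
            b -= lr * 2 * err
    return w, b


def test_consistency_same_seed_same_model():
    # Same data + same seed must give exactly the same parameters.
    xs, ys = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
    assert train(xs, ys, seed=42) == train(xs, ys, seed=42)


def test_robustness_to_small_input_noise():
    xs, ys = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
    w, b = train(xs, ys, seed=0)
    rng = random.Random(1)
    for x in xs:
        noisy = x + rng.uniform(-0.01, 0.01)
        # A tiny input perturbation should move the prediction only slightly.
        assert abs((w * noisy + b) - (w * x + b)) < 0.1
```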

tonyshumlh commented 4 months ago

Retrieval-Augmented Generation

A technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources, e.g., a developer-defined database.
For us, the external source could be our ML test checklist and other background information.
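A minimal sketch of the retrieval-and-augment half of RAG, assuming our checklist is stored as plain strings: rank items by bag-of-words cosine similarity to the request, then prepend the top matches to the prompt. The checklist items, `retrieve`, and `build_prompt` are hypothetical; real systems typically use embedding-based search, and the LLM call itself is left out.

```python
import math
import re
from collections import Counter

# Hypothetical checklist items standing in for the project's real checklist.
CHECKLIST = [
    "Check that data loading handles missing values and malformed rows.",
    "Verify that feature scaling produces values in the expected range.",
    "Test that model predictions have the expected shape and type.",
    "Confirm that training is reproducible when the random seed is fixed.",
]


def tokenize(text):
    """Lowercase and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=2):
    """Augment the user request with retrieved checklist items before calling an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Use the following checklist items as context.\n"
        f"{context}\n\n"
        f"Task: {query}"
    )


print(build_prompt("Write tests for missing values in data loading", CHECKLIST))
```

For this query, the first two checklist items share the most words with the request, so they are the ones injected into the prompt.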

JohnShiuMK commented 4 months ago

https://arxiv.org/pdf/2310.01402

Evaluating the Decency and Consistency of Data Validation Tests Generated by LLMs: An application to Canadian political donations data

By a professor from the University of Toronto
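The paper evaluates LLM-generated data validation tests. For readers unfamiliar with the idea, a hand-written sketch of such tests on a small donations-style table might look like the following. The column names, sample rows, and `validate` helper are hypothetical, not taken from the paper or its dataset.

```python
import csv
import io
from datetime import date

# Hypothetical schema for a donations table; column names are assumptions.
REQUIRED_COLUMNS = {"contributor_province", "amount", "received_date"}
PROVINCES = {"AB", "BC", "MB", "NB", "NL", "NS", "NT", "NU", "ON", "PE", "QC", "SK", "YT"}


def validate(rows):
    """Return a list of human-readable validation failures (empty if valid)."""
    failures = []
    for i, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            failures.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        try:
            if float(row["amount"]) <= 0:
                failures.append(f"row {i}: non-positive amount {row['amount']}")
        except ValueError:
            failures.append(f"row {i}: amount is not numeric")
        if row["contributor_province"] not in PROVINCES:
            failures.append(f"row {i}: unknown province {row['contributor_province']!r}")
        try:
            date.fromisoformat(row["received_date"])  # rejects impossible dates too
        except ValueError:
            failures.append(f"row {i}: bad date {row['received_date']!r}")
    return failures


SAMPLE = """contributor_province,amount,received_date
ON,250.00,2021-03-14
QC,-10,2021-02-30
ZZ,abc,2021-05-01
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for failure in validate(rows):
    print(failure)
```

The first sample row passes; the other two trip the amount, province, and date checks.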