utterances-bot commented 7 months ago

Don't Mock Machine Learning Models In Unit Tests

How unit testing machine learning code differs from typical software practices

https://eugeneyan.com/writing/unit-testing-ml/

dhruvnigam93 commented 7 months ago

I think we can draw a clean line between testing model training code and testing model inference code. Testing training code is non-trivial and suffers from ML-specific challenges. Testing inference code, once the model has been trained, can borrow a lot from normal software unit testing, since trained models mostly behave deterministically.
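
To make that concrete, here's a minimal sketch (the model and data are illustrative) of the kind of plain unit test that works once a model is trained:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def test_inference_is_deterministic():
    # Tiny illustrative dataset; any model with frozen weights works the same way.
    X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
    y = np.array([1, 0, 1, 0])
    model = LogisticRegression().fit(X, y)

    # Same input + same trained weights -> same output, so ordinary
    # assertions suffice; no ML-specific machinery needed.
    np.testing.assert_array_equal(model.predict(X), model.predict(X))
```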

afouchet commented 7 months ago

I disagree, especially with the example given in the article.

In my experience, most ML pipelines look like this: `data -(cleaning)-> cleaned data -(model)-> predictions -(postprocessing)-> cleaned output`

By testing this pipeline with a mocked model, my predictions always have the same format.

In the example given

"Google’s T5 NLI model classifies factual consistency with class = 1 while Meta’s BART NLI model classifies it with class = 2!"

I'm very happy to force my model code to return "1" for "factually consistent". If my model code is "use Meta's BART and translate Meta's classes into my classes", that makes my pipeline agnostic to the model. It just gives the model step a clear API.
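
A rough sketch of what I mean (the adapter name and class mapping are illustrative; only BART returning class 2 comes from the article):

```python
from unittest.mock import Mock

# The pipeline's own label, fixed by contract with downstream steps.
FACTUALLY_CONSISTENT = 1

# Model-specific indices are translated at the boundary; swapping BART
# for T5 only means swapping this mapping.
BART_TO_PIPELINE = {2: FACTUALLY_CONSISTENT, 0: 0, 1: 0}


class NliAdapter:
    def __init__(self, model, class_map):
        self.model = model
        self.class_map = class_map

    def predict(self, premise, hypothesis):
        return self.class_map[self.model.predict(premise, hypothesis)]


def test_adapter_returns_canonical_label():
    fake_bart = Mock()
    fake_bart.predict.return_value = 2  # BART's index for factual consistency
    adapter = NliAdapter(fake_bart, BART_TO_PIPELINE)
    assert adapter.predict("premise", "hypothesis") == FACTUALLY_CONSISTENT
```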

To test the actual model, I have:

gabrielclimb commented 7 months ago

Why not use CSV or parquet if the contents of those files are small? I think it keeps the project more organised.

smart-patrol commented 7 months ago

One pattern I’ve seen is the dummy model (empty model) with various marks. When shipping MLOps pipelines, every step runs differently, and marking tests in CI to run at certain stages or locally helps. The dummy model makes it possible to run smoke tests before a release.
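
Roughly something like this (DummyModel and the `smoke` mark are just examples; custom marks need registering in pytest.ini):

```python
import pytest


class DummyModel:
    """Stand-in that matches the real model's interface but does no work."""

    def predict(self, batch):
        return [0] * len(batch)  # fixed output, just enough to exercise the pipeline


@pytest.mark.smoke  # e.g. select with `pytest -m smoke` in the release stage
def test_pipeline_runs_end_to_end_with_dummy_model():
    preds = DummyModel().predict(["some input"])
    assert len(preds) == 1
```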

Eugene, you’ve scratched the surface here and could Dive Deep more.

eugeneyan commented 7 months ago

Why not use CSV or parquet if the contents of those files are small?

The intent is to keep the tests self-contained and reduce the need to reference other files in tests, especially parquet files, which can be opaque.
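
For example, something like this, with the fixture data inline (the columns and cleaning step are illustrative):

```python
import pandas as pd


def test_cleaning_drops_rows_with_missing_text():
    # The fixture lives in the test itself: readable at a glance,
    # with no companion CSV/parquet file to open.
    df = pd.DataFrame(
        {
            "text": ["good product", "bad product", None],
            "label": [1, 0, 1],
        }
    )
    cleaned = df.dropna(subset=["text"])  # stand-in for the real cleaning step
    assert len(cleaned) == 2
```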

eugeneyan commented 7 months ago

One pattern I’ve seen is the dummy model (empty model) with various marks.

Could you say more about other patterns you've seen in unit testing ML code, or point me to relevant resources please?

Eugene, you’ve scratched the surface here and could Dive Deep more.

Heh, I was just sharing a discussion I had with a co-worker; this wasn't intended to be as detailed as my other write-ups.

DaniSanchezSantolaya commented 6 months ago

Very interesting post. Last year we open-sourced mercury-robust with a similar purpose: https://github.com/BBVA/mercury-robust

We use it to run tests on the data used to train models and on the trained models themselves. Tests are split between DataTests (checking for label leakage in features, noisy labels, etc.) and ModelTests (checking invariance, drift resistance, and so on).
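
To give a flavor of what an invariance test checks, here it is in plain pytest (this is not mercury-robust's actual API; see the repo for that, and the toy model below exists only so the example runs):

```python
def sentiment(text):
    # Toy stand-in model; in practice you'd wrap a real trained model.
    return 1 if "loved" in text else 0


def test_prediction_invariant_to_name_swap():
    # A label-irrelevant perturbation (swapping a name) should not
    # change the prediction.
    assert sentiment("Alice loved the movie.") == sentiment("Bob loved the movie.")
```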

Hopefully it will be useful!

annalyzin commented 6 months ago

Interesting ideas, Eugene! Crafting the "small, simple data samples" is perhaps the most critical yet challenging step. Refining these samples is usually an iterative process where new edge cases and errors to handle are discovered along the way. You provide a helpful checklist to get started.
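
In code, that iteration often shows up as each discovered edge case becoming one more tiny parametrized sample (the tokenizer and cases here are invented for illustration):

```python
import pytest


@pytest.mark.parametrize(
    "text, expected",
    [
        ("hello world", ["hello", "world"]),  # original happy path
        ("", []),                             # edge case found later: empty input
        ("  hello  ", ["hello"]),             # edge case: stray whitespace
    ],
)
def test_tokenize(text, expected):
    assert text.split() == expected  # `split` stands in for the real tokenizer
```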