differentiate-catalysis / catalyst-bubble-detection

Detect bubbles that are evolving gas from a catalyst surface to determine reaction rate

add unit tests #12

Open ascourtas opened 3 years ago

ascourtas commented 3 years ago

Add basic unit tests to ensure training and labeling works correctly. I'd recommend using pytest. Methods to prioritize (in the torchvision-rewrite branch):
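As a starting point, a minimal pytest skeleton might look like the following. Note the function `label_mask` and the module layout are hypothetical stand-ins, since the actual methods to prioritize live in the torchvision-rewrite branch:

```python
# tests/test_labeling.py -- sketch only; label_mask is a hypothetical
# stand-in for one of the labeling methods under test.
import numpy as np

def label_mask(image: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Hypothetical labeling method: binarize an image at a threshold."""
    return (image > threshold).astype(np.uint8)

def test_label_mask_shape_and_values():
    # Use a fixed seed so the test is deterministic.
    image = np.random.default_rng(0).random((8, 8))
    mask = label_mask(image)
    assert mask.shape == image.shape
    assert set(np.unique(mask)) <= {0, 1}
```

Run with `pytest tests/` once real targets replace the placeholder.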

ascourtas commented 2 years ago

Also recommend GitHub Actions for CI
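A minimal workflow sketch for that, assuming a `tests/` directory and a `requirements.txt` (neither is confirmed in this repo yet):

```yaml
# .github/workflows/tests.yml -- sketch; file name, Python version,
# and install step are placeholders to adapt to the repo.
name: tests
on: [push, pull_request]
jobs:
  pytest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install pytest -r requirements.txt
      - run: pytest tests/
```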

ascourtas commented 2 years ago

From Marcus on ensuring weights are correct:

Sure! So what I've done in the past is have a test that loads a .pth or .pkl containing weights and maps them onto the corresponding architecture. That way I know that (1) the weights are in the repo and I can load them, and (2) the scripted architecture's layers match the weights I've loaded and I'm not missing any layer weights (because of an architecture mismatch, for example).

Once I've loaded the model, I run inference on either an existing test image or just on np.zeros([X, Y, 1]). If the architecture can handle different input dimensions, I run inference at least twice with two ndarrays of different dimensions.

In the tests directory, I'll put a .json or .yml containing some results that I've pre-computed, e.g. I've already run inference on np.zeros([X, Y, 1]) and saved the output to test_inference.json. At test time, I load the corresponding JSON and locate the relevant ndarray (call it true_inference). Then I'd write something like assert np.allclose(model.predict(np.zeros([X, Y, 1])), true_inference, atol=1e-5). This tests that (3) my weights haven't changed for some weird reason and are reliably mapping to their correct locations each time, and (4) I can in fact run inference on dummy data and the output aligns with what I expect.

Does that all make sense? I'm not claiming this is the "de facto" way to do it, but it's how Eric makes me do it all the time, and I (mostly) trust that he knows what he's doing. :sweat_smile:
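The pre-computed-reference pattern described above can be sketched as follows. Here a plain NumPy function stands in for the torch model's `predict` (the real test would load the .pth/.pkl weights first), and the JSON round-trip is done in memory rather than from `tests/test_inference.json`:

```python
import json
import numpy as np

def predict(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for model inference: deterministic so the
    pre-computed reference stays valid across runs."""
    return x.sum(axis=-1) + 1.0

def test_inference_matches_precomputed():
    dummy = np.zeros([4, 4, 1])
    # Step 1 (done once, offline): run inference and save the output.
    # In the real repo this JSON would be committed under tests/.
    blob = json.dumps({"true_inference": predict(dummy).tolist()})
    # Step 2 (at test time): reload the reference and compare within
    # tolerance, catching silently changed or mis-mapped weights.
    true_inference = np.asarray(json.loads(blob)["true_inference"])
    assert np.allclose(predict(dummy), true_inference, atol=1e-5)
```

Note `np.allclose` (with `atol`) is used for the whole-array comparison, since `np.isclose` returns an element-wise boolean array rather than a single truth value.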

Oh, just realized that for torch models you'd have to handle the batch dimension when running inference on a single input (an unsqueeze on the input, or a squeeze on the output).