Rob174 opened 2 years ago
Using the same standardizer (the former one) does not change the predictions.
True:
- Original dataset standardization + standardization cache
- Original dataset standardization only
- No standardization
- Cache standardization only
Since augmentations have been applied to the original image, we cannot get exactly the same patches in the cache as with the horizontal/vertical grid.
We extracted two patches of the same region, one with the horizontal grid and one with the cache:
Standardization applied: mean = -4499.453874324647, std = 71710.10327965618. So -25 = 0.06239 ; -30 = 0.062327 ; 3.92 = 0.062799 ; -20 = 0.06246
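As a quick sanity check, the quoted values can be reproduced from the stated statistics, assuming the standardizer is the usual `(x - mean) / std`:

```python
# Assumed standardizer: (x - mean) / std with the dataset statistics above.
MEAN = -4499.453874324647
STD = 71710.10327965618

def standardize(x: float) -> float:
    return (x - MEAN) / STD

# The four raw pixel values quoted above all map close to ~0.0624,
# which matches the observation that the patch is nearly uniform.
for raw in (-25, -30, 3.92, -20):
    print(f"{raw:>7} -> {standardize(raw):.6f}")
```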
After logging the image after cache standardization, we retrieve the same values as with the horizontal grid.
However, we observe a size difference: I had missed the resize step in the RGB overlay
-> fixing it changes nothing
Possibility: problem with the model backups:
- test: load the model and predict the categories of patches from the cache
- test: for the first 1000 images of the cache, run the prediction and count the number of times something is found on a patch (seep or spill probability close to 1)
Result: 0 images predicted with an annotation
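The counting test above can be sketched as follows. All names here are hypothetical: `model` stands for the restored classifier and `cache_loader` for an iterable yielding `(patch, label)` batches from the classification cache; a patch counts as annotated if either output (seep, spill) exceeds the threshold.

```python
import torch

# Hypothetical sketch: count patches where the model predicts an annotation
# (seep or spill probability above `threshold`) among the first `limit` images.
def count_annotated_predictions(model, cache_loader, limit=1000, threshold=0.5):
    model.eval()
    annotated, seen = 0, 0
    with torch.no_grad():
        for patches, _ in cache_loader:
            probs = model(patches)                        # (batch, 2) = (seep, spill)
            annotated += int((probs > threshold).any(dim=1).sum())
            seen += len(patches)
            if seen >= limit:
                break
    return annotated
```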
Nothing on the internet on this subject
Complementary test: train for a few epochs, then debug and test the overlay
Labels : array([[2.68134638e-03, 2.35055317e-03], [5.77813148e-01, 5.28436542e-01], [6.63250208e-01, 9.96489748e-02], [7.48704071e-04, 1.87040528e-03], [4.83831584e-01, 5.22780120e-01], [5.47244608e-01, 4.01702404e-01], [3.40369460e-03, 1.54772541e-03], [7.27848351e-01, 1.81800306e-01], [4.30358768e-01, 5.90007126e-01], [4.43585739e-02, 6.34801537e-02], [6.34013951e-01, 2.43032813e-01], [2.94335932e-01, 7.35552907e-01], [8.78212079e-02, 2.27235369e-02], [3.87654126e-01, 5.72751939e-01], [2.44565696e-01, 7.23825037e-01], [2.64384076e-02, 6.78360276e-03], [7.06699252e-01, 4.29956347e-01], [6.25805795e-01, 4.57421660e-01], [3.10697034e-02, 3.76272458e-03], [7.22986639e-01, 1.49903834e-01], [7.25960970e-01, 4.62231457e-01], [7.35168019e-03, 1.90587970e-03], [9.67308730e-02, 9.29059803e-01], [2.19287083e-01, 8.76275837e-01], [1.96221471e-02, 1.40873305e-02], [4.07969624e-01, 5.85339129e-01], [5.53509116e-01, 8.19427446e-02], [9.04793292e-03, 1.10598765e-02], [4.35218483e-01, 7.44091570e-01], [7.50137806e-01, 3.47363800e-01], [3.09039862e-03, 4.41938359e-03], [1.11195840e-01, 4.08508807e-01], [5.94882786e-01, 3.81816715e-01], [5.54106459e-02, 7.81343877e-02], [3.87826502e-01, 5.52237451e-01], [5.97778916e-01, 5.23574233e-01], [1.88532588e-03, 9.66153853e-03], [7.17576891e-02, 8.42237592e-01], [1.65832222e-01, 7.25191414e-01], [2.18722108e-03, 1.68182538e-03], [3.55219573e-01, 6.00342393e-01], [2.25750431e-01, 6.55272543e-01], [4.69531817e-03, 1.69223652e-03], [5.55323303e-01, 6.69351876e-01], [7.00846016e-01, 3.95513922e-01], [9.67088807e-03, 5.73168090e-03], [4.04742688e-01, 6.47701025e-01], [8.67738605e-01, 2.49013364e-01], [2.25456525e-02, 1.32358242e-02], [1.12567723e-01, 6.99625373e-01], [8.74560654e-01, 1.09168090e-01], [1.29039660e-02, 2.55873110e-02], [3.74648124e-01, 7.45063782e-01], [7.74407268e-01, 3.17765385e-01], [7.13722827e-03, 3.38213658e-03], [5.31227171e-01, 4.62901980e-01], [1.57037094e-01, 8.20952594e-01], [2.13968635e-01, 3.96118639e-03], 
[9.21663582e-01, 2.17920884e-01], [2.62301654e-01, 4.63065177e-01], [1.24537814e-02, 2.47172010e-03], [4.64628428e-01, 6.34609520e-01], [9.35290873e-01, 1.39699042e-01], [3.83739686e-03, 3.11801443e-03], [5.28172374e-01, 4.88895267e-01], [8.02735388e-01, 1.98987216e-01], [7.28378305e-03, 4.20495868e-03], [8.68742585e-01, 1.56878516e-01], [5.84681451e-01, 4.97554868e-01], [3.35434735e-01, 2.56456167e-01], [6.69112086e-01, 3.42939019e-01], [8.10299397e-01, 1.41184241e-01], [2.01309323e-02, 4.08791238e-03], [7.82346010e-01, 2.14369446e-01], [8.73175800e-01, 1.69111162e-01], [7.23754847e-03, 2.76185852e-03], [3.63400459e-01, 6.39464259e-01], [6.79477751e-01, 3.97349447e-01], [1.92395132e-02, 5.17093390e-03], [5.39174676e-01, 4.21885818e-01], [6.71960354e-01, 3.91622722e-01], [2.99355900e-03, 1.40759500e-03], [6.73757553e-01, 4.47470903e-01], [9.16428387e-01, 9.66569707e-02], [1.79416519e-02, 3.74999270e-02], [2.45205685e-01, 8.06445301e-01], [3.77743244e-01, 6.01862907e-01], [1.48913115e-01, 1.66287825e-01], [5.91890216e-01, 5.96496582e-01], [5.38773477e-01, 5.15715480e-01], [3.44780227e-03, 1.98402139e-03], [6.70196295e-01, 3.49170983e-01], [8.88909817e-01, 2.08948404e-01], [1.52774472e-02, 2.86224056e-02], [3.13148379e-01, 8.16573739e-01], [7.86233604e-01, 3.01598847e-01], [2.38157785e-03, 4.53676470e-03], [1.01929106e-01, 8.74816179e-01], [6.14633977e-01, 3.03299367e-01], [1.95714924e-03, 6.67875586e-03]])
For image -2:
All pixels are around 0.0624.
After restoring the previous model inside the debugging session:
array([[2.68931244e-03, 4.13216604e-03], [5.22784412e-01, 5.94356954e-01], [6.78194463e-01, 9.72153246e-02], [5.56926418e-04, 2.63109826e-03], [4.75653380e-01, 4.57817286e-01], [6.38681293e-01, 2.97836006e-01], [4.01395746e-03, 1.33876712e-03], [6.50645673e-01, 1.82723373e-01], [2.93745309e-01, 7.34189630e-01], [8.11247453e-02, 3.34956460e-02], [7.00489938e-01, 2.56441921e-01], [4.38253939e-01, 5.72859943e-01], [9.21609849e-02, 2.13084910e-02], [3.07760477e-01, 6.79862678e-01], [3.37666482e-01, 6.16195381e-01], [3.90984640e-02, 6.42318325e-03], [7.48141468e-01, 3.01694572e-01], [7.81182587e-01, 2.81766385e-01], [1.54459225e-02, 3.59398150e-03], [7.38705635e-01, 1.72274023e-01], [6.22220278e-01, 5.23266971e-01], [8.50929506e-03, 1.68435147e-03], [1.50623173e-01, 8.86800826e-01], [2.77741551e-01, 8.49109232e-01], [5.87340333e-02, 7.55275711e-02], [4.29206669e-01, 4.74344611e-01], [4.54443097e-01, 2.57482469e-01], [5.70247229e-03, 2.44847350e-02], [3.00158441e-01, 8.26772153e-01], [7.98582733e-01, 2.56419510e-01], [4.37664613e-03, 9.37187113e-03], [1.08909652e-01, 5.11593938e-01], [6.37976885e-01, 3.32921714e-01], [9.31586251e-02, 8.52221623e-02], [4.56053615e-01, 4.29093689e-01], [6.90697730e-01, 4.32515919e-01], [1.65714265e-03, 1.10847875e-02], [7.15891048e-02, 8.21428776e-01], [1.69235840e-01, 7.37171412e-01], [1.54106528e-03, 1.48486451e-03], [4.80589122e-01, 4.24994946e-01], [3.48519802e-01, 5.51831365e-01], [3.64887388e-03, 1.88561203e-03], [5.73756158e-01, 6.45231664e-01], [6.93277538e-01, 4.10979360e-01], [6.78104302e-03, 1.00510167e-02], [3.26562643e-01, 6.62197053e-01], [8.39428902e-01, 2.54360288e-01], [1.78309176e-02, 2.91283280e-02], [1.49366990e-01, 7.48899162e-01], [8.83030534e-01, 1.12348765e-01], [9.73493699e-03, 6.53240532e-02], [3.25637490e-01, 7.69232869e-01], [7.89662659e-01, 3.23665351e-01], [6.36679120e-03, 6.08014408e-03], [6.59711003e-01, 3.66089046e-01], [1.39812499e-01, 8.62066686e-01], [1.14549696e-01, 5.80941280e-03], [9.10154104e-01, 
1.84098765e-01], [3.07374001e-01, 4.70625609e-01], [1.29297385e-02, 2.47890456e-03], [3.72370452e-01, 6.02775812e-01], [9.53347206e-01, 1.00243501e-01], [4.35061147e-03, 3.04625602e-03], [4.97147888e-01, 5.38047612e-01], [8.08241308e-01, 1.45199791e-01], [4.15466540e-03, 4.46792599e-03], [8.12523186e-01, 1.73444659e-01], [5.20864725e-01, 4.95791465e-01], [2.48312354e-01, 4.36283588e-01], [5.60354710e-01, 4.70330834e-01], [7.69293308e-01, 1.28731683e-01], [1.23779131e-02, 5.85394166e-03], [7.67949820e-01, 2.23598853e-01], [8.79484057e-01, 1.49408013e-01], [4.29969607e-03, 4.11669351e-03], [3.92702013e-01, 6.69573665e-01], [5.51670730e-01, 5.55921733e-01], [6.19696546e-03, 7.42000248e-03], [7.09641814e-01, 2.29895413e-01], [7.53110588e-01, 2.28769228e-01], [2.60995864e-03, 9.20993451e-04], [6.10504031e-01, 4.35681224e-01], [9.14770246e-01, 1.00012161e-01], [1.09778736e-02, 3.83139253e-02], [3.94459695e-01, 6.52380347e-01], [5.33226788e-01, 2.73508966e-01], [2.96786696e-01, 2.29217231e-01], [6.84045553e-01, 4.99290645e-01], [3.94968361e-01, 6.29104137e-01], [1.58719032e-03, 2.65427469e-03], [7.01137364e-01, 2.06640571e-01], [7.80474246e-01, 3.73576045e-01], [1.15799820e-02, 4.33375873e-02], [5.13066173e-01, 5.72719812e-01], [7.82282114e-01, 2.67523974e-01], [1.87276688e-03, 6.34426717e-03], [1.58624604e-01, 7.86696851e-01], [7.13712633e-01, 2.02322349e-01], [1.16590085e-03, 1.20673012e-02]])
Not coming from the model: testing the overlay at the end of training, without restoring weights, shows the same problem, but performs poorly on this example. debug_overlay2 shows that no image has a higher probability with the restored pretrained model (16 epochs).
The debug_overlay script shows that the cache origin (full scale) with OpenCV warpAffine gives similar results to the filtered cache.
Ideas of bug:
- Test including non-augmented images in the filtered cache
TODO:
Restoring the model and testing it on the classification cache: no value bigger than 0.5 in the first 11600 images. Applying the same test (the debug_overlay3 algorithm) after training gives a similar result.
List of problems:
OK:
Unknown:
- Test: make 2 trainers one after another -> no problem
- Test: recompute the last valid batch after the end of the loop -> no problem
- Test: loop on valid_dataset (without recomputing a new ClassificationCache object) -> no problem
Conclusion: the problem is at the initialization of the DatasetFactory.
Only the ClassificationCache created inside of it is affected.
Preliminary test: duplicate the trainer class, removing all training steps and metric-save steps except the confusion matrix (for control purposes). After the metric bug was solved, no problem.
Bug at the metric level (it modifies the predicted value)
Main test: recreate the DatasetFactory in the main_script and use the Trainer test. Expected: all predictions smaller than 0.5. Result: similar performance as during training... ❔
There must be a difference in the trainer between previous tests and this one
We will successively delete "inessential" parts of the training code
✔️ ok as expected ❌ not as expected
Supposition: batch norm and dropout layers are still active during evaluation -> docs and forums seem to confirm this supposition and suggest calling eval() to deactivate these layers: https://discuss.pytorch.org/t/test-accuracy-with-different-batch-sizes/22930/9
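A minimal sketch of this supposition, using a toy model (not the project's network): in train mode, dropout re-samples its mask on every forward pass and batch norm uses batch statistics, so repeated calls on the same input disagree; after `model.eval()`, both layers become deterministic.

```python
import torch
from torch import nn

# Toy model: dropout and batch norm are the mode-dependent layers.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5), nn.BatchNorm1d(4))
x = torch.randn(8, 4)

model.train()
a = model(x)
b = model(x)   # dropout re-samples its mask, batch norm uses batch statistics

model.eval()   # deactivates dropout, batch norm switches to running statistics
with torch.no_grad():
    c = model(x)
    d = model(x)

print(torch.equal(a, b))   # train-mode outputs differ between calls
print(torch.equal(c, d))   # eval-mode outputs are deterministic
```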
Supposition ruled out: checking the training attribute during evaluation -> False, i.e. already in evaluation mode, ok as expected.
But if we apply the same function to the previous training loop, we obtain similar results for the two trainings
On the source images: compute stats image per image to see whether some images fall outside the ordinary distribution.
| Function applied | Mean | Std |
|---|---|---|
| Mean (original stat) | -11.142515 | 2.0167942 |
| Std (original stat) | 8.566244 | 1.3543913 |

On 488 images
Does the group of patches from the same image have a different distribution than the original image?
Problem -> augmentations: there are multiple augmentations of the same image. Solution: all augmented versions of the same original image are generated one after another before moving on to the next original image, so we can group consecutive patches sharing the same source image to split successive augmentations of the same image.
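Since consecutive patches share the same source image, the grouping can be done with a simple consecutive-key grouping; the record layout below is hypothetical:

```python
from itertools import groupby

# Hypothetical patch records: augmented versions of the same source image
# are generated consecutively, so grouping by source id splits them.
patches = [
    {"source": "img_000", "aug": 0}, {"source": "img_000", "aug": 1},
    {"source": "img_001", "aug": 0}, {"source": "img_001", "aug": 1},
    {"source": "img_001", "aug": 2},
]
groups = {k: list(v) for k, v in groupby(patches, key=lambda p: p["source"])}
print({k: len(v) for k, v in groups.items()})  # {'img_000': 2, 'img_001': 3}
```

Note that `itertools.groupby` only groups consecutive items, which is exactly the guarantee the cache ordering provides here.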
| Function applied | Mean | Std |
|---|---|---|
| Mean (original stat) | -21.122908 | 2.8587813 |
| Std (original stat) | 2.5189183 | 1.0826527 |

On 5467 source images (augmentations of 100)
| Function applied | Mean | Std |
|---|---|---|
| Mean (original stat) | -15.767063 | 3.1433315 |
| Std (original stat) | 4.6184974 | 1.4955554 |

On 137 source images
The ErrorWithThreshold metric does not give the same measure as the sum of the diagonal of the confusion matrix: ErrorWithThreshold counts the number of times a predicted value differs from the true value. So for example with a batch size of 3:
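A hedged illustration of how the two measures can diverge, with a hypothetical batch of 3 two-channel (seep, spill) predictions; the exact semantics of the project's ErrorWithThreshold are assumed here:

```python
import numpy as np

# Hypothetical batch of 3 samples with two output channels (seep, spill).
pred = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.7]])
true = np.array([[1, 0], [0, 1], [1, 0]])

pred_bin = (pred > 0.5).astype(int)

# One way ErrorWithThreshold could count: every mismatched channel is an error.
elementwise_errors = int((pred_bin != true).sum())

# What the confusion-matrix diagonal reflects: mis-classified samples.
samplewise_errors = int((pred_bin != true).any(axis=1).sum())

print(elementwise_errors / pred.size)   # normalized over 6 channel values
print(samplewise_errors / len(pred))    # normalized over 3 samples
```

Both counts are 1 here, but the normalizers differ (6 channel values vs 3 samples), so the reported rates disagree.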
Solved in commit d3ae0ea
Causes:
Demo ok:
Confusion matrix (columns: Prediction, rows: True)
Grid of 18*10 = 180 patches; 37 of them have something predicted on them, 2 of which are correct -> 35 errors / 180 patches ≈ 19.4%
Incoherent element left: background misclassified
To be tested: check the predictions in the valid batch for earth
After printing the confusion matrix of 100-image batches we obtain the following result: ![image](https://user-images.githubusercontent.com/42274857/126193615-c97ea9c5-da28-445b-ac4e-cc9cdd224da3.png)
But with the overlay, the network labels all images as background (no annotation)
Supposition:
As we want all patches from the same image, we use a different dataset than the DatasetCache object. Therefore, different preprocessing operations could be applied between these datasets.
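This supposition can be checked directly. A minimal sketch, with hypothetical dataset objects that support indexing and return array-like patches:

```python
import numpy as np

# Hypothetical check that two dataset objects apply the same preprocessing:
# fetch the patch for the same index through each pipeline and compare.
def same_preprocessing(dataset_a, dataset_b, index=0, atol=1e-6):
    patch_a = np.asarray(dataset_a[index])
    patch_b = np.asarray(dataset_b[index])
    return patch_a.shape == patch_b.shape and np.allclose(patch_a, patch_b, atol=atol)
```

If this returns False for matching indices, the two pipelines (e.g. standardization or resize steps) differ, which would explain the overlay discrepancy.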