IGNF / myria3d

Myria3D: Aerial Lidar HD Semantic Segmentation with Deep Learning
https://ignf.github.io/myria3d/
BSD 3-Clause "New" or "Revised" License

How are IoU values computed? #108

Open · GabrielePaolini opened this issue 5 months ago

GabrielePaolini commented 5 months ago

Thank you for your efforts on this repo! I'm trying to train the toy dataset by following the documentation. I was able to train the network and also to infer on the same point cloud (by first removing the classification labels from the las file).

From the log of the training, it seems that the IoU value never gets close to 1. For example, for the building class, the highest IoU value is something around 0.30. This doesn't make sense to me, since a visual inspection of the inferred labels shows an almost perfect score for ground, building and vegetation classes!

So, how are IoU values computed for individual classes? How should I interpret these results?

Thank you in advance for your help!

CharlesGaydon commented 5 months ago

Hi @GabrielePaolini, thanks for using Myria3D.

I was able to replicate your observation, but I also took a look at the training IoUs, which show overfitting as expected. See the screenshot below:

(screenshot: training IoU curves)

There is indeed a difference in how IoUs are calculated at eval time. Let me refer you to the documentation on design choices, in particular the last two sections: "Speed is of the essence" and "Evaluation is key to select the right approach". In a nutshell, we supervise learning on the subsampled cloud, for efficiency, which I guess could impact IoUs computed during evaluation.
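For reference, here is a minimal sketch of per-class IoU computed with torchmetrics on made-up tensors; it is not necessarily the exact metric call used in Myria3D, but it shows what a training-time IoU sees, i.e. only the subsampled points:

```python
import torch
from torchmetrics.classification import MulticlassJaccardIndex

# Made-up per-point predictions and labels on the *subsampled* cloud
# (this is what the training-time IoU sees, not the full-resolution cloud).
num_classes = 4  # e.g. unclassified, ground, vegetation, building
preds = torch.randint(0, num_classes, (12_500,))
target = torch.randint(0, num_classes, (12_500,))

# IoU per class = |pred ∩ target| / |pred ∪ target|, computed class by class.
per_class_iou = MulticlassJaccardIndex(num_classes=num_classes, average="none")
print(per_class_iou(preds, target))  # tensor with one value per class
```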

I'm still surprised, since I would imagine that the impact would be rather small if the model got the exact same data during evaluation. PyTorch Lightning's documentation says that the same dataloader is used for training and evaluation when overfitting. But since the subsampling transform is re-applied at each pass, the model might see a point cloud subsampled differently than the one seen during training, leading to degraded IoU during evaluation in an overfitting setting. So this observation could simply be due to Lightning's behavior and not to the computation of IoUs itself.
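To make that concrete, here is a toy sketch of the effect (a hypothetical random subsampling function, not the actual Myria3D transform): the same tile yields a different point set every time it passes through the dataloader, so the evaluation pass never literally sees the training points.

```python
import torch

def random_subsample(pos: torch.Tensor, num_points: int) -> torch.Tensor:
    """Toy stand-in for a random subsampling transform that runs at every pass."""
    idx = torch.randperm(pos.shape[0])[:num_points]
    return pos[idx]

tile = torch.rand(100_000, 3)                 # one full-resolution cloud
train_view = random_subsample(tile, 12_500)   # what the model was fit on
val_view = random_subsample(tile, 12_500)     # drawn again at evaluation time

# Same tile, but two different point sets: a model that memorised train_view
# has no guarantee of a perfect IoU on val_view.
print(torch.equal(train_view, val_view))  # almost surely False
```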

GabrielePaolini commented 5 months ago

Hi @CharlesGaydon, and thank you so much for the explanation! I wasn't totally accurate in my question: I was referring to the val/IoU values, which I assume are computed on the validation dataset (which, for the toy dataset, is the same as the training dataset).

In fact, I see that your val/IoU values also don't approach 1 (apart from the ground class). Why does this happen? Shouldn't the model overfit and give good IoU values on the validation step?

I really need to understand how I should evaluate the model's performance, since I want to train RandLA-Net on new data. Should I rely on train/IoU values to assess the generalization capability of my model?

Thank you again for your support!

CharlesGaydon commented 5 months ago

"Shouldn't the model overfit and give good IoU values on the validation step?" That is what I would have expected as well. My best guess is that the model is completely thrown off at evaluation time by a different subsampling, and could have been robust to that if not for the different method of IoU computation at eval time.

This only happens when overfitting on this data. So yes, you can totally rely on validation IoUs for your own trainings. I have never seen this outside of overfitting, and I even had occasion to calculate IoUs outside of Myria3D with different code, with the same results. So I am fairly confident that this is an edge case that happens solely when overfitting on a single cloud.

Sorry if this is causing some confusion! I'm keeping this issue open until I have time to check out what causes this behavior during overfitting.

GabrielePaolini commented 5 months ago

Everything is clearer now, I hope it is an easily solved edge case! Anyway, thank you Charles, I am curious to know where the problem comes from.

CharlesGaydon commented 5 months ago

I gave this a quick look. The data transforms are run at each step, but the batch is the same size and the average of each feature is constant across runs, so it should be the same data that is seen by the model. The point clouds are in a different order, so I cannot say for sure whether the points themselves are shuffled within a cloud. If they are, this could have a high impact on RandLA-Net, since the model uses decimation subsampling, which is sensitive to point order. But I don't think this is the explanation, since it would also affect training metrics from one step to another.
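To illustrate why point order matters here: decimation-style subsampling is often implemented as keeping every k-th point (or the first N points) of the current ordering, so shuffling the points inside a cloud changes which points survive. A toy sketch, not the actual Myria3D code:

```python
import torch

def decimate(pos: torch.Tensor, ratio: int = 4) -> torch.Tensor:
    """Keep every `ratio`-th point of the current ordering (order-dependent by design)."""
    return pos[::ratio]

cloud = torch.rand(10_000, 3)
shuffled = cloud[torch.randperm(cloud.shape[0])]

# Same points, different order -> a different subset survives the decimation.
print(torch.equal(decimate(cloud), decimate(shuffled)))  # False in general
```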

Then, I tried removing the knn interpolation at evaluation time. This did not change anything (phew!), so this is unrelated to the interpolation of points.
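For context, propagating predictions from the subsampled cloud back to the full-resolution cloud is typically done with a k-NN interpolation such as torch_geometric's knn_interpolate. A minimal sketch with made-up shapes of what that step does (the actual Myria3D call may differ):

```python
import torch
from torch_geometric.nn import knn_interpolate

sub_pos = torch.rand(12_500, 3)      # positions of the subsampled points
sub_logits = torch.rand(12_500, 4)   # one score per class for each subsampled point
full_pos = torch.rand(100_000, 3)    # positions of the full-resolution cloud

# Each full-resolution point gets a distance-weighted average of the logits
# of its k nearest subsampled neighbours.
full_logits = knn_interpolate(sub_logits, sub_pos, full_pos, k=3)
print(full_logits.shape)  # torch.Size([100000, 4])
```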

So my guess is that something weird happens due to evaluation mode, maybe because of batch normalization or dropout.
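On the batch-normalization hypothesis: in eval mode, BatchNorm layers switch from batch statistics to running statistics, so the same input can produce different outputs in train() and eval() mode, especially when only a single overfitted batch has ever updated the running stats. A minimal illustration in plain PyTorch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(8)
x = torch.rand(32, 8)

bn.train()
out_train = bn(x)  # normalised with the statistics of this very batch
                   # (running stats get only a small momentum update)

bn.eval()
out_eval = bn(x)   # normalised with the barely-updated running statistics

print(torch.allclose(out_train, out_eval))  # typically False
```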

GabrielePaolini commented 5 months ago

Hi @CharlesGaydon, thanks for the update! I wanted to clarify the situation: at the time I opened this issue, I trained on the toy dataset using the default settings (the RandLaNet-Overfit experiment). The thing is that I mistakenly ran inference using the model checkpoint provided in the repo (proto151_V2.0_epoch_100_Myria3DV3.1.0.ckpt), so that's why I got perfect results in contrast with the poor validation values.

To confirm your point, I downloaded the latest version of the code and ran the default RandLaNet-Overfit experiment, plus another experiment using the same settings as proto151_V2.0_epoch_100_Myria3DV3.1.0_predict_config_V3.7.0.yaml (with and without overfit). This time, the validation values seem indicative of the quality of the training (not 100% sure). However, the results are not good.

I attach here the config file, the logs, and the result of the inference (displayed in CloudCompare) for the experiment run with the proto151 settings without overfit_batches. Inference was done using a checkpoint from epoch 71. The model doesn't seem to learn the building and vegetation classes, and at some point the number of unclassified points starts increasing.

Why do I get such bad results? How was the checkpoint proto151_V2.0_epoch_100_Myria3DV3.1.0.ckpt obtained?

config_tree.txt

(attached: three CloudCompare screenshots of the inference result from 2024-02-07, plus training curves: train_loss_epoch, train_iou_epoch, and per-class train_iou for vegetation, ground, building, and unclassified)