CVLAB-Unibo / Learning2AdaptForStereo

Code for: "Learning To Adapt For Stereo" accepted at CVPR2019
Apache License 2.0

Question about the result of online adaptation with "L2AWad" #9

Open Tiam2Y opened 2 years ago

Tiam2Y commented 2 years ago

Hello! Thanks for the great work! @AlessioTonioni I'm at it again and have questions about the results of "Learning to adapt".

I used 12 Synthia video sequences as dataset and meta-trained the network you provided with the following parameters:

--dataset=./meta_datasets.csv
--batchSize=4
--weights=./pretrained_weight/weights.ckpt  # download from the link you provided
--numStep=40000
--lr=0.0001
--alpha=0.00001
--adaptationSteps=3
--metaAlgorithm=L2AWad
--unSupervisedMeta
--maskedGT

After training, I used these weights to test online adaptation on video sequences from DrivingStereo and the KITTI raw data. I found that the predictions for the first few frames were extremely poor, with a D1 error rate close to 99%; after 100 to 200 frames, D1 quickly drops below 10%.
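(For reference, by D1 I mean the standard KITTI metric: a pixel counts as erroneous when its disparity error exceeds both 3 px and 5% of the ground truth. A minimal sketch of how it can be computed; the helper name is just illustrative.)

```python
import numpy as np

def d1_error(pred_disp, gt_disp, valid_mask=None):
    """D1 error: percentage of valid pixels whose disparity error is
    both > 3 px and > 5% of the ground-truth disparity (KITTI convention)."""
    if valid_mask is None:
        valid_mask = gt_disp > 0  # pixels without GT are usually encoded as 0
    err = np.abs(pred_disp - gt_disp)
    bad = (err > 3.0) & (err > 0.05 * gt_disp)
    return np.mean(bad[valid_mask]) * 100.0
```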

I would like to ask:

  1. Is it normal for the initial prediction results to be so poor?
  2. Is there anything wrong with my training?
  3. Is this result representative of your work for comparison?

Sorry for the troublesome questions, but I'd appreciate your answers!

Tiam2Y commented 2 years ago

Also, the Synthia images are scaled to half resolution: [380, 640].

AlessioTonioni commented 2 years ago

Hi Tiam, sorry for the late reply. I honestly cannot remember exactly how the models were trained or how they performed, but a 99% D1 error seems quite wrong. Do you get better results with one of the other meta-training strategies? Since you rescale the images, are you also rescaling the disparities accordingly? I.e., if you scale the image to half resolution, the GT disparity also needs to be rescaled and its values divided by 2.
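Something like this (a minimal sketch with OpenCV; the function name is just illustrative, not from the repo): disparity is measured in pixels, so when the image shrinks by a factor of 2 the disparity values must shrink by the same factor.

```python
import cv2

def rescale_pair(image, gt_disp, scale=0.5):
    """Resize an image/disparity pair; disparity values scale with image width."""
    h, w = image.shape[:2]
    new_size = (int(w * scale), int(h * scale))
    image_s = cv2.resize(image, new_size, interpolation=cv2.INTER_LINEAR)
    # nearest-neighbour avoids blending valid and invalid disparity values
    disp_s = cv2.resize(gt_disp, new_size, interpolation=cv2.INTER_NEAREST)
    disp_s = disp_s * scale  # disparities are in pixels, so they shrink too
    return image_s, disp_s
```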

Tiam2Y commented 2 years ago

Hi, thank you very much for your reply. Since I only want to test the effect of the confidence weight, I only used the "L2AWad" meta-training strategy and have not tried the others. I scaled the images to half resolution following the description in your paper, i.e. "For this dataset we scaled the image to half resolution to bring the disparity into the same range as KITTI", so the disparity is scaled accordingly, using the code you provided in the metaDataset class.

In addition, one problem I found during training is that the loss is very likely to suddenly blow up, becoming as large as several thousand. The results I described earlier are from a run where training behaved normally, i.e. the loss starts around 30, drops to around 3, and converges.

Tiam2Y commented 2 years ago

Hello @AlessioTonioni, I thought about it for a while and now believe this behaviour might be normal. I randomly read 100 ground-truth disparity maps from each of Synthia, KITTI, and DrivingStereo, took the maximum disparity of each map, and averaged it per dataset. The average maximum disparities were 236, 85, and around 76 respectively. So, even with Synthia's images and disparities scaled by half during training, the disparity range is still much larger than that of the test data (KITTI raw data and DrivingStereo), which would make the network's predictions poor for the first frames of online testing after training.
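The check I did is roughly the following (a quick sketch; the glob pattern and the raw-to-disparity conversion are placeholders, since each dataset encodes its GT differently, e.g. KITTI-style PNGs store disparity × 256):

```python
import glob
import random
import numpy as np
import cv2

def mean_max_disparity(file_pattern, to_disparity, n_samples=100):
    """Average per-image maximum disparity over a random sample of GT maps.
    `to_disparity` converts the raw stored array to disparity in pixels."""
    files = glob.glob(file_pattern, recursive=True)
    files = random.sample(files, min(n_samples, len(files)))
    maxima = []
    for f in files:
        raw = cv2.imread(f, cv2.IMREAD_UNCHANGED).astype(np.float32)
        disp = to_disparity(raw)
        maxima.append(disp[disp > 0].max())  # ignore invalid (zero) pixels
    return float(np.mean(maxima))

# placeholder path; KITTI-style 16-bit PNGs store disparity * 256
print(mean_max_disparity('./kitti/**/*.png', lambda raw: raw / 256.0))
```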

AlessioTonioni commented 2 years ago

Hi Tiam, I honestly don't remember the range of disparities used anymore or whether it matches what you measured. In general, the range should more or less match between train and test data to get good results. Besides the maximum values, you could also compute a histogram of the disparity distributions and compare them.
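Something along these lines (just a sketch with numpy/matplotlib; the random arrays are placeholders standing in for your actual GT maps):

```python
import numpy as np
import matplotlib.pyplot as plt

def pooled_histogram(disp_maps, max_disp=300, bins=60):
    """Normalized histogram of valid (> 0) disparity values pooled over many GT maps."""
    values = np.concatenate([d[d > 0].ravel() for d in disp_maps])
    hist, edges = np.histogram(values, bins=bins, range=(0, max_disp), density=True)
    return hist, edges

# Replace these placeholders with lists of ground-truth disparity arrays (in pixels)
# loaded from the training and test datasets respectively.
disp_train = [np.random.rand(380, 640) * 236 for _ in range(10)]
disp_test = [np.random.rand(375, 1242) * 85 for _ in range(10)]

hist_tr, edges = pooled_histogram(disp_train)
hist_te, _ = pooled_histogram(disp_test)
plt.plot(edges[:-1], hist_tr, label='train (Synthia, half resolution)')
plt.plot(edges[:-1], hist_te, label='test (KITTI / DrivingStereo)')
plt.xlabel('disparity [px]')
plt.ylabel('density')
plt.legend()
plt.show()
```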

peiran88 commented 1 year ago

@Tiam2Y I ran into the same problem here: the loss is very likely to suddenly become larger and eventually turns into NaN. How did you deal with this?

Tiam2Y commented 1 year ago

Hello @peiran88. I didn't have the problem you describe. The problem I encountered is that, during online learning, the predicted disparity of the first few hundred frames deviates greatly from the ground truth, and the loss only gradually decreases after a long time.

peiran88 commented 1 year ago

@Tiam2Y Thank you for telling me that. I faced the same problem: the initial error during testing is quite large. Did you retrain the model offline with a different configuration after that? I am now pretty confused about how to reach accuracy similar to the paper.

Tiam2Y commented 1 year ago

@peiran88 As far as I remember, I modified the configuration and retried several times, but got the same result. You can refer to my comments above. I think the main reason is that the disparity range of the offline training data is quite different from that of the online test data. So I suggest switching to a dataset with a similar disparity range for offline training (for example, you could render a new dataset with the CARLA simulator, as mentioned in the paper).