16lemoing / dot

Dense Optical Tracking: Connecting the Dots
https://16lemoing.github.io/dot
MIT License

Reproducing the training results #12

Status: Open. wkbian opened this issue 4 months ago.

wkbian commented 4 months ago

Hi, @16lemoing,

Congratulations on your paper acceptance! :tada:

I ran into problems while reproducing your training results. I followed the instructions in the training section, but the motion loss does not seem to converge when I set world_size = 4, which matches the setting in the paper: "DOT is trained on frames at resolution 512×512 for 500k steps with the ADAM optimizer [32] and a learning rate of 10^-4 using 4 NVIDIA V100 GPUs."

[Image: training loss plot]

Could you please provide some suggestions? Thanks!

16lemoing commented 3 months ago

Hi @wkbian, it is normal for the training loss to be a bit noisy. Can you run the evaluation on CVO to properly assess the performance of the final model? For example:

python test_cvo.py --split final --refiner_path checkpoints/YOUR_RUN/last.pth
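A noisy per-step loss can hide the overall trend. One generic way to check it (not a feature of the DOT repository, as far as this thread shows) is to smooth the logged values with a bias-corrected exponential moving average before plotting. A minimal sketch, assuming the motion-loss values have already been exported as a list of floats; `ema_smooth` and the synthetic curve are hypothetical:

```python
import random
from typing import List, Sequence


def ema_smooth(values: Sequence[float], decay: float = 0.99) -> List[float]:
    """Bias-corrected exponential moving average of a noisy loss curve."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    smoothed: List[float] = []
    avg = 0.0
    for step, value in enumerate(values, start=1):
        avg = decay * avg + (1.0 - decay) * value
        # Dividing by (1 - decay**step) removes the bias toward the 0.0 initial value.
        smoothed.append(avg / (1.0 - decay ** step))
    return smoothed


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical curve: a slowly decaying trend plus uniform noise (not real DOT logs).
    raw = [1.0 / (1.0 + 0.01 * t) + random.uniform(-0.2, 0.2) for t in range(2000)]
    smooth = ema_smooth(raw, decay=0.99)
    print(f"smoothed loss at step 100: {smooth[100]:.3f}")
    print(f"smoothed loss at step 1999: {smooth[-1]:.3f}")
```

A decay closer to 1 gives a smoother but more delayed curve. Smoothing only helps read the trend; the CVO evaluation remains the proper check of final model quality.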

I found a bug in the distributed training mode: all the GPUs were sampling the same dataset elements simultaneously. The issue is fixed in this commit: https://github.com/16lemoing/dot/commit/cdee971fb0615fe3bf7b6fd19d856ea572327ec1
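The symptom described here (every GPU drawing the same dataset elements) typically comes from each process shuffling with the same seed and then iterating over the full index list. In PyTorch the usual remedy is `torch.utils.data.distributed.DistributedSampler`, with `sampler.set_epoch(epoch)` called at the start of every epoch. The pure-Python sketch below mirrors that strategy (one shared shuffle, then a disjoint stride per rank) so it can run without GPUs. It is an illustration only: we have not checked that the commit above takes this approach, and `buggy_indices` and `shard_indices` are hypothetical helpers.

```python
import random
from typing import List


def buggy_indices(num_samples: int, rank: int, world_size: int, epoch: int, seed: int = 0) -> List[int]:
    """Illustrates the reported bug: every rank shuffles with the same seed and
    iterates over the whole dataset, so all GPUs see identical samples."""
    rng = random.Random(seed + epoch)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    return indices  # rank and world_size are ignored, which is the problem


def shard_indices(num_samples: int, rank: int, world_size: int, epoch: int, seed: int = 0) -> List[int]:
    """Shuffle with a seed shared by all ranks, then give each rank a disjoint slice."""
    if num_samples <= 0 or not 0 <= rank < world_size:
        raise ValueError("need num_samples > 0 and 0 <= rank < world_size")
    rng = random.Random(seed + epoch)  # identical permutation on every rank
    indices = list(range(num_samples))
    rng.shuffle(indices)
    # Pad by repeating indices so every rank gets the same number of samples.
    total = -(-num_samples // world_size) * world_size
    repeats = -(-total // num_samples)
    indices = (indices * repeats)[:total]
    # Rank r takes positions r, r + world_size, r + 2 * world_size, ...
    return indices[rank::world_size]


if __name__ == "__main__":
    world_size = 4
    buggy = [set(buggy_indices(100, r, world_size, epoch=0)) for r in range(world_size)]
    fixed = [set(shard_indices(100, r, world_size, epoch=0)) for r in range(world_size)]
    print("buggy: rank 0 and rank 1 overlap on", len(buggy[0] & buggy[1]), "samples")
    print("fixed: rank 0 and rank 1 overlap on", len(fixed[0] & fixed[1]), "samples")
```

Seeding with `seed + epoch` keeps the permutation identical across ranks within an epoch (so the strided slices stay disjoint) while still changing the order from one epoch to the next.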

Also, setting the flag --lambda_motion_loss 1000 during training slightly improves motion prediction quality but slightly degrades visibility prediction. This is the setting we use in our final method.
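One plausible reading of this trade-off, assuming `--lambda_motion_loss` multiplies the motion term in a weighted sum with the visibility term: a larger weight makes the optimizer prioritize motion over visibility. The thread does not show DOT's actual loss code, its default weight, or whether other terms exist, so `LossTerms`, `total_loss`, `motion_share`, the default weights of 1.0 and the example numbers below are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class LossTerms:
    motion: float      # hypothetical motion (track/flow) loss value for one batch
    visibility: float  # hypothetical visibility loss value for the same batch


def total_loss(terms: LossTerms, lambda_motion: float = 1.0, lambda_visibility: float = 1.0) -> float:
    """Weighted sum of the two terms; assumes --lambda_motion_loss plays the role of
    lambda_motion. The 1.0 defaults are placeholders, not DOT's defaults."""
    return lambda_motion * terms.motion + lambda_visibility * terms.visibility


def motion_share(terms: LossTerms, lambda_motion: float, lambda_visibility: float = 1.0) -> float:
    """Fraction of the total loss contributed by the weighted motion term."""
    return lambda_motion * terms.motion / total_loss(terms, lambda_motion, lambda_visibility)


if __name__ == "__main__":
    batch = LossTerms(motion=0.01, visibility=0.5)  # illustrative numbers, not DOT measurements
    for lam in (1.0, 1000.0):
        print(f"lambda_motion={lam:g}: total={total_loss(batch, lam):.3f}, "
              f"motion share={motion_share(batch, lam):.3f}")
```

The share of the loss value is only a rough proxy. What matters for training is the relative size of the gradients, and because the gradient of lambda_motion * L_motion is lambda_motion times the gradient of L_motion, the weight scales it in the same way.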