facebookresearch / OrienterNet

Source code for the paper "OrienterNet: Visual Localization in 2D Public Maps with Neural Matching"

Loss=-inf problem when fine-tuning on KITTI #12

Closed · BPsoda closed this issue 1 year ago

BPsoda commented 1 year ago

Hello, when I fine-tune the model on KITTI following the instructions, I encounter a loss=-inf problem. The fine-tuning command is:

python -m maploc.train experiment.name=OrienterNet_MGL_kitti data=kitti \
    training.finetune_from_checkpoint='"experiments/OrienterNet_MGL_reproduce/checkpoint-step=340000.ckpt"'

I believe this is an issue related to the yaw_prior mask. When running on KITTI, a yaw prior is created around the ground-truth yaw angle, and at localization any angle masked out by the yaw prior is set to log_prob=-inf. However, the range of the prior is too narrow, so during grid sampling the interpolation can pick up values from outside the prior mask, resulting in loss=-inf.
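To illustrate, here is a minimal, self-contained sketch (not the repository's actual code; the bin count and values are made up) of how linear interpolation next to the masked region propagates -inf into the log-probability sampled at the ground-truth yaw:

```python
import torch

# Hypothetical log-probability volume over yaw bins (values made up).
num_rotations = 64
log_probs = torch.full((num_rotations,), -10.0)

# Narrow yaw prior: everything outside +/-1 bin around the prior center
# is masked out, i.e. set to log(0) = -inf.
prior_center = 32
valid = torch.zeros(num_rotations, dtype=torch.bool)
valid[prior_center - 1 : prior_center + 2] = True
log_probs[~valid] = float("-inf")

# The ground-truth yaw falls between the last valid bin and a masked bin,
# so linear interpolation (as done by grid sampling) mixes in a -inf neighbor.
gt_bin = prior_center + 1.5          # fractional bin index
lo, w = int(gt_bin), gt_bin - int(gt_bin)
sampled = (1 - w) * log_probs[lo] + w * log_probs[lo + 1]
print(sampled)                       # tensor(-inf) -> the loss blows up
```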

I disabled the yaw prior by commenting out this line: https://github.com/facebookresearch/OrienterNet/blob/213aff45ce49a6aea11d273d198d9c2969457e10/maploc/data/kitti/dataset.py#L49. After that, the losses went back to normal. To fix this issue, I suggest either setting prior_range_rotation larger than max_init_error_rotation or disabling the yaw prior mask during fine-tuning, since I don't see the point of using it when fine-tuning. Best regards
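If the prior is kept, a rough rule of thumb (my own sketch, not code from the repository; the names mirror the options mentioned above and the exact config keys may differ) is to give the prior at least one angular bin of margin over the largest possible initial rotation error, so interpolation at the ground-truth yaw never reads masked bins:

```python
# Hypothetical numbers: a yaw volume with 256 bins covering 360 degrees.
num_rotations = 256
bin_size_deg = 360.0 / num_rotations

max_init_error_rotation = 10.0   # largest rotation offset applied at fine-tuning
margin = 2 * bin_size_deg        # keep interpolation away from masked bins

prior_range_rotation = max_init_error_rotation + margin
assert prior_range_rotation > max_init_error_rotation + bin_size_deg
print(f"use a yaw prior of +/- {prior_range_rotation:.2f} deg")
```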

ToABetterDay commented 1 year ago

Hi, may I ask where you downloaded the checkpoint "checkpoint-step=340000.ckpt"? I failed to download it when following their guidelines.

BPsoda commented 1 year ago

Hi, the pre-trained model should be downloaded automatically when you run the evaluation scripts. If the scripts fail to download it, you can download it from here and manually place it under the experiments folder. You may open a new issue if you have other questions.
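If it helps, a manual download could look roughly like this (a sketch only: CHECKPOINT_URL is a placeholder for the link above, and the target path matches the one used in the fine-tuning command):

```python
from pathlib import Path
import torch

CHECKPOINT_URL = "https://<link-from-the-README>"  # placeholder, not a real URL
dst = Path("experiments/OrienterNet_MGL_reproduce/checkpoint-step=340000.ckpt")
dst.parent.mkdir(parents=True, exist_ok=True)

# Download the pre-trained weights and place them where the training
# command expects to find them.
torch.hub.download_url_to_file(CHECKPOINT_URL, str(dst))
```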

yicocc commented 1 year ago

I want to ask you about this question (screenshot attached).

BPsoda commented 1 year ago

You can refer to issue #13.

yicocc commented 1 year ago

Excuse me, I want to ask you another question. When I modify my code, the output looks like this figure (screenshot attached). Is that normal?

xiilei99 commented 5 months ago

Hi. While fine-tuning the model on KITTI, loss_train kept decreasing while loss_val kept increasing from the very beginning, which indicates an overfitting problem. Have you encountered the same situation? I trained with 4 GPUs and a batch size of 32.

IoflyTang commented 4 months ago

I had the same problem. Did you find a solution?