Stack error occurred during training

ais-lab / d2s

[RAL 2024] D2S: Representing sparse descriptors and 3D coordinates for camera relocalization

https://thpjp.github.io/d2s/

Apache License 2.0


Stack error occurred during training #12

Closed lzyh98 closed 3 months ago

lzyh98 commented 3 months ago

Hello, may I ask why I get "RuntimeError: stack expects each tensor to be equal size, but got [256, 540] at entry 0 and [256, 868] at entry 1" during training? I traced the error to the line for batch_idx, (data, target) in enumerate(self.train_loader):, where self.train_loader has type <class 'torch.utils.data.dataloader.DataLoader'> and length 500. How can I solve this problem of stacking inputs of different sizes?

thuanaislab commented 3 months ago

Hi, thank you for your interest. The error is raised because the number of descriptors differs from image to image, so the samples cannot be stacked into one batch and you cannot train with a batch size > 1. If you change the batch size to 1, the error should no longer occur. Otherwise, you have to create a dataset in which every image has the same number of keypoint descriptors, for example 2048 (set this when you run Hloc, and configure SuperPoint to force 2048 keypoints).
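To illustrate the reply above, here is a minimal sketch in plain Python that mirrors the equal-shape requirement behind the stacking error. It is not PyTorch's actual collate code; the helper names `can_stack` and `make_batches` are hypothetical, and the shapes are taken from the error message.

```python
from typing import List, Sequence, Tuple

Shape = Tuple[int, ...]


def can_stack(shapes: Sequence[Shape]) -> bool:
    """Return True if all shapes in one batch are identical (hypothetical helper)."""
    return len(set(shapes)) <= 1


def make_batches(shapes: Sequence[Shape], batch_size: int) -> List[List[Shape]]:
    """Split the per-image shapes into consecutive batches of at most batch_size."""
    return [list(shapes[i:i + batch_size]) for i in range(0, len(shapes), batch_size)]


# Shapes from the error message: 256-dimensional descriptors,
# but a different number of keypoints per image (540 vs. 868).
descriptor_shapes = [(256, 540), (256, 868)]

for batch_size in (2, 1):
    batches = make_batches(descriptor_shapes, batch_size)
    ok = all(can_stack(batch) for batch in batches)
    print(f"batch_size={batch_size}: stackable={ok}")
# batch_size=2: stackable=False
# batch_size=1: stackable=True
```

With a batch size of 1 every batch holds a single sample, so the shapes can never disagree. The same check passes for larger batches only if every image yields the same number of descriptors.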

lzyh98 commented 3 months ago

Hello, thank you very much for your reply. I changed the batch size to 1 and training now works, and I understand what you mean. But I have one more question. Following your instructions, I changed the SuperPoint configuration in Hloc from 4096 to 2048 keypoints during data preprocessing, but still encountered the problem mentioned earlier. I would like to know the reason. Another issue is that when I tried to evaluate and test using your pre-trained model, the proportion under every threshold was 0. I looked at the code but couldn't figure out where the problem was. Do you have any suggestions?

thuanaislab commented 3 months ago

I think you forgot to set one additional SuperPoint config option. You must set 'keypoint_threshold': 0.0 instead of SuperPoint's default 'keypoint_threshold': 0.005; then the error should not occur. Recently I have only trained with batch size = 1. The results are also very good, and it simplifies the SuperPoint configuration.
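A rough sketch of why the threshold matters, under stated assumptions: keypoints are first filtered by score and then capped at a maximum count, so a non-zero threshold can leave some images with fewer keypoints than the cap. The function `select_keypoints`, the example scores, and the key name `max_keypoints` are illustrative assumptions; only `keypoint_threshold` and its values 0.005 and 0.0 come from the thread.

```python
from typing import List

# Assumed settings; only 'keypoint_threshold' is quoted in the thread,
# 'max_keypoints' is an assumed key name for the 2048-keypoint cap.
superpoint_conf = {
    "max_keypoints": 2048,
    "keypoint_threshold": 0.0,
}


def select_keypoints(scores: List[float], max_keypoints: int,
                     keypoint_threshold: float) -> List[float]:
    """Keep scores above the threshold, then the top max_keypoints of those."""
    kept = sorted((s for s in scores if s > keypoint_threshold), reverse=True)
    return kept[:max_keypoints]


# Toy detector scores for one image (made up for illustration).
scores = [0.9, 0.5, 0.004, 0.003, 0.001, 0.0005]

with_default = select_keypoints(scores, 4, 0.005)
with_zero = select_keypoints(scores, 4, superpoint_conf["keypoint_threshold"])
print(len(with_default), len(with_zero))  # 2 4
```

In this toy case the 0.005 threshold leaves only 2 of the requested 4 keypoints, while a threshold of 0.0 fills the cap. An image with fewer candidate points than the cap would still come up short, which is one more reason a batch size of 1 is the simpler route.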