ChiWeiHsiao / DeepVO-pytorch

PyTorch Implementation of DeepVO
347 stars 105 forks

Performance #2

Open shahabty opened 6 years ago

shahabty commented 6 years ago

Hello, can you upload the pretrained weights? I trained the model for 72 epochs, but the testing loss is too high and the results are not visually convincing at all. Thanks

Yusufma03 commented 6 years ago

@shahabty Hi, I'm having the same problem. After 100 epochs, the network has learned almost nothing. Have you found a way to solve it? Thanks.

ChiWeiHsiao commented 6 years ago

Hello,

Sorry, I don't have access to my computer now; I will upload the weights on the 19th :)

Yusufma03 commented 6 years ago

@ChiWeiHsiao Thanks for your reply! Looking forward to it.

Btw, could you kindly share some tricks for the training? I also want to reproduce the results, but my network learned almost nothing. I'm actually quite confused about how long the training sequences should be, and whether we need to first train the network on short sequences and then increase the length and fine-tune the weights. I doubt that an LSTM trained on limited-length sequences can perform well on longer ones.

Thanks a lot!

shahabty commented 6 years ago

@Yusufma03 I still have the performance problem. I think the pretrained weights that @ChiWeiHsiao will upload will be helpful. I would also appreciate it if @ChiWeiHsiao could explain how the network was trained (especially the data pre-processing, batch size, trajectory length, learning-rate policy, and which pretrained FlowNet weights should be loaded).

Yusufma03 commented 6 years ago

@shahabty Hi, I just managed to train the network to make at least "reasonable" predictions by adjusting the ratio between the angle loss and the translation loss: I increased the weight of the angle loss over the epochs. The other things that helped were training for more epochs and using a larger image size. However, overfitting is still a problem: the performance of the trained network on the validation set is still not good enough. It would be nice if @ChiWeiHsiao could give me some suggestions on how to handle this. Thanks a lot!
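The weighting idea above could be sketched like this. This is a minimal illustration, not the exact loss in this repo; the function name and the `angle_weight` value are assumptions (the DeepVO paper weights the orientation MSE against the translation MSE, but the exact schedule is not stated here):

```python
import torch

def weighted_pose_loss(pred, target, angle_weight=100.0):
    """Weighted MSE between predicted and ground-truth 6-DoF poses.

    Assumes each pose is [roll, pitch, yaw, x, y, z]: the first three
    components are rotation, the last three translation. `angle_weight`
    is the knob discussed above -- increasing it over the epochs
    penalizes rotation errors more heavily relative to translation.
    """
    angle_loss = torch.nn.functional.mse_loss(pred[..., :3], target[..., :3])
    trans_loss = torch.nn.functional.mse_loss(pred[..., 3:], target[..., 3:])
    return angle_weight * angle_loss + trans_loss
```

A training loop could then recompute `angle_weight` per epoch (e.g. ramping it up) before calling this function.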

ChiWeiHsiao commented 6 years ago

Hi @shahabty @Yusufma03, the trained weights can be downloaded here. This model was trained for about 400 epochs, but it also suffers from overfitting. :( The authors of the DeepVO paper seem to have good results, but the values of some hyperparameters are not stated in the paper. By the way, one difference between my setting and the authors' is the input image size: I shrink it to 1/4 due to memory limitations, and I'm not sure whether this has a great impact on performance.

shahabty commented 6 years ago

Hello, I think the main problem is pre-processing, which must be done correctly to prevent overfitting. The number of images in KITTI is not enough for training, so the authors might have applied some transformations to the inputs to generate more training data. Another possible problem is the trajectory length, which might not be 10.
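As a sketch of the kind of label-preserving augmentation speculated about above (the helper below is hypothetical, not from this repo): brightness jitter changes the pixels but not the camera motion, so the pose labels stay valid. Geometric transforms such as horizontal flips would also mirror the ground-truth motion and need corresponding label changes.

```python
import torch

def jitter_brightness(img, max_delta=0.2):
    """Randomly scale image brightness by a factor in [1-max_delta, 1+max_delta].

    `img` is a float tensor in [0, 1] with shape (C, H, W). Because the
    scene geometry is untouched, the ground-truth pose for the frame pair
    does not need to be modified.
    """
    scale = 1.0 + (torch.rand(1).item() * 2.0 - 1.0) * max_delta
    return (img * scale).clamp(0.0, 1.0)
```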

yp233 commented 5 years ago

Hi @ChiWeiHsiao, sorry to bother you! I recently trained your network for about 250 epochs; the loss dropped to about 0.67, but the results I got were terrible. So I'd like to load your trained weights (the ones trained for about 400 epochs), but I ran into a problem when loading them: the error says it is caused by a PyTorch version mismatch, yet I also trained with torch 0.4.0, which confuses me a lot. 😭 Were your results obtained entirely with the code in this repo, without any modification? If so, I won't retrain. [screenshots: my route_02_gradient result, and the error when loading your trained weights] Can you give me some suggestions?

yokoyan96 commented 5 years ago

@yp233 I met the same problem and solved it. You just need to modify that line as follows: `M_deepvo.load_state_dict(torch.load(par.load_model_path), strict=False)`
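For context, a minimal self-contained demonstration of why `strict=False` helps (the tiny model below is a stand-in, not the repo's DeepVO class): checkpoints saved under a different PyTorch version can contain keys that the current model definition does not expect, and `strict=False` skips them instead of raising a `RuntimeError`.

```python
import torch
import torch.nn as nn

# Minimal stand-in model; the real code builds the DeepVO network instead.
model = nn.Sequential(nn.Linear(4, 2))

# Simulate a checkpoint with an extra key, as can happen when the saved
# weights and the current model definition come from different versions.
state = model.state_dict()
state["lstm.extra_buffer"] = torch.zeros(1)

# strict=False skips mismatched keys instead of raising a RuntimeError.
result = model.load_state_dict(state, strict=False)
print(result.unexpected_keys)  # ['lstm.extra_buffer']
```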

yuanjinsheng commented 5 years ago

Hi, I can't download the KITTI dataset when running downloader.sh; it always stops. Can you suggest another way to download it?

ChiWeiHsiao commented 5 years ago

Hi @yuanjinsheng, the files are large; maybe there isn't enough free space on your computer? Could you post your error message?

shahabty commented 5 years ago

Hi, please go to the KITTI website and register for the dataset. They will send you the download link. There are different KITTI benchmarks; you probably only need the odometry and raw KITTI benchmarks.

yuanjinsheng commented 5 years ago

Hi @ChiWeiHsiao, the space is enough; the download just stops after a short while. Do you have another link?

yuanjinsheng commented 5 years ago

@shahabty I can't find the same dataset as in downloader.sh. How did you do it?

ChiWeiHsiao commented 5 years ago

@yuanjinsheng You can download all the image data and ground-truth poses here; email registration is required. The URLs in the script are from here; choose [synced+rectified data].

yuanjinsheng commented 5 years ago

@ChiWeiHsiao thank you for your help.

sunnyHelen commented 5 years ago

Hello, guys. There's a parameter 'seq_len' in params.py. I want to know the meaning of 'seq_len'. Why is it (5, 7)? Shouldn't it be a single number, if it's the sequence length? @Yusufma03 @shahabty Thank you very much~

Yusufma03 commented 5 years ago

@sunnyHelen Hi, that's the sequence length used in training the LSTM, which is a hyperparameter you should tune. The authors of DeepVO proposed randomly segmenting the videos into subsequences of different lengths and said this is useful for avoiding overfitting. (5, 7) means the minimum subsequence length is 5 and the maximum is 7. However, the authors didn't mention what lengths they used in the paper.
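The segmentation idea could be sketched like this (a hypothetical helper, not the repo's actual data_helper.py code): walk along a video and cut off contiguous chunks whose lengths are drawn uniformly from the (min, max) range.

```python
import random

def sample_subsequences(video_len, seq_len_range=(5, 7), seed=None):
    """Split a video of `video_len` frames into contiguous subsequences whose
    lengths are drawn uniformly from `seq_len_range` (inclusive), mirroring
    the (min, max) interpretation of `seq_len` described above.

    Returns a list of (start, end) index pairs; any leftover tail shorter
    than the minimum length is dropped.
    """
    rng = random.Random(seed)
    lo, hi = seq_len_range
    segments, start = [], 0
    while start + lo <= video_len:
        length = min(rng.randint(lo, hi), video_len - start)
        segments.append((start, start + length))
        start += length
    return segments
```

Re-sampling the segmentation each epoch would expose the LSTM to differently-cut subsequences of the same trajectories.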

sunnyHelen commented 5 years ago

Ok. Thank you very much.

sunnyHelen commented 5 years ago

Sorry to bother you~ I found that the files "data_helper.py" and "helper.py" were updated and changed, and in my case the changed files no longer work, although they worked before.

File "/project/RDS-FEI-HMZ-RW/Original_DeepVO-pytorch-master_lstm/data_helper.py", line 204, in __getitem__
    groundtruth_rotation = raw_groundtruth[1][0].reshape((3, 3)).T  # opposite rotation of the first frame
ValueError: cannot reshape array of size 0 into shape (3,3)

I want to ask what the problem is. Have you ever encountered this? @ChiWeiHsiao @Yusufma03 @shahabty

alexart13 commented 5 years ago

@sunnyHelen try to rerun preprocess.py to update the files in pose_GT folder. The changes you mentioned affected the structure of the files.

sunnyHelen commented 5 years ago

Oh, right. Thank you for your help~

sunnyHelen commented 5 years ago

Sorry to bother you again~ I ran train.py using the pretrained FlowNet weights "pretrained/flownets_bn_EPE2.459.pth.tar" and got the training loss shown below. [training- and validation-loss screenshots] The training and validation losses are so small. I want to ask whether this is normal?

alexart13 commented 5 years ago

@sunnyHelen It is normal.

kourong commented 5 years ago

> Sorry to bother you~ I found that the files "data_helper.py" and "helper.py" were updated and changed, and in my case the changed files no longer work, although they worked before.
>
> File "/project/RDS-FEI-HMZ-RW/Original_DeepVO-pytorch-master_lstm/data_helper.py", line 204, in __getitem__
>     groundtruth_rotation = raw_groundtruth[1][0].reshape((3, 3)).T  # opposite rotation of the first frame
> ValueError: cannot reshape array of size 0 into shape (3,3)

The above error occurs when I run main.py.

> @sunnyHelen try to rerun preprocess.py to update the files in the pose_GT folder. The changes you mentioned affected the structure of the files.

I tried rerunning preprocess.py, but it did not work. Can you help me?

alexart13 commented 5 years ago

@kourong try to clean up 'datainfo' folder and run main.py again.

kourong commented 5 years ago

> @kourong try to clean up 'datainfo' folder and run main.py again.

Thank you, I will try it.

sunnyHelen commented 5 years ago

Hi, sorry to bother you~ Is the way you calculate mse_rotate and mse_translate the same as the results in the DeepVO paper? Why did I get much worse results compared with those in the paper? [screenshots: my metrics vs. the paper's table]

alexart13 commented 5 years ago

@sunnyHelen It would be incorrect to directly compare mse_rotate and mse_translate with the metrics in the paper because they are calculated differently. The authors of the paper reported the RMSE translation error as a percentage of the distance traveled and the RMSE rotation error in degrees per 100 m.
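A simplified sketch of the KITTI-style translation metric (translation only, a single segment length; the official KITTI devkit averages over segment lengths of 100-800 m and also reports rotation in deg/100 m; the function names here are illustrative):

```python
import numpy as np

def trajectory_distances(poses):
    """Cumulative path length at each pose; `poses` is an (N, 3) array of
    x, y, z positions."""
    steps = np.linalg.norm(np.diff(poses, axis=0), axis=1)
    return np.concatenate([[0.0], np.cumsum(steps)])

def translation_error_percent(gt, pred, seg_len=100.0):
    """Average endpoint translation error, as a % of segment length, over
    all subsequences whose ground-truth path length is ~`seg_len` meters.
    Simplified: compares relative displacements rather than full aligned
    poses, and uses a single segment length."""
    dist = trajectory_distances(gt)
    errors = []
    for i in range(len(gt)):
        # first index j whose path length from i reaches seg_len
        j = np.searchsorted(dist, dist[i] + seg_len)
        if j >= len(gt):
            break
        err = np.linalg.norm((gt[j] - gt[i]) - (pred[j] - pred[i]))
        errors.append(100.0 * err / seg_len)
    return float(np.mean(errors)) if errors else 0.0
```

For example, a prediction whose scale is off by 1% yields roughly a 1% translation error under this metric.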

sunnyHelen commented 5 years ago

Oh. I got it. Thanks a lot~

yxh1993 commented 5 years ago

> The following pic was one of my results. route_02_gradient

How did you make that trajectory picture?

akshay-iyer commented 5 years ago

Hello, can anyone tell me where we can get the path lengths and speeds of the various sequences in the KITTI color dataset? The authors presented an analysis based on them, but I could not find the lengths and speeds of the trajectories. I would like to verify their claims on my results as well.

alexart13 commented 5 years ago

@akshay-iyer The dataset contains ground-truth data, so you can calculate the length by summing the distances between each pair of consecutive points.
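For example (hypothetical helpers, not from the repo; the 10 Hz frame rate is the approximate KITTI camera rate, so speed can be estimated from the frame count):

```python
import numpy as np

def path_length(positions):
    """Total trajectory length: sum of Euclidean distances between
    consecutive ground-truth positions (an (N, 3) array of x, y, z)."""
    return float(np.linalg.norm(np.diff(positions, axis=0), axis=1).sum())

def average_speed(positions, fps=10.0):
    """Average speed in m/s, assuming frames captured at `fps` Hz
    (KITTI records at roughly 10 Hz)."""
    duration = (len(positions) - 1) / fps
    return path_length(positions) / duration
```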

akshay-iyer commented 5 years ago

Many thanks @alexart13. I have a few more questions; apologies if they are naive, I'm just trying to learn and grow in deep learning.

  1. There are two trained models uploaded on the repo, one for the optimizer and one for the weights; however, if I directly run test.py using them, the results are very bad. So I'm running main.py to train them, and the results improve over a few epochs. Is the model supposed to be trained further or run directly?
  2. My college cluster allows only 12 hours of running time, and only 12-13 epochs run in that time. I see that the code saves the model when the loss decreases. So after my session expires and I run main.py again, will it resume from the previous state? Since main.py prints epochs starting from 1, it does not show that it resumes from the last saved epoch.
  3. If the answer to Q2 is that it resumes from the last saved epoch, how do we access the epoch number and start the for loop from it, e.g. start the second run from epoch 14?

Thanks, Akshay

alexart13 commented 5 years ago

@akshay-iyer

  1. You should be able to get the same results as mentioned in the root page of this repo using the trained model https://drive.google.com/file/d/1l0s3rYWgN8bL0Fyofee8IhN-0knxJF22/view
  2. It depends on the settings in params.py (self.resume and the path to the saved model and optimizer).
  3. You have to modify the code to implement that feature.

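One way to implement that feature (a hedged sketch: the repo's resume logic reloads weights but restarts the epoch counter at 1; the checkpoint keys below are illustrative, not the repo's actual format) is to save the epoch number alongside the model and optimizer states:

```python
import os
import tempfile
import torch
import torch.nn as nn

def save_checkpoint(path, model, optimizer, epoch):
    # Bundle the epoch counter with both state dicts in one file.
    torch.save({"epoch": epoch,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, path)

def load_checkpoint(path, model, optimizer):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["epoch"] + 1  # first epoch of the resumed run

# Demo with a stand-in model instead of the DeepVO network.
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters())
path = os.path.join(tempfile.gettempdir(), "deepvo_ckpt.pth")
save_checkpoint(path, model, optimizer, epoch=13)
start_epoch = load_checkpoint(path, model, optimizer)
# The training loop then runs: for ep in range(start_epoch, n_epochs + 1): ...
```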
ghost commented 5 years ago

> Oh. I got it. Thanks a lot~

Hello, I don't know how to calculate the RMSE loss. Can you tell me explicitly? Thanks.

ghost commented 4 years ago

> Hello, Can anyone tell me where can we get the path lengths and the speeds of the various sequences in the KITTI color dataset used? The authors have presented their analysis based on it but I could not find the lengths and speeds of the trajectories. I would want to verify their claims on my results as well.

Hello, have you found how to calculate mse_rotate and mse_translate the same way as the results in the DeepVO paper?

ghost commented 4 years ago

> Hello, Can anyone tell me where can we get the path lengths and the speeds of the various sequences in the KITTI color dataset used? The authors have presented their analysis based on it but I could not find the lengths and speeds of the trajectories. I would want to verify their claims on my results as well.

Hello, have you found how to calculate mse_rotate and mse_translate the same way as the results in the DeepVO paper?

Terry-cyx commented 4 years ago

> Sorry to bother you again~ I ran train.py using the pretrained FlowNet weights "pretrained/flownets_bn_EPE2.459.pth.tar". The training and validation losses are so small. I want to ask whether this is normal?

Hi, sorry to bother you. I just want to know how you got such a low loss. I used the model weights provided by alexart13 but only got 0.04 train loss and 0.05 validation loss. Maybe it's because I cannot use the optimizer weights due to my PyTorch version, but I see you used only the pretrained FlowNet weights and got a better result than me, so I'm wondering if you could share some details about your training (for example, what measures or changes did you make)?

RandyChen233 commented 1 year ago

@sunnyHelen Hi, I wonder how you were able to train the model in main.py? When I ran the code, I noticed it took over 30 minutes to run a single epoch... and I do have CUDA available since I have an NVIDIA GPU. I'm not sure what's causing the slow runtime.