Closed: ZJU-PLP closed this issue 1 year ago.
@ZJU-PLP Thank you for your interest in our work. We have updated the code and pre-trained models for the LMO dataset.
@CavendyHsu OK, thanks for your reply.
I ran into a problem when trying to download the predicted masks of HybridPose on the LineMOD Occlusion dataset.
Access is turned on.
@CavendyHsu The file can be downloaded now. Thanks a lot.
@CavendyHsu @aragakiyui611 Hi, thanks for sharing the code to train and evaluate on the LMO dataset. However, I cannot reproduce your paper's results. Would you mind offering some advice?
python eval_lmo.py --dataset_root ./datasets/linemod/Linemod_preprocessed --pred_mask ./mask_hybridpose --model ./pretrained_models/lmo/pose_model_27_0.02377143515188488.pth
However, I get an ADD(S) result of 65.91 instead of the 69.5 reported in your paper.
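For context, the ADD(S) numbers being compared here average distances between model points under the predicted and ground-truth poses. A minimal sketch of the metric (helper names and signatures are mine, not the repo's evaluation code):

```python
import numpy as np

def add_metric(pts, R_pred, t_pred, R_gt, t_gt):
    """ADD: mean point-to-point distance between the model transformed by
    the predicted pose and by the ground-truth pose."""
    pred = pts @ R_pred.T + t_pred
    gt = pts @ R_gt.T + t_gt
    return np.linalg.norm(pred - gt, axis=1).mean()

def add_s_metric(pts, R_pred, t_pred, R_gt, t_gt):
    """ADD-S, used for symmetric objects: for each ground-truth point,
    take the distance to the closest predicted point instead of the
    one-to-one correspondence."""
    pred = pts @ R_pred.T + t_pred
    gt = pts @ R_gt.T + t_gt
    d = np.linalg.norm(pred[None, :, :] - gt[:, None, :], axis=2)
    return d.min(axis=1).mean()
```

A pose counts as correct when this distance is below 10% of the object diameter, and the reported score is the percentage of correct frames.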
python train.py --dataset lmo --dataset_root path_to_lm_dataset --bg_img path_to_voc2012_dataset
@ZJU-PLP Hi. When I was running the LMO dataset, I encountered the following problem. Do you have a solution? Looking forward to your reply.
@yunnan66 No, I did not run into that problem. Maybe your torchvision version is not supported (judging from your debug error information)?
Hi, I tested the official LMO weights and the result is exactly 69.52; I am using torch 1.8.0. Please check whether the problem is in the testing data or the torch version.
@aragakiyui611 I have tested again (PyTorch version changed to 1.8) and the result is still 65.62. Would you mind sharing your bash command?
Please refer to these. I simply run
python eval_lmo.py
Please check for any modifications to the dataset or network code. Meanwhile, I notice it took almost 24 minutes to test on LMO on your machine, while it only took 8 minutes on mine (a 3090). Similarly, your training time is quite long. I guess the bottleneck may be system I/O rather than the GPU.
@aragakiyui611 I have tested the code with "python eval_lmo.py --dataset_root ./datasets/linemod/Linemod_preprocessed --pred_mask ./mask_hybridpose --model ./pretrained_models/lmo/pose_model_27_0.02377143515188488.pth" again on my lab server (instead of my local desktop), and the result is 65.78. The GPU is an RTX 2080 Ti.
I think your analysis about system I/O rather than the GPU is right. However, I cannot understand why the same model produces different results, since system I/O only affects running time, not precision. I want to know whether you tested the code on the Linemod_preprocessed dataset.
I don't quite understand; the 02 sequence of Linemod_preprocessed is exactly Occluded LineMOD.
However, I cannot understand why the same model produces different results.
This is quite normal; many factors can affect the result. You could search Zhihu or elsewhere for a detailed explanation.
As for your result of 65.78 versus my 69.52, I can't pinpoint where the problem is. Please check whether you modified your test data.
@aragakiyui611 Hi, I have checked the LineMOD dataset again and found some errors. I have since fixed them and re-tested your provided trained model, and the results are now OK (dataset moved to an SSD).
Today I trained on the LMO dataset again on my local computer. However, I find that the loss stops improving after epoch 25 (training with --nepoch=50), and the test ADD(S) result is only 31.89 (model at epoch 25). Would you mind sharing your training curve as displayed in TensorBoard?
Training details:
Script command: python train.py --dataset lmo --dataset_root ./datasets/linemod/Linemod_preprocessed --bg_img ./datasets/lmo/VOCdevkit
Testing details:
@aragakiyui611 Would you mind helping me solve this problem when you have time?
@ZJU-PLP exp.zip These are the training logs. If you need the weights, I can share them by other means since they are too large. I'm afraid you'll have to try to figure out the problem yourself first; I have no time in the coming month.
@aragakiyui611 Thanks for sharing. I find that my TEST FINISH Avg dis differs from yours to a great extent. I'll keep debugging.
@aragakiyui611 I may have found the key to this problem.
In your provided training logs, the test results differ between bgaug=True-front_num=3-num_pts=1000-box_aug=False and bgaug=True-front_num=3-num_pts=1000-box_aug=True. Judging from your provided trained model, pose_model_27_0.02377143515188488.pth was trained with box_aug=True.
Would you mind helping me understand the box_aug parameter? I cannot find it in the LMO training script in the README.
epoch_27_test_log.txt: (1) box_aug=False Avg dis: 0.03193115701012664 (2) box_aug=True Avg dis: 0.02377143515188488
Background aug: it applies background augmentation. Please refer to these lines:
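Background augmentation of this kind typically composites the masked object crop over a random image (e.g. from VOC2012). A minimal sketch of the idea; the function name and exact compositing are my assumptions, not the repo's code:

```python
import numpy as np

def paste_on_background(obj_rgb, obj_mask, bg_rgb):
    """Composite a masked object crop over a background image.

    obj_rgb:  HxWx3 uint8 object image
    obj_mask: HxW bool mask, True where the object is visible
    bg_rgb:   HxWx3 uint8 background (e.g. a VOC image resized to HxW)
    """
    out = bg_rgb.copy()
    out[obj_mask] = obj_rgb[obj_mask]  # keep object pixels, replace the rest
    return out
```

Randomizing the background each epoch keeps the network from overfitting to the scene behind the object.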
Bbox aug: it applies augmentation to the detection bbox. Please refer to these lines:
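A hedged sketch of what a detection-box augmentation like box_aug might do, i.e. randomly shift and rescale the box so the network tolerates imperfect detections at test time (parameter names and jitter scheme are my assumptions, not the repo's implementation):

```python
import random

def jitter_bbox(bbox, ratio=0.1, img_w=640, img_h=480):
    """Randomly translate and rescale a detection box by up to `ratio`
    of its size, clamped to the image bounds."""
    x1, y1, x2, y2 = bbox
    w, h = x2 - x1, y2 - y1
    dx = random.uniform(-ratio, ratio) * w      # random shift
    dy = random.uniform(-ratio, ratio) * h
    ds = 1.0 + random.uniform(-ratio, ratio)    # random scale
    cx, cy = (x1 + x2) / 2 + dx, (y1 + y2) / 2 + dy
    hw, hh = w * ds / 2, h * ds / 2
    return (max(0.0, cx - hw), max(0.0, cy - hh),
            min(float(img_w), cx + hw), min(float(img_h), cy + hh))
```

Since the two runs in the logs differ only in box_aug, this augmentation alone can plausibly explain the Avg dis gap above.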
OK, I see. And the parameters in bgaug=True-front_num=3-num_pts=1000 stand for bgaug, front_num, and num_pts?
I made a mistake, please refer to the edited comment.
bgaug=True-front_num=3-num_pts=1000 is the default.
So, in your view, I can ignore the parameters (bgaug, front_num, and num_pts) when training on the LMO dataset?
@aragakiyui611
I have trained on the LMO dataset several times with your code, but my training results cannot reach your provided model. Would you mind training your published code again on the LMO dataset with the script python train.py --dataset lmo --dataset_root path_to_lm_dataset --bg_img path_to_voc2012_dataset (if it is convenient for you)? I am afraid the published code may be missing some key part. In other words, if you train the published code (not your initial code) and can reach the result of pose_model_27_0.02377143515188488.pth, I can pinpoint where the problem is.
My training results:
I found that it may be this bug; I have updated the code.
Ok, thanks a lot. I am training again now.
I trained the model as well and there are still some bugs.
I also trained again. Some bugs still exist.
@aragakiyui611 Would you mind helping me solve this problem on the LMO dataset when you have time? Maybe you could look up the initial version of the experiment code that one of you tested.
@TerenceHsu666 Would you mind helping me solve this problem on the LMO dataset when you have time?
Hi, please try changing this line to img_masked = np.transpose(img_masked, (2, 0, 1)). I am training the model again. It seems that this is the bug.
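For anyone hitting the same issue: PyTorch convolution layers expect channel-first (C, H, W) input, while images are loaded as (H, W, C), which is why this transpose fixes it. A minimal illustration (the shapes are just example LineMOD image dimensions):

```python
import numpy as np

# Images load as height x width x channels (HWC)...
img_masked = np.zeros((480, 640, 3), dtype=np.float32)

# ...but PyTorch conv layers want channels x height x width (CHW),
# so reorder the axes before building the input tensor.
img_chw = np.transpose(img_masked, (2, 0, 1))
print(img_chw.shape)  # (3, 480, 640)
```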
OK, thanks a lot.
@aragakiyui611 I can confirm the bug is fixed by your proposed change. The testing results are now OK. Thanks for your patient responses.
@CavendyHsu Hi, Zelin,
Would you mind sharing the details for reproducing the experiment results on the Occlusion LineMOD (LM-O) dataset? I want to reproduce the results from your paper but cannot find the details in your released code.