Closed: ZJU-PLP closed this issue 1 year ago.
@ZJU-PLP Thank you for your interest in our work. We have updated the code and pre-trained models for the LMO dataset.
@CavendyHsu OK, thanks for your reply.
I ran into a problem when trying to download the predicted masks of HybridPose on the LineMOD Occlusion dataset.
Access is turned on.
@CavendyHsu The file can be downloaded now. Thanks a lot.
@CavendyHsu @aragakiyui611 Hi, thanks for sharing the code to train and evaluate on the LMO dataset. However, I cannot reproduce your paper's results. Would you mind offering some advice?
python eval_lmo.py --dataset_root ./datasets/linemod/Linemod_preprocessed --pred_mask ./mask_hybridpose --model ./pretrained_models/lmo/pose_model_27_0.02377143515188488.pth
However, I get an ADD(S) result of 65.91 instead of the 69.5 reported in your paper.
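For context, the ADD(S) numbers being compared here average distances between model points under the predicted and ground-truth poses. A minimal sketch of the metric (helper names and signatures are mine, not the repo's evaluation code):

```python
import numpy as np

def add_metric(pts, R_pred, t_pred, R_gt, t_gt):
    """ADD: mean point-to-point distance between the model transformed by
    the predicted pose and by the ground-truth pose."""
    pred = pts @ R_pred.T + t_pred
    gt = pts @ R_gt.T + t_gt
    return np.linalg.norm(pred - gt, axis=1).mean()

def add_s_metric(pts, R_pred, t_pred, R_gt, t_gt):
    """ADD-S, used for symmetric objects: for each ground-truth point,
    take the distance to the closest predicted point instead of the
    one-to-one correspondence."""
    pred = pts @ R_pred.T + t_pred
    gt = pts @ R_gt.T + t_gt
    d = np.linalg.norm(pred[None, :, :] - gt[:, None, :], axis=2)
    return d.min(axis=1).mean()
```

A pose counts as correct when this distance is below 10% of the object diameter, and the reported score is the percentage of correct frames.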
python train.py --dataset lmo --dataset_root path_to_lm_dataset --bg_img path_to_voc2012_dataset
@ZJU-PLP Hi. When I was running the LMO dataset, I encountered the following problem. Do you have a solution? Looking forward to your reply.
@yunnan66 No, I did not run into that problem. Maybe your torchvision version is not supported (judging from your debug error information)?
Hi, I tested the official LMO weights and the result is exactly 69.52; I am using torch 1.8.0. Please check whether the problem is in the testing data or the torch version.
@aragakiyui611 I have tested again (PyTorch version changed to 1.8) and the result is still 65.62. Would you mind sharing your bash command?
Please refer to these. I simply run
python eval_lmo.py
Please check for any modifications to the dataset or network code. Meanwhile, I notice it took almost 24 minutes to test on LMO on your machine, while it only took 8 minutes on mine (a 3090). Similarly, your training time is quite long. I guess the bottleneck may be system I/O rather than the GPU.
@aragakiyui611 I have tested the code with "python eval_lmo.py --dataset_root ./datasets/linemod/Linemod_preprocessed --pred_mask ./mask_hybridpose --model ./pretrained_models/lmo/pose_model_27_0.02377143515188488.pth" again on my lab server (instead of my local desktop), and the result is 65.78. The GPU is an RTX 2080 Ti.
I think your analysis about system I/O rather than the GPU is right. However, I cannot understand why the same model produces different results, since system I/O only affects running time, not precision. I want to know whether you tested the code on the Linemod_preprocessed dataset.
I don't quite understand; the 02 sequence of Linemod_preprocessed is exactly Occluded LineMOD.
However, I cannot understand why the same model produces different results.
This is quite normal; many factors can affect the result. You could search Zhihu or elsewhere for a detailed explanation.
As for your result of 65.78 versus my 69.52, I can't pinpoint where the problem is. Please check whether you modified your test data.
@aragakiyui611 Hi, I have checked the LineMOD dataset again and found some errors. I have since fixed them and re-tested your provided trained model, and the results are now OK (dataset moved to an SSD).
Today I trained on the LMO dataset again on my local computer. However, I find that the loss stops improving after epoch 25 (training with --nepoch=50), and the test ADD(S) result is only 31.89 (model at epoch 25). Would you mind sharing your training curve as displayed in TensorBoard?
Training details:
Script command: python train.py --dataset lmo --dataset_root ./datasets/linemod/Linemod_preprocessed --bg_img ./datasets/lmo/VOCdevkit
Testing details:
@aragakiyui611 Would you mind helping me solve this problem when you have time?
@ZJU-PLP exp.zip These are the training logs. If you need the weights, I can share them by other means since they are too large. I'm afraid you'll have to try to figure out the problem yourself first; I have no time in the coming month.
@aragakiyui611 Thanks for sharing. I find that my TEST FINISH Avg dis differs from yours to a great extent. I'll keep debugging.
@aragakiyui611 I may have found the key to this problem.
In your provided training logs, the test results differ between bgaug=True-front_num=3-num_pts=1000-box_aug=False and bgaug=True-front_num=3-num_pts=1000-box_aug=True. Judging from your provided trained model, pose_model_27_0.02377143515188488.pth was trained with box_aug=True.
Would you mind helping me understand the box_aug parameter? I cannot find it in the LMO training script in the README.
epoch_27_test_log.txt: (1) box_aug=False Avg dis: 0.03193115701012664 (2) box_aug=True Avg dis: 0.02377143515188488
Background aug: it applies background augmentation. Please refer to these lines:
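Background augmentation of this kind typically composites the masked object crop over a random image (e.g. from VOC2012). A minimal sketch of the idea; the function name and exact compositing are my assumptions, not the repo's code:

```python
import numpy as np

def paste_on_background(obj_rgb, obj_mask, bg_rgb):
    """Composite a masked object crop over a background image.

    obj_rgb:  HxWx3 uint8 object image
    obj_mask: HxW bool mask, True where the object is visible
    bg_rgb:   HxWx3 uint8 background (e.g. a VOC image resized to HxW)
    """
    out = bg_rgb.copy()
    out[obj_mask] = obj_rgb[obj_mask]  # keep object pixels, replace the rest
    return out
```

Randomizing the background each epoch keeps the network from overfitting to the scene behind the object.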
Bbox aug: it applies augmentation to the detection bbox. Please refer to these lines:
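A hedged sketch of what a detection-box augmentation like box_aug might do, i.e. randomly shift and rescale the box so the network tolerates imperfect detections at test time (parameter names and jitter scheme are my assumptions, not the repo's implementation):

```python
import random

def jitter_bbox(bbox, ratio=0.1, img_w=640, img_h=480):
    """Randomly translate and rescale a detection box by up to `ratio`
    of its size, clamped to the image bounds."""
    x1, y1, x2, y2 = bbox
    w, h = x2 - x1, y2 - y1
    dx = random.uniform(-ratio, ratio) * w      # random shift
    dy = random.uniform(-ratio, ratio) * h
    ds = 1.0 + random.uniform(-ratio, ratio)    # random scale
    cx, cy = (x1 + x2) / 2 + dx, (y1 + y2) / 2 + dy
    hw, hh = w * ds / 2, h * ds / 2
    return (max(0.0, cx - hw), max(0.0, cy - hh),
            min(float(img_w), cx + hw), min(float(img_h), cy + hh))
```

Since the two runs in the logs differ only in box_aug, this augmentation alone can plausibly explain the Avg dis gap above.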
OK, I see. And the parameters in bgaug=True-front_num=3-num_pts=1000 stand for bgaug, front_num, and num_pts?
I made a mistake, please refer to the edited comment.
bgaug=True-front_num=3-num_pts=1000 is the default.
So, in your view, I can ignore the parameters (bgaug, front_num, and num_pts) when training on the LMO dataset?
@aragakiyui611
I have trained on the LMO dataset several times with your code, but my training results cannot reach your provided model. Would you mind training your published code again on the LMO dataset with the script python train.py --dataset lmo --dataset_root path_to_lm_dataset --bg_img path_to_voc2012_dataset (if it is convenient for you)? I am afraid the published code may be missing some key part. In other words, if you train the published code (not your initial code) and can reach the result of pose_model_27_0.02377143515188488.pth, I can pinpoint where the problem is.
My training results:
I found that it may be this bug; I have updated the code.
Ok, thanks a lot. I am training again now.
I trained the model as well and there are still some bugs.
I also trained again. Some bugs still exist.
@aragakiyui611 Would you mind helping me solve this problem on the LMO dataset when you have time? Maybe you could look up the initial version of the experiment code that one of you tested.
@TerenceHsu666 Would you mind helping me solve this problem on the LMO dataset when you have time?
Hi, please try changing this line to img_masked = np.transpose(img_masked, (2, 0, 1)). I am training the model again. It seems that this is the bug.
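For anyone hitting the same issue: PyTorch convolution layers expect channel-first (C, H, W) input, while images are loaded as (H, W, C), which is why this transpose fixes it. A minimal illustration (the shapes are just example LineMOD image dimensions):

```python
import numpy as np

# Images load as height x width x channels (HWC)...
img_masked = np.zeros((480, 640, 3), dtype=np.float32)

# ...but PyTorch conv layers want channels x height x width (CHW),
# so reorder the axes before building the input tensor.
img_chw = np.transpose(img_masked, (2, 0, 1))
print(img_chw.shape)  # (3, 480, 640)
```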
OK, thanks a lot.
@aragakiyui611 I can confirm the bug is fixed by your proposed change. The testing results are now OK. Thanks for your patient responses.
@CavendyHsu Hi, Zelin,
Would you mind sharing the details for reproducing the experiment results on the Occlusion LineMOD (LM-O) dataset? I want to reproduce the results from your paper but cannot find the details in your released code.