chengzhag / Implicit3DUnderstanding

Official PyTorch code of Holistic 3D Scene Understanding from a Single Image with Implicit Representation (CVPR 2021). Also includes a PyTorch implementation of the decoder of LDIF (from Local Deep Implicit Functions for 3D Shape).
https://chengzhag.github.io/publication/im3d/
MIT License

Training with self-processed SUNRGBD dataset #11

Closed mertkiray closed 3 years ago

mertkiray commented 3 years ago

I want to evaluate 3D detection, so I preprocessed SUNRGBD as stated in the README. While training, the logs say `180/10450 missing samples` and `80/5014 missing samples`.

Did I make a mistake while preprocessing or is this expected?

Thanks :)

chengzhag commented 3 years ago

Hi @mertkiray,

This is actually expected.

When trying the processing code of Total3D, we found that the processed results have fewer samples than the pre-processed data downloaded from Total3D. Since Total3D's pre-processing code filters out some of the scenes based on several rules, we suspect that their code may have changed after the release, causing more scenes to be ignored.

Since our pre-processing code is based on Total3D's released code, we inherit the train/test split of the dataset processed with it. The missing samples should not cause a large difference in the evaluation results.
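If you want to check exactly which samples were filtered out on your side, a quick comparison against the inherited split can help. This is only a sketch: the split file location, its format, and the one-`.pkl`-per-sample layout are assumptions about the preprocessing output, not the repo's documented structure.

```python
import json
from pathlib import Path

# Assumed locations/formats -- adjust to your actual preprocessing output.
split_file = Path('data/sunrgbd/splits/train.json')            # inherited Total3D split (list of sample ids)
processed_dir = Path('data/sunrgbd/sunrgbd_train_test_data')   # where preprocessing writes per-sample files

with open(split_file) as f:
    expected_ids = {str(i) for i in json.load(f)}

# Assume each processed sample was saved as <sample_id>.pkl
found_ids = {p.stem for p in processed_dir.glob('*.pkl')}

missing = sorted(expected_ids - found_ids)
print(f'{len(missing)}/{len(expected_ids)} missing samples')
print('first few missing ids:', missing[:10])
```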

mertkiray commented 3 years ago

Hi @chengzhag, thank you for the detailed answer. I am asking because the 3D object detection results of the model I trained seem to differ from those in the paper.

[image: 3D object detection results comparison]

The top row is from the paper, while the bottom row is from the output of my training.

I also see similar issues with my visualizations differing from the visualizations of the paper results.

[images: qualitative visualization comparisons]

I wanted to make sure this is not caused by the preprocessing. Do you have any idea why these results, as well as the visualizations, are different?

Also, how can I get the other test results you provided in the paper, like layout IoU, cam pitch, cam roll, and chamfer distance?

Thank you so much 😊

chengzhag commented 3 years ago

Hi @mertkiray,

The LDIF does seem to break on some of the objects with thin structures, especially when generalizing from Pix3D to SUN RGB-D.

The other metrics can be checked after training from the log file in the experiment folder, or, more conveniently, from the wandb experiment page (you can create a report with tables or graphs like the ones below).

Scene understanding: [wandb report images]

Single object reconstruction: [wandb report images]
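If it is more convenient to grab the final numbers from the log file rather than wandb, a minimal parsing sketch like the one below works. The `Test loss (metric): value` line format is taken from the snippets quoted in this thread; the log path is just whatever file your experiment folder contains.

```python
import re
import sys
from pathlib import Path

# Usage: python parse_test_metrics.py path/to/experiment/log.txt
log_path = Path(sys.argv[1])

# Matches lines such as: "Test loss (layout_iou): 0.620150"
pattern = re.compile(r'Test loss \((?P<name>\w+)\): (?P<value>[\d.]+)')

metrics = {}
for line in log_path.read_text().splitlines():
    for match in pattern.finditer(line):
        metrics[match.group('name')] = float(match.group('value'))

for name, value in sorted(metrics.items()):
    print(f'{name}: {value:.6f}')
```

The same numbers can also be fetched programmatically through wandb's public API (`wandb.Api()`) if you prefer to build a report there.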

Would you please check the training and testing curves to see if there is any problem?

mertkiray commented 3 years ago

Thank you @chengzhag for sharing the other metrics. I am sharing my training and testing curves below.

These are the LEN training/testing curves: [image]

This is LIEN + LDIF: [image]

This is SGCN: [image]

This is joint: [image]

Do you see any problems with these curves?

chengzhag commented 3 years ago

Hi, @mertkiray,

I presume that the rest of the metrics logged with wandb are OK?

I see that your curves are quite similar to what I got. From my observation, it does require some luck (or several attempts) to reach the results of our provided checkpoint, since there is some randomness during the training process and the checkpoint we provide is the best we could get.
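If you want to cut down the run-to-run variance while retrying, the usual PyTorch seeding recipe below is worth a try. This is generic practice rather than something wired into our configs, and some CUDA kernels can remain non-deterministic regardless:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    """Seed the common RNG sources used during training."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Deterministic cuDNN kernels trade some speed for reproducibility.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(123)
```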

mertkiray commented 3 years ago

In LEN logs:
- Test loss (layout_iou): 0.620150
- Test loss (cam_pitch_err): 3.728792
- Test loss (cam_roll_err): 2.606081

In LIEN+LDIF logs (Test loss, overall and per class):

| Class | Avg_Chamfer | chamfer_woICP | chamfer_wICP |
| --- | --- | --- | --- |
| (all) | 0.008465 | 8.669688 | 7.467241 |
| bed | 0.005958032906252605 | 6.097681875626751 | 4.969611945433475 |
| wardrobe | 0.0031370156852062792 | 3.2926102414548657 | 3.026142153369496 |
| bookcase | 0.004260495355346292 | 4.573465430221525 | 3.9906250437764443 |
| chair | 0.007310174998500854 | 7.50823779989532 | 6.427196312469815 |
| sofa | 0.003096780832186188 | 3.34066853316409 | 3.6129484184597493 |
| table | 0.017010838347444775 | 17.324360070516153 | 14.570997367973707 |
| desk | 0.013083522971386888 | 12.981401550831366 | 10.58210799721211 |
| misc | 0.02592532049439957 | 26.215719928701777 | 22.70249547140817 |
| tool | 0.0029951145195148206 | 3.0616522062591085 | 2.7223403830129893 |

In SGCN logs:
- Test loss (layout_iou): 0.637629
- Test loss (cam_pitch_err): 3.061840
- Test loss (cam_roll_err): 2.257464
- Test loss (iou_3d): 0.198359
- Test loss (iou_2d): 0.666520

In JOINT logs:
- Test loss (layout_iou): 0.633157
- Test loss (cam_pitch_err): 3.076085
- Test loss (cam_roll_err): 2.250714
- Test loss (iou_3d): 0.208100
- Test loss (iou_2d): 0.676601
- Test loss (Lg): 1.353826

I think they are close, but still worse than the paper results. :) This is the first research paper whose results I have tried to reproduce, and thank you for the amazing guidance :)

Do we have a chance to get the pretrained checkpoints for LEN and LIEN+LDIF?

chengzhag commented 3 years ago

Hi @mertkiray

Your pretrained LEN is actually quite close to mine, which was trained with the parameters provided by Total3D:
- Test loss (layout_iou): 0.613996
- Test loss (cam_pitch_err): 3.652064
- Test loss (cam_roll_err): 2.517752

Total3D also already provides a pretrained model with similar LEN testing results, which is even better on camera pose estimation and may be loaded as an alternative:
- Test loss (layout_iou): 0.599689
- Test loss (cam_pitch_err): 3.613924
- Test loss (cam_roll_err): 2.455977

So there seems to be no need to provide another LEN pretrained model.

Since the LIEN+LDIF is not finetuned during the joint training, the weights in our provided final model should be identical to the pretrained weights. You can load it to test the Chamfer loss.
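For example, a sketch like the one below could pull the LIEN+LDIF weights out of the joint checkpoint by filtering the state dict on the submodule prefix. The checkpoint path, the `'net'` key, and the `mesh_reconstruction.` prefix are guesses for illustration only; inspect the actual key names in your checkpoint first.

```python
import torch

# Assumed checkpoint path and layout -- inspect your own checkpoint's keys first.
full_ckpt = torch.load('out/joint/model_best.pth', map_location='cpu')
state_dict = full_ckpt.get('net', full_ckpt)  # weights may be nested under a key like 'net'

prefix = 'mesh_reconstruction.'  # assumed name of the LIEN+LDIF submodule
lien_ldif = {k[len(prefix):]: v for k, v in state_dict.items() if k.startswith(prefix)}

print(f'extracted {len(lien_ldif)} tensors')
torch.save({'net': lien_ldif}, 'out/ldif/weights_from_joint.pth')
```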

mertkiray commented 3 years ago

Hi @chengzhag, thank you again :) So I assume there is nothing wrong with my setup, and the differences from the paper are just due to randomness?

chengzhag commented 3 years ago

Hi @mertkiray,

I think yes.

At your request, I have also added a link to download the pretrained LIEN+LDIF model. I hope this helps.

Update: the link

mertkiray commented 3 years ago

Thank you so much @chengzhag