facebookresearch / meshrcnn

code for Mesh R-CNN, ICCV 2019

Difficulty reproducing the paper publication's results #96

Closed samiwilf closed 3 years ago

samiwilf commented 3 years ago

I am having trouble reproducing the exact results shown in the Mesh R-CNN paper (https://arxiv.org/pdf/1906.02739.pdf) when running: python demo/demo.py --config-file configs/pix3d/meshrcnn_R50_FPN.yaml --input datasets/pix3d/img/chair/0213.jpg --output output_demo_chair_s1 --onlyhighest MODEL.WEIGHTS meshrcnn://meshrcnn_R50.pth

The original 0213.jpg image is provided at the bottom. Running demo.py locally produces results that are not identical to the paper's. The holes in the chairs' masks and the meshes are visibly different, as shown below.

Investigating further, I downloaded the s1 and s2 weights to my local computer and tried both. I downloaded the weights from: https://github.com/facebookresearch/meshrcnn/blob/master/INSTRUCTIONS_PIX3D.md . The purpose was to rule out any issue caused by using meshrcnn://meshrcnn_R50.pth. (meshrcnn://meshrcnn_R50.pth is a special path that causes demo.py to download and use the s1 pretrained weights, based on the documentation and the meshrcnn code I've read.) Using the downloaded s1 pretrained weights produced results identical to meshrcnn://meshrcnn_R50.pth, as expected. The results of using the s1 and s2 pretrained weights are shown below.
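One quick way to confirm that the auto-downloaded checkpoint and the manually downloaded one really are the same file (so any output difference cannot come from the weights themselves) is to compare checksums. This is a minimal sketch; the cache location that detectron2 uses for meshrcnn:// downloads and the local filenames are assumptions here, not verified paths.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage: compare the manually downloaded checkpoint against
# the copy that demo.py cached when given meshrcnn://meshrcnn_R50.pth
# (the cache path below is an assumption and may differ on your machine):
# if sha256_of("s1_weights/meshrcnn_R50.pth") == sha256_of(
#         "~/.torch/fvcore_cache/meshrcnn/meshrcnn_R50.pth"):
#     print("identical bytes: the weights are not the source of any difference")
```

If the digests match, identical demo outputs from the two paths are exactly what one would expect.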
Boundingbox_and_mask_comparison_labeled The left and center images are generated locally by running demo.py. (The exact commands executed are shown throughout this post.) The right image is a screenshot from the Mesh R-CNN paper, page 12.
Mesh_comparison_s1_weights_labeled The left mesh is generated locally using the s1 weights when running the demo. The right mesh is a screenshot of the paper, page 12. Example of the exact command I executed in the terminal: python demo/demo.py --config-file configs/pix3d/meshrcnn_R50_FPN.yaml --input datasets/pix3d/img/chair/0213.jpg --output output_demo_chair_s1 --onlyhighest MODEL.WEIGHTS s1_weights/Meshrcnn_R50.pth
Mesh_comparison_s2_weights_labeled The left mesh is generated locally using the s2 weights when running the demo. The right mesh is a screenshot of the paper, page 12. The exact command I executed in the terminal: python demo/demo.py --config-file configs/pix3d/meshrcnn_R50_FPN.yaml --input datasets/pix3d/img/chair/0213.jpg --output output_demo_chair_s2 --onlyhighest MODEL.WEIGHTS s2_weights/meshrcnn_S2_R50.pth
I ran each command above both with and without --onlyhighest. It makes no difference; the chair's mesh/mask is not the same as in the paper.

Regarding my environment, here's my output when running pip list:

Package Version Location


absl-py 0.12.0
appdirs 1.4.3
CacheControl 0.12.6
cachetools 4.2.1
certifi 2019.11.28
chardet 3.0.4
cloudpickle 1.6.0
colorama 0.4.3
contextlib2 0.6.0
cycler 0.10.0
Cython 0.29.22
detectron2 0.4+cu110
distlib 0.3.0
distro 1.4.0
future 0.18.2
fvcore 0.1.3.post20210317
google-auth 1.28.0
google-auth-oauthlib 0.4.4
grpcio 1.37.0
html5lib 1.0.1
idna 2.8
iopath 0.1.8
ipaddr 2.2.0
kiwisolver 1.3.1
lockfile 0.12.2
Markdown 3.3.4
matplotlib 3.4.1
meshrcnn 1.0 /home/lambdauser/meshrcnn_poc/meshrcnn
msgpack 0.6.2
numpy 1.20.2
oauthlib 3.1.0
omegaconf 2.0.6
opencv-python 4.5.1.48
packaging 20.3
pep517 0.8.2
Pillow 8.2.0
pip 20.0.2
pkg-resources 0.0.0
portalocker 2.3.0
progress 1.5
protobuf 3.15.7
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycocotools 2.0.2
pydot 1.4.2
pyparsing 2.4.6
python-dateutil 2.8.1
pytoml 0.1.21
pytorch3d 0.4.0
PyYAML 5.4.1
requests 2.22.0
requests-oauthlib 1.3.0
retrying 1.3.3
rsa 4.7.2
scipy 1.6.2
setuptools 44.0.0
six 1.14.0
tabulate 0.8.9
tensorboard 2.4.1
tensorboard-plugin-wit 1.8.0
termcolor 1.1.0
torch 1.7.1+cu110
torchvision 0.8.2+cu110
tqdm 4.60.0
typing-extensions 3.7.4.3
urllib3 1.25.8
webencodings 0.5.1
Werkzeug 1.0.1
wheel 0.34.2
yacs 0.1.8


The obj and jpg files produced when running the demo using the s1 weights and when running it using the s2 weights are here: inference_outputs.zip
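To turn "visibly different" into a number, the exported .obj meshes can be compared vertex by vertex. This is a sketch of a hypothetical helper, not part of the meshrcnn codebase; it assumes both files use plain `v x y z` vertex lines and the same vertex ordering, which only holds for meshes of identical topology.

```python
def load_obj_vertices(path):
    """Parse only the 'v x y z' vertex lines of a Wavefront .obj file."""
    verts = []
    with open(path) as f:
        for line in f:
            if line.startswith("v "):
                verts.append(tuple(float(t) for t in line.split()[1:4]))
    return verts

def max_vertex_gap(path_a, path_b):
    """Largest per-coordinate absolute difference between two meshes,
    assuming identical vertex counts and ordering."""
    va, vb = load_obj_vertices(path_a), load_obj_vertices(path_b)
    if len(va) != len(vb):
        return float("inf")  # different topology: not comparable this way
    return max(abs(x - y) for a, b in zip(va, vb) for x, y in zip(a, b))
```

A gap on the order of the mesh's bounding-box size would indicate a genuinely different prediction rather than numerical noise.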


The original input image from the Pix3D dataset. (Its original location after downloading/decompressing the Pix3D dataset zip file is: meshrcnn_repo_root / datasets / pix3d / img / chair / 0213.jpg) 0213

I use the terms "pretrained weights" and "weights" interchangeably above. (Noting this for clarity's sake.)

Would anyone happen to know why I cannot reproduce the paper's visual results exactly?

gkioxari commented 3 years ago

I don't understand your confusion. The pink overlaid silhouettes in the paper (and your screenshots) are 2D masks (from the mask head of Mesh R-CNN). The rendered 3D shapes (as predicted from the mesh refinement head of Mesh R-CNN) from the paper and what you show look similar. What exactly is your issue?

samiwilf commented 3 years ago

> I don't understand your confusion. The pink overlaid silhouettes in the paper (and your screenshots) are 2D masks (from the mask head of Mesh R-CNN). The rendered 3D shapes (as predicted from the mesh refinement head of Mesh R-CNN) from the paper and what you show look similar. What exactly is your issue?

I understand the results are similar. But I'm trying to understand why they are not identical. I'd appreciate any help Georgia. Thank you for the fast reply.

gkioxari commented 3 years ago

You expect pixel-identical outputs, which would require numerically identical models for all model parameters, even after years of PyTorch, PyTorch3D & CUDA development? That's not possible. All papers and their published models (especially from many years ago, like this one) will differ if you retrain or retest them, due to numerical changes coming from changes in various source code and software. What you should be looking for is very similar results and robustness.
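The kind of numerical drift described above is easy to demonstrate: in float32, summing the same values in a different order can change the result, and library/kernel updates routinely reorder such reductions. A tiny self-contained illustration (using Python's struct module to round to float32 precision, since plain Python floats are float64):

```python
import struct

def f32(x):
    """Round a Python float to float32 precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

vals = [1e8, 1.0, -1e8, 1.0]

# Accumulate left to right: 1e8 + 1.0 rounds back to 1e8 in float32,
# so the first 1.0 is lost entirely.
left_to_right = 0.0
for v in vals:
    left_to_right = f32(left_to_right + v)

# Same numbers, different order: the large terms cancel first,
# so both 1.0 contributions survive.
reordered = 0.0
for v in [1e8, -1e8, 1.0, 1.0]:
    reordered = f32(reordered + v)

print(left_to_right, reordered)  # 1.0 vs 2.0
```

Scaled up to millions of parameters and thousands of GPU reduction kernels, such reorderings are why retesting an old checkpoint under newer software gives similar but not bitwise-identical masks and meshes.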

samiwilf commented 3 years ago

I meant visually identical, or close enough that the differences are negligible. The differences here are not negligible. Also, you did not state which versions of which software you used, so I cannot test whether your assumption/speculation about the cause of the differences I'm seeing is correct.