EryiXie / PlaneRecNet

This is an official implementation of "PlaneRecNet" (BMVC 2021).
MIT License

Does this use ScanNetv2 dataset? #2

Closed uyoung-jeong closed 2 years ago

uyoung-jeong commented 2 years ago

I tried to train the model on the ScanNet v2 dataset and found that the dataset loading process does not match the dataset format. Here is what I found:

  1. When I extract the rgb and depth files, the rgb files are placed at {root}/scene{xxxx}_{xx}/color/{xxxx}.jpg. However, your annotation file assumes an additional 'frame' directory: {root}/scene{xxxx}_{xx}/frame/color/{xxxx}.jpg. I extracted the files using the official ScanNet code (Python 2.x).

  2. The camera intrinsic file path is also different. My intrinsic file is stored at {root}/scene{xxxx}_{xx}/intrinsic/intrinsic_{color/depth}.txt, but in your code it is {root}/scene{xxxx}_{xx}/frame/intrinsic/scene{xxxx}_{xx}.txt. Also, my intrinsic file contains a 4x4 matrix, so the line index cannot exceed 4, yet your code reads the 9th line of the intrinsic file (see the snippet after this list).

  3. After fixing the above path and format problems, I ran into another problem: at line 89 of data/datasets.py, the mask size does not match the rgb image size. https://github.com/EryiXie/PlaneRecNet/blob/a1796c888d08bd74a30ff81abdb3cafe9ea7e88a/data/datasets.py#L89 My extracted rgb images are 1296x968, while the depth images are 640x480. The mask size is 307200 (= 640x480), and I don't know why this error happens.
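For reference (regarding point 2), the extracted intrinsic file is just a whitespace-separated 4x4 matrix, so it can be read like this (a sketch; the scene path is a hypothetical example):

```python
# Read the 4x4 intrinsic matrix written by the official ScanNet extraction.
# The path is a hypothetical example; substitute your own scene.
import numpy as np

K = np.loadtxt("scene0000_00/intrinsic/intrinsic_color.txt")  # shape (4, 4)
fx, fy = K[0, 0], K[1, 1]  # focal lengths
cx, cy = K[0, 2], K[1, 2]  # principal point
```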

Does your code use an older version (v1?) of ScanNet, or did I miss something during preprocessing? I have not used the ScanNet dataset before, so I might have made a mistake. Thanks.

EryiXie commented 2 years ago

Hi, first of all, thank you for pointing out these path and naming issues, and sorry that you had to run into them.

Because the official ScanNet code for extracting data from *.sens files is too slow, I believe I made some modifications at the time, which resulted in the incompatible naming and paths. And of course, I wanted to make the path layout the same as PlaneRCNN's.

For the 3rd issue: yes, the dataset we use is ScanNet v2. The RGB image is resized to 480x640 so that it matches the depth map and the plane annotation given by PlaneRCNN (both are 480x640). For now, my suggestion is to write a simple script to resize the RGB images.
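Something along these lines should do it (a rough sketch, not an official preprocessing script; SCANNET_ROOT and the glob pattern are assumptions based on the layout discussed above):

```python
# Resize all extracted RGB frames to 640x480 (width x height) in place so
# they match the depth maps and PlaneRCNN's plane annotations.
import glob
import os

from PIL import Image

SCANNET_ROOT = "/path/to/scannet"  # adjust to your extraction root

for rgb_path in glob.glob(os.path.join(SCANNET_ROOT, "scene*", "color", "*.jpg")):
    img = Image.open(rgb_path)
    if img.size != (640, 480):  # PIL reports size as (width, height)
        img.resize((640, 480), Image.BILINEAR).save(rgb_path)
```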

I will try to write a data preprocessing script so that everyone can simply run it after extracting the data with the original ScanNet script. Until it is finished, I will link this issue in the README file. Thank you again.

uyoung-jeong commented 2 years ago

Thanks for the answer. I fixed the problems mentioned above, but I still faced several more. In order to run on RTX 3090 or A6000 GPUs, the PyTorch version should be at least 1.7.0 due to a CUDA compatibility issue; I am currently using PyTorch 1.10.

1. https://github.com/EryiXie/PlaneRecNet/blob/534e23e6c5db2235ab1e5a9419fb4bfec3ffa943/train.py#L291 I added a `generator` argument to the above call: `generator=torch.Generator(device='cuda')`. But the code then cannot run on multiple GPUs; I guess the custom DataParallel cannot solve this, and DistributedDataParallel should be employed instead. I'm just running on a single GPU to prevent errors.
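Roughly, the modified call looks like this (a stand-in sketch, not the actual train.py code; the dataset and batch size are placeholders):

```python
# Stand-in sketch of the DataLoader change (not the actual train.py call).
# The cuda generator is only needed because YOLACT-style scripts set the
# default tensor type to cuda; on a CPU-default setup use a CPU generator.
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.set_default_tensor_type("torch.cuda.FloatTensor")  # assumed, YOLACT-style

dataset = TensorDataset(torch.zeros(16, 3))  # placeholder for the ScanNet dataset
loader = DataLoader(
    dataset,
    batch_size=8,
    shuffle=True,
    generator=torch.Generator(device="cuda"),  # the added argument
)
```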

2. https://github.com/EryiXie/PlaneRecNet/blob/534e23e6c5db2235ab1e5a9419fb4bfec3ffa943/planerecnet.py#L492 This line raises an in-place operation error:

```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [8, 128, 120, 160]], which is output 0 of ReluBackward0, is at version 3; expected version 0 instead.
```

I solved the problem by replacing the line with:

```python
feature_add_all_level = feature_add_all_level.clone() + self.convs_all_levels[i](mask_feat)
```
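For anyone curious why `clone()` helps, here is a minimal standalone repro of the same autograd pattern (an illustration, not PlaneRecNet code):

```python
# ReLU's backward needs its saved output, so mutating that output in place
# invalidates the graph; clone() breaks the in-place chain.
import torch

x = torch.randn(2, 3, requires_grad=True)

y = torch.relu(x)
y += 1.0               # in-place: bumps the version of ReLU's saved output
# y.sum().backward()   # would raise the RuntimeError quoted above

y = torch.relu(x)
y = y.clone() + 1.0    # out-of-place, as in the replacement line
y.sum().backward()     # succeeds
```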

3. In order to train your model from scratch, your code requires the pretrained ResNet weights from YOLACT: resnet101_reducedfc.pth or resnet50-19c8e357.pth.

4. I ran the evaluation code using your pretrained model PlaneRecNet_101_9_125000.pth. The evaluation result seems different from your paper:

```
       |  all  |  .50  |  .55  |  .60  |  .65  |  .70  |  .75  |  .80  |  .85  |  .90  |  .95  |
-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
  box  | 43.93 | 50.27 | 50.04 | 49.79 | 49.41 | 48.92 | 47.77 | 46.40 | 43.75 | 37.83 | 15.15 |
 mask  | 41.58 | 50.32 | 50.26 | 50.21 | 50.10 | 49.92 | 49.47 | 48.33 | 43.77 | 21.23 |  2.22 |
-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+

Depth Metrics:
abs_rel: 0.07233, sq_rel: 0.01839, rmse: 0.16482, log10: 0.03036,
a1: 0.95648, a2: 0.99420, a3: 0.99875, ratio: 0.93974
```

My preprocessing procedure is not exactly the same as yours; for example, I used the intrinsics of the RGB camera.

EryiXie commented 2 years ago

Hi uyoung-jeong, sorry for the delay.

For the 3rd issue: yes, the ResNet weights are given by the author of YOLACT, but they are ResNet weights trained on the ImageNet dataset, just the standard ones, not a pretrained YOLACT. So I think... it is fine, right? (Anyway, YOLACT is the first instance segmentation method I read and learned line by line, and in my opinion it is a well-implemented one, so I adapted a lot of their code. Maybe I should reimplement the ResNet + DCNv2 part myself and use a more common version of the pretrained weights...)

For the 4th issue: uhmm, then the result looks better than reported in the paper, hahaha. Well, I believe I uploaded the wrong sample, or more exactly missed one (there is one for training, one for validation, and one for evaluation). I will try to find the evaluation sample I used in the paper and upload it.

Thank you for mentioning the 1st and 2nd issues. The 1st issue happens on RTX 3090 or A6000 with PyTorch > 1.9 (as far as I know), which I also encountered with the same setup on my server. But adding `generator=torch.Generator(device='cuda')` makes the code not runnable on my local PC with an older version of PyTorch and an older GPU. I will add a comment on this line and explain it in a later update.
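Something like the following guard might be one way to keep both setups working (just an idea on my side, not a tested fix):

```python
# Possible guard (an assumption, not the repo's actual code): only pass a
# cuda generator on PyTorch >= 1.9, where the cuda sampler requires it.
import torch
from torch.utils.data import DataLoader

major, minor = (int(v) for v in torch.__version__.split("+")[0].split(".")[:2])
loader_kwargs = {}
if (major, minor) >= (1, 9):
    loader_kwargs["generator"] = torch.Generator(device="cuda")

# data_loader = DataLoader(dataset, batch_size=..., shuffle=True, **loader_kwargs)
```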

And the 2nd issue is something really new that I didn't know about before. Thank you again!

uyoung-jeong commented 2 years ago

Thanks for the answer. I thought you had not mentioned the pretrained weights for the model, but it seems I did not read the readme.md carefully. Thanks for the kind answer. The 2nd issue does not appear when the PyTorch version is older than 1.8, so if I just stick to an older version, the distributed training and in-place operation problems can be ignored. However, as far as I know, older versions of PyTorch do not support CUDA 11.x. It is possible that the current script uses the same validation set for both training validation and evaluation; I would be grateful if you could provide instructions for evaluation.

nku-zhichengzhang commented 2 years ago

Thanks for your hard work and kind responses. I met the same data preparation issues as uyoung-jeong; your work is significant to the 3D plane community, but it uses a unique data loader.

Could you kindly release the structure of the dataset, or share the dataset files via a cloud drive?

Thank you for releasing the code and providing detailed descriptions. Btw, your README is genuinely informative and detailed compared to others :)