Closed Bozcomlekci closed 5 months ago
Hi Batuhan,
In the past months several interested users have confirmed to us that they were able to reproduce our results, so I believe there could be an issue with your evaluation setup. In order for us to help and understand the issue better, would it be possible for you to share with us the RGB-D image dimensions of the ScanNet images you are using, as well as the camera intrinsics? Furthermore, have you followed the pre-processing step from our README (titled "Step 1: Download and pre-process the ScanNet200 dataset")? This could be important for ensuring that the mask proposals are the same as in our experiments. If you confirm that you have closely followed the pre-processing step, it would be very helpful for us to see a visualization of the predicted instance masks, to check whether the issue lies with the mask proposals or with the feature computation stage. If the masks seem reasonable, I recommend enabling the "save_crops" option in our config, as this will help with visually inspecting whether the image crops are correct/reasonable.
Best, Ayca
For example, in the scene `scene0011_00`, the RGB-D image dimensions of the ScanNet images are:

- `data_compressed/color/*.jpg`: 1296 × 968 pixels
- `data_compressed/depth/*.png`: 640 × 480 pixels
Intrinsics:

`data/intrinsic/intrinsic_color.txt`:

```
1169.621094 0.000000 646.295044 0.000000
0.000000 1167.105103 489.927032 0.000000
0.000000 0.000000 1.000000 0.000000
0.000000 0.000000 0.000000 1.000000
```

`data/intrinsic/intrinsic_depth.txt`:

```
577.590698 0.000000 318.905426 0.000000
0.000000 578.729797 242.683609 0.000000
0.000000 0.000000 1.000000 0.000000
0.000000 0.000000 0.000000 1.000000
```

`data/pose/*.txt` (example):

```
0.606497 0.359513 -0.709163 5.898605
0.793947 -0.321582 0.515978 1.464963
-0.042553 -0.875977 -0.480473 1.329018
0.000000 0.000000 0.000000 1.000000
```

The extrinsics are identity.
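As a quick sanity check on a setup like the one above, ScanNet's 4×4 intrinsic/pose files can be parsed with a few lines of Python, and the principal point (cx, cy) should sit near the image center: for a 1296 × 968 color image one expects cx ≈ 648, which matches the 646.3 above. A minimal sketch (the helper name `load_matrix` is my own, not from the repo):

```python
def load_matrix(text):
    """Parse a ScanNet intrinsic/pose file: 4 whitespace-separated floats per row."""
    return [[float(v) for v in line.split()] for line in text.strip().splitlines()]

intrinsic_color = load_matrix("""
1169.621094 0.000000 646.295044 0.000000
0.000000 1167.105103 489.927032 0.000000
0.000000 0.000000 1.000000 0.000000
0.000000 0.000000 0.000000 1.000000
""")

# The principal point should land near the image center: ~(648, 484) for 1296x968.
cx, cy = intrinsic_color[0][2], intrinsic_color[1][2]
print(cx, cy)  # 646.295044 489.927032
```

If cx/cy were far from half the image width/height, that would be a strong hint that the intrinsics and the images come from different resolutions.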
I am using the point cloud file `scene0011_00_vh_clean_2.ply`. The evaluation scans are under a folder named `scans`, i.e. `dataset/scans/scene0011_00`, and `scene0011_00` contains the structure mentioned in the "Step 2: Check the format of ScanNet200 dataset" part of your repo, as well as the raw `.sens` files from which the RGB-D data was extracted. The `dataset/scans` folder also contains the training scans, but their RGB-D data is not available inside their scene folders since I don't use them in the validation.
```
data/processed/scannet   <- the ScanNet200 pre-processing output folder
├── instance_gt
│   ├── train
│   │   ├── scene0000_00.txt
│   │   ...
│   └── validation
│       ├── scene0011_00.txt
│       ...
├── train
│   ├── 0000_00.npy
│   ...
├── validation
│   ├── 0011_00.npy
│   ...
├── color_mean_std.yaml            <- mean: [0.478, 0.430, 0.375], std: [0.283, 0.276, 0.270] (3 sig. fig.)
├── label_database.yaml            <- 1400 lines, entries like "1: {color: [174.0, 199.0, 232.0], name: wall, validation: true}"
├── train_database.yaml            <- 22819 lines, per-scene entries with color_mean, color_std, file_len, filepath, ..., scene, scene_type, subscene
├── train_validation_database.yaml <- 28747 lines
└── validation_database.yaml       <- 5928 lines
```
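To catch a broken or incomplete pre-processing run early, one can verify that the expected entries exist under the output folder before launching the evaluation. A minimal sketch, assuming the layout shown above (the helper name and the exact entry list are my own):

```python
import os

EXPECTED_ENTRIES = [
    "instance_gt/validation",
    "validation",
    "color_mean_std.yaml",
    "label_database.yaml",
    "validation_database.yaml",
]

def missing_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected files/folders that are absent under the
    pre-processing output folder (e.g. data/processed/scannet)."""
    return [e for e in expected if not os.path.exists(os.path.join(root, e))]

# Example: report anything missing before starting the eval run.
print(missing_entries("data/processed/scannet"))
```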
The mask proposal stage yields 148 masks for the provided example scene; I visualized each mask with a different color in the figure below.

Furthermore, `save_crops` yields reasonable crops.
I still have no idea where the evaluation fails to calculate correct results. I suspect the `data/processed/scannet` part, as I am able to obtain reasonable results for the inference of a single scene.

If you could share your table printout for the evaluation (the output of `run_eval_close_vocab_inst_seg.py`), it would be helpful. I'm getting a lot of NaNs and zeros in the table, which I believe shouldn't be the case in a correct eval run.
Thanks.
Hey, @Bozcomlekci. I am also getting the same score on ScanNet200 for some reason, and 9.5 mAP on Replica. Could you please share what you find if you figure out what the problem might be?

Thanks
Replica results:
```
################################################################
what            :       AP   AP_50%   AP_25%
################################################################
basket          :    0.000    0.000    0.264
bed             :    0.000    0.000    0.000
bench           :    0.000    0.000    0.000
bin             :    0.485    0.563    0.565
blanket         :    0.000    0.000    0.362
blinds          :    0.025    0.062    0.246
book            :    0.000    0.000    0.000
bottle          :    0.000    0.000    0.000
box             :    0.000    0.000    0.000
bowl            :    0.000    0.000    0.000
camera          :    0.000    0.000    0.000
cabinet         :    0.370    0.556    0.556
candle          :    0.000    0.000    0.000
chair           :    0.376    0.462    0.462
clock           :    0.000    0.000    0.188
cloth           :    0.116    0.524    0.538
comforter       :    0.000    0.000    0.667
cushion         :    0.133    0.305    0.477
desk            :    0.000    0.000    0.000
desk-organizer  :    0.121    0.378    0.378
door            :    0.293    0.332    0.478
indoor-plant    :    0.044    0.133    0.133
lamp            :    0.066    0.073    0.073
monitor         :    0.000    0.000    0.000
nightstand      :    0.556    0.833    0.833
panel           :    0.000    0.000    0.000
picture         :    0.375    0.375    0.375
pillar          :    0.033    0.150    0.150
pillow          :    0.131    0.362    0.564
pipe            :    0.000    0.000    0.000
plant-stand     :    0.000    0.000    0.000
plate           :    0.000    0.000    0.000
pot             :    0.460    0.517    0.517
sculpture       :    0.061    0.273    0.273
shelf           :    0.172    0.516    0.520
sofa            :    0.287    0.541    0.544
stool           :    0.216    0.216    0.560
switch          :    0.000    0.000    0.000
table           :    0.084    0.114    0.114
tablet          :    0.000    0.000    0.271
tissue-paper    :    0.000    0.000    0.000
tv-screen       :    0.019    0.171    0.575
tv-stand        :    0.000    0.000    0.000
vase            :    0.135    0.153    0.324
vent            :    0.000    0.000    0.000
wall-plug       :    0.000    0.000    0.000
window          :    0.000    0.000    0.000
rug             :    0.000    0.000    0.000
################################################################
average         :    0.095    0.159    0.229
################################################################
```
Hi everyone,
Sorry for the delay - I recently started an internship and I haven't had a lot of time to respond in the past weeks.
@Bozcomlekci I think there is some mismatch in the data. In my data folder for the scene scene0011_00, the RGB-D image dimensions of the ScanNet images are:

- `data_compressed/color/*.jpg`: 640 × 480 pixels (the example scene also has images of this resolution for me)
- `data_compressed/depth/*.png`: 640 × 480 pixels
The resolution of your images is different, and as far as I remember we do not resize the images within the script. So this might be messing things up, since the image crops could be wrong. They could still look reasonable, but they might not correspond 1-to-1 to the 3D instances due to this image resolution mismatch. I have been using a version of the ScanNet dataset that I got from a colleague, so maybe they had already resized the color images to match the depth image resolution.
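If the color frames do need to be resized to the depth resolution (e.g. with Pillow's `Image.resize`), the color intrinsics must be rescaled by the same factors, since fx and cx scale with width while fy and cy scale with height. A minimal sketch (the function is my own, not part of the repo); notably, rescaling the color intrinsics reported above from 1296 × 968 down to 640 × 480 lands almost exactly on the depth intrinsics (fx ≈ 577.59, fy ≈ 578.73), which supports the resizing hypothesis:

```python
def rescale_intrinsics(K, old_size, new_size):
    """Rescale a 4x4 intrinsic matrix after resizing the image.
    old_size/new_size are (width, height); fx, cx scale with width,
    fy, cy with height."""
    sx = new_size[0] / old_size[0]
    sy = new_size[1] / old_size[1]
    K = [row[:] for row in K]      # copy, don't mutate the input
    K[0] = [v * sx for v in K[0]]  # fx, cx row
    K[1] = [v * sy for v in K[1]]  # fy, cy row
    return K

K_color = [
    [1169.621094, 0.0, 646.295044, 0.0],
    [0.0, 1167.105103, 489.927032, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
K_scaled = rescale_intrinsics(K_color, (1296, 968), (640, 480))
print(round(K_scaled[0][0], 2), round(K_scaled[1][1], 2))  # 577.59 578.73
```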
Regarding the masks: as we noted in the README, running the OpenMask3D ScanNet evaluation on the first scene gives different results than running it on the example scene, since the ScanNet evaluation uses the "eval on segments" configuration, closely following the Mask3D method. This is why one also needs to run the pre-processing script from Mask3D.
The mask proposal stage yields 165 masks for me for the first scene when the ScanNet evaluation script is run with the `eval_on_segments` option.
Here are the ScanNet200 results I get when I run the eval script: openmask3d_scannet200_final_results.txt
Finally, regarding the NaNs and 0.0s in the evaluation: it is normal to have a few such lines. NaN means that the category does not exist in the validation set GT, while 0.0 means that our model fails to correctly identify any instances belonging to that category.
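To illustrate how such a per-class table is typically averaged: NaN categories are skipped entirely, while 0.0 entries still count toward the mean. A minimal sketch (the function name is my own, not the repo's evaluation code):

```python
import math

def mean_ap(per_class_ap):
    """Average AP over classes, skipping NaNs (categories absent from the
    validation GT); 0.0 entries are genuine misses and are kept."""
    vals = [v for v in per_class_ap.values() if not math.isnan(v)]
    return sum(vals) / len(vals) if vals else float("nan")

scores = {"chair": 0.376, "camera": 0.0, "toaster": float("nan")}
print(mean_ap(scores))  # 0.188 -- the NaN class is excluded, the 0.0 is not
```

So a handful of NaN/0.0 rows does not by itself indicate a broken run; it is the overall average that matters.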
Hope this helps, Ayca
After changing the "color" image resolution, there is still a small discrepancy between my evaluation results and yours, but overall they are aligned.

I would appreciate any other suggestions you might have. Is this discrepancy expected?
Hey, @Bozcomlekci. I have the correct resolution, and the crops are right, but I get very different results. Can you help me? inst_res.txt Thanks
Hi @hxiaoj and @Bozcomlekci,
For performing multiple rounds of SAM iterations, we do several steps of point sampling. I think this version of the codebase does not seed this randomized point sampling process, which might cause a small discrepancy from time to time depending on the sampled points that are used as input to the SAM model. I would not be too surprised if the numbers are within +/- 1-1.5 AP points of what we reported in the paper.
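If one wants to pin down this source of variance, the usual workaround is to seed every RNG involved before the sampling stage. A sketch of the idea (not part of the released codebase; whether it makes the SAM point sampling fully deterministic depends on where the pipeline actually draws its random numbers):

```python
import os
import random

def seed_everything(seed: int = 42) -> None:
    """Seed the common RNG sources; numpy/torch are seeded only if installed."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass

seed_everything(0)
a = random.random()
seed_everything(0)
assert random.random() == a  # same seed -> same sampled points
```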
Hope this helps, Ayca
Greetings,
When I run the evaluation code on the ScanNet200 validation set, I obtain a different set of results. I didn't change any hyperparameters or the given config files, except for pointing the paths to the correct locations of the data files. I attached the printout of the results.
eval.txt
I used the provided mask proposal model trained on the ScanNet200 training set. I performed the evaluation on the 312 scans belonging to the eval split. Is there something wrong with my evaluation?
Thanks in advance.