VinAIResearch / Open3DIS

Open3DIS: Open-vocabulary 3D Instance Segmentation with 2D Mask Guidance (CVPR 2024)
https://open3dis.github.io/
Apache License 2.0

Struggle with understanding when / how to run ISBNet and Superpoint #20

Closed knsjoon closed 3 months ago

knsjoon commented 3 months ago

Here are the steps you mentioned for running the code (a sketch of how I would invoke them follows the list):

1) Extract 2D masks and first stage feature from RGB-D sequences

2) Generate 3D instances from 2D masks

3-A) Refine second stage feature from 3D instances

3-B) After refining the grounded features, re-run step 2 to finalize the 3D output masks

4) Run interactive visualization (requires Pyviz3D)
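
For reference, here is my current understanding of the run order (the script names appear in this repo, but the scripts/ prefix and exact invocations are my guess):

```bash
# Step 1: extract 2D masks + first-stage features (SAM + Grounding DINO)
sh scripts/grounding_2d.sh

# Step 2: lift the 2D masks into 3D instance proposals
sh scripts/generate_3d_inst.sh

# Step 3-A: refine the second-stage (grounded) features from the 3D instances
sh scripts/refine_grounding_feat.sh

# Step 3-B: re-run step 2 with the refined features to finalize the 3D masks
sh scripts/generate_3d_inst.sh

# Step 4: interactive visualization (requires Pyviz3D)
```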

My question is: at what stage do I need to run ISBNet and SuperPoint for the whole pipeline to work? I managed to run step 1, but I am not sure whether step 2 requires the ISBNet and SuperPoint results.

Also, can I download the latest versions of ISBNet and Superpoint? I ask because you mentioned that for segment2d (SAM and GroundingDINO) we should stick with the code included in this GitHub repo due to some changes you made, right?

I read your paper and it is very cool! So I am striving to run the whole pipeline with scannetpp first and then with my own dataset afterwards.

PhucNDA commented 3 months ago

Hi @knsjoon,

Thank you for your interest in our work!

For the ScanNetpp 3D data, you can find the latest spp and the pre-computed isbnet_clsagnostic (pretrained200) here.

To generate 3D proposals using the 3D backbone, initiate the process, which should compute all 50 validation scenes in about 1-2 minutes (you should run it before step 3-A). If you wish to customize which stream of proposals is used for the final predictions, you can adjust the settings in the configs here.

Should you have any questions, please feel free to ask.

Best regards, PhucNDA.

knsjoon commented 3 months ago

Hello! Thank you for the very fast reply. I truly appreciate it.

Okay, your answer already clears up some of my confusion. I also understand that by changing the config file, I can decide whether to rely only on the 2D results and/or the 3D results.

(image: model_open3dis, the Open3DIS pipeline diagram)

So step "1) Extract 2D masks and first stage feature from RGB-D sequences" is step 2 in the image. I can run this with "grounding_2d.sh".

In step "2) Generate 3D instances from 2D masks", you generate 3D "clusters" based on the 2D image results. I run this with "generate_3d_inst.sh". Q1) Can you guide me on how to run the superpoint extraction? I read the paper and you refer to https://github.com/drprojects/superpoint_transformer, but it would be nice if I could get some more information about this. I ultimately want to run the pipeline on my own dataset, so I am trying to understand the input and output of each stage instead of relying on the already provided processed scannetpp data.

In step 3-A, you use ISBNet to segment out more objects (step 4 in the image). For this, I guess I need to follow the ISBNet instructions, as I cannot find a related config file. But I assume I run it with "python tools/test.py", as mentioned in https://github.com/VinAIResearch/Open3DIS/blob/main/docs/DATA.md.

In step 3-B, you merge the results from step 2 (objects segmented via 2D) and step 3-A (objects segmented via 3D). This corresponds to the "plus" sign in the image, merging the class-agnostic 3D proposals (4 in the image) with the augmented 3D proposals (3 in the image). I think refine_grounding_feat.sh takes care of this.

That is a lot of questions, I know, but I would really appreciate it if you could help me out once again! Thank you again for the great work.

PhucNDA commented 3 months ago

Hi @knsjoon

Q1: Yes, calling grounding_2d.sh generates the 2D proposals and the corresponding first-stage point cloud features.

Q2: You can follow the ScanNet repo and use segmentator.cpp to extract the superpoints from the 3D point cloud reconstruction. For k_threshold, we use the same default value as ScanNet200 (V2). You can adjust this parameter for your custom dataset.
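
A rough sketch of the superpoint extraction, assuming the standard ScanNet Segmentator workflow (build steps, default values, and output naming are from my recollection of that repo, so please verify there):

```bash
# Build the segmentator shipped with the ScanNet repo
git clone https://github.com/ScanNet/ScanNet.git
cd ScanNet/Segmentator
make   # or compile segmentator.cpp directly with g++

# Run it on the reconstructed mesh. The two optional arguments are
# kThresh and segMinVerts (ScanNet defaults: 0.01 and 20); a larger
# kThresh yields coarser (larger) superpoints, which is the knob to
# tune for a custom dataset.
./segmentator /path/to/scene_mesh.ply 0.01 20

# Output: a *.segs.json file next to the mesh containing a per-vertex
# "segIndices" array, i.e. the superpoint id of every vertex.
```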

Q3: The configuration file of ISBNet is available here. Before running python tools/test.py, make sure these directories exist.
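
For reference, a sketch of that command, run from the ISBNet repo root (the config and checkpoint filenames below are illustrative placeholders; use the actual files linked above):

```bash
cd ISBNet
# Dumps the class-agnostic 3D instance proposals for the validation scenes.
# Config path, checkpoint name, and output directory are placeholders.
python tools/test.py \
  configs/scannet200/isbnet_scannet200.yaml \
  pretrains/isbnet_clsagnostic_scannet200.pth \
  --out results/
```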

Q4: Yes, the 'plus' sign is a simple concatenation of the 3D proposals. Details here.

Best, PhucNDA.