RozDavid / LanguageGroundedSemseg

Implementation for ECCV 2022 paper Language-Grounded Indoor 3D Semantic Segmentation in the Wild

2D Visualizations #7

Closed NoOneUST closed 1 year ago

NoOneUST commented 1 year ago

Hello,

It seems the paper only shows qualitative results on the 3D meshes. Is it possible to release some 2D semantic maps? By the way, how should we obtain the 2D semantic ground truth for ScanNet200? If I remember correctly, such data is provided in ScanNet V2.

RozDavid commented 1 year ago

This is a geometry-only segmentation paper, so we use neither 2D images as inputs nor 2D segmentations for supervision. If you would like to work with image data for ScanNet200, you have to parse the raw labels from the provided 2D label images the same way we do for the mesh annotations.

And you are right, ScanNet does provide images, poses, depth maps, and 2D annotations besides the reconstructed meshes; just be sure to check those flags when downloading the data after filling out the terms-of-use form.

NoOneUST commented 1 year ago

Do you mean I should revise the code provided at https://github.com/ScanNet/ScanNet/tree/master/BenchmarkScripts/ScanNet200 ?

By the way, is the predicted mesh provided? I suppose it may facilitate further model improvement.

One more question. If we want to map the predicted mesh to 2D, is there any tool we can use?

RozDavid commented 1 year ago

1) You don't have to revise the mapping scripts; you just have to apply them to the label images the same way we map the raw point labels to ScanNet200 labels.

2) I don't understand what you mean by using meshes to improve model learning, but the annotated meshes are indeed provided. You just have to apply the vertex annotations to the faces of the mesh yourself if you would like to process the data differently.

3) You mean to render out the predictions to views with poses? Sure, there are a couple of tools you could use depending on the requirements and available resources. There is a nice implementation in Trimesh for CPU rendering, or PyTorch3D for GPU rasterization, just to name a few.
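For the Trimesh route, something along these lines should get you started (untested sketch; the file names, the color palette and the pose handling are just placeholders):

```python
import numpy as np
import trimesh

# Rough sketch: render a label-colored mesh from a ScanNet camera pose with Trimesh.
mesh = trimesh.load('scene0000_00_pred.ply', process=False)
vertex_labels = np.load('scene0000_00_vertex_labels.npy')   # (V,) predicted label per vertex

# Simple random palette; replace with the benchmark color map if you have one.
palette = np.random.randint(0, 255, (vertex_labels.max() + 1, 4), dtype=np.uint8)
palette[:, 3] = 255                                          # opaque alpha
mesh.visual.vertex_colors = palette[vertex_labels]

# ScanNet poses are camera-to-world in the OpenCV convention; Trimesh/OpenGL
# looks down -Z, so flip the Y and Z axes before using the pose.
pose = np.loadtxt('pose/000000.txt').reshape(4, 4)
scene = trimesh.Scene(mesh)
scene.camera_transform = pose @ np.diag([1.0, -1.0, -1.0, 1.0])

# Offscreen rendering requires pyglet; save_image returns PNG bytes.
png = scene.save_image(resolution=(1296, 968))
with open('render_000000.png', 'wb') as f:
    f.write(png)
```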

Just to clear this up: we predict and evaluate on voxels, but to get mesh outputs the easiest thing is probably to assign every vertex the label of its closest voxel with some sort of KNN search, and then parse that into an annotated mesh (this is actually what we did for the visuals in the paper).
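A minimal version of that voxel-to-vertex transfer could look like this (again just a sketch, the arrays and paths are placeholders):

```python
import numpy as np
from scipy.spatial import cKDTree

# voxel_centers : (N, 3) world-space centers of the predicted voxels
# voxel_labels  : (N,)   predicted label per voxel
# mesh_vertices : (M, 3) vertex positions of the original scan mesh
voxel_centers = np.load('voxel_centers.npy')
voxel_labels = np.load('voxel_labels.npy')
mesh_vertices = np.load('mesh_vertices.npy')

tree = cKDTree(voxel_centers)
_, nearest = tree.query(mesh_vertices, k=1)   # index of the closest voxel per vertex
vertex_labels = voxel_labels[nearest]         # (M,) label per mesh vertex
np.save('vertex_labels.npy', vertex_labels)
```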

618QRC commented 1 year ago

Hello David,

Thanks for your great work. As you mentioned above, ScanNet does provide images, poses, depth maps, and 2D annotations besides the reconstructed meshes. I wonder how to get the 2D semantic ground truth for the new ScanNet200 benchmark, since we can only download the 2D semantic ground truth for the original ScanNet V2 and there is no ready-made 2D ground truth for ScanNet200.

RozDavid commented 1 year ago

Hey,

So just to reiterate: the dataset is the same for ScanNet and ScanNet200, only the benchmark is different. The label images which you can download (or already have downloaded) contain the raw label ids and can be parsed to either the 20- or the 200-category setting. Just follow the same mapping function we used to get the 200 labels from the raw ids, similar to this.
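Roughly something like this should work for the images (quick untested sketch; I am assuming the label-filt PNGs store the raw label ids per pixel, and that VALID_CLASS_IDS_200 and scannetv2-labels.combined.tsv come from the benchmark scripts):

```python
import numpy as np
import pandas as pd
import imageio.v2 as imageio

from scannet200_constants import VALID_CLASS_IDS_200  # ships with the benchmark scripts

IGNORE_ID = 0  # assumed "unannotated" value

# The combined label TSV is only used here to size the lookup table over all raw ids.
label_map = pd.read_csv('scannetv2-labels.combined.tsv', sep='\t')
max_raw_id = int(label_map['id'].max())

# Lookup table from raw id to ScanNet200 id: identity for the 200 valid classes,
# IGNORE_ID for everything else.
lut = np.full(max_raw_id + 2, IGNORE_ID, dtype=np.uint16)
valid_ids = np.array(VALID_CLASS_IDS_200)
lut[valid_ids] = valid_ids

raw_labels = imageio.imread('scene0000_00/label-filt/000000.png')   # (H, W) raw ids
labels_200 = lut[np.clip(raw_labels, 0, max_raw_id + 1)]
imageio.imwrite('scene0000_00/label200/000000.png', labels_200.astype(np.uint16))
```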

Let me know if this is still confusing; in that case I can also write and upload a proper script for images here.

618QRC commented 1 year ago

Hey, thanks for your explanation. There are no more questions. BTW, an official script working with images may facilitate research on ScanNet200.