drprojects / DeepViewAgg

[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"

How to test custom data #36

mostafa501 closed this issue 1 year ago

mostafa501 commented 1 year ago

Hi, congratulations on the great work, both the algorithm and the paper. I want to ask about evaluating my own collected data for semantic segmentation (S3DIS-style):

1. Do the point clouds have to be in (x, y, z) format only for the code to work, or can they be in XYZRGB format?
2. What is the minimum number of images needed to test one scene, and how do I co-register the images with the point cloud?
3. I did not find steps for evaluating custom collected data. Could you please walk me through them?

I hope you can help me. Thank you.

drprojects commented 1 year ago

Hi @mostafa501

1. Do the point clouds have to be in (x, y, z) format only for the code to work, or can they be in XYZRGB format?

You can pass RGB to the points too if you'd like, but if you are using a model pretrained with XYZ only, it won't work. Theoretically, if you want to train your own model with XYZRGB point features in addition to the images, you can (see the sketch below). This will probably not improve your results, though: our paper and investigations showed that extracting features from multi-view images is more powerful than using color information on the points. If you think about it, point colorization is itself projected from multi-view images, with human-engineered heuristics. Our method learns to do exactly that, but it learns to project high-level features instead of just RGB information, and it learns to aggregate information from the multiple views of each point based on their observation conditions, instead of relying on heuristics.
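
In case it helps, here is a minimal sketch of the two input layouts, using the `torch_geometric.data.Data` container that the torch-points3d backbone of this repo relies on. The tensors are toy placeholders; only the `pos` / `x` / `y` conventions carry over:

```python
import torch
from torch_geometric.data import Data  # point cloud container used by torch-points3d

# Hypothetical toy cloud with N points.
N = 1000
pos = torch.rand(N, 3)            # XYZ coordinates
rgb = torch.rand(N, 3)            # per-point colors in [0, 1]
y = torch.randint(0, 13, (N,))    # semantic labels (e.g. the 13 S3DIS classes)

# XYZ-only sample: geometry goes in `pos`, no handcrafted point features.
# This is the layout a model pretrained on XYZ-only input expects.
data_xyz = Data(pos=pos, y=y)

# XYZRGB sample: colors are passed as point features in `x`.
# A model must be trained with this layout to consume them.
data_xyzrgb = Data(pos=pos, x=rgb, y=y)
```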

2. What is the minimum number of images needed to test one scene, and how do I co-register the images with the point cloud?

There is no a priori minimum or maximum number of images. Still, the model may struggle in areas where points are seen by no image (in that case it has no radiometric information and must predict semantics solely from the geometry). So you may want good image coverage of your scene; a rough check like the sketch below can help. Have a look at our paper for an ablation study on the impact of the number of views.
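
As a rough illustration (not part of the repo), one way to quantify coverage, assuming you have already computed a point-to-image visibility mapping as (point index, image index) observation pairs from your own projection step:

```python
import torch

# Hypothetical visibility mapping: one entry per (point, image) observation,
# e.g. the output of your own point-to-pixel projection.
n_points = 1000
obs_point_idx = torch.randint(0, n_points, (5000,))  # point index of each observation

# Count how many images see each point.
views_per_point = torch.zeros(n_points, dtype=torch.long)
views_per_point.scatter_add_(0, obs_point_idx, torch.ones_like(obs_point_idx))

# Points seen by no image fall back to geometry-only prediction.
unseen = (views_per_point == 0).float().mean().item()
print(f"{unseen:.1%} of points are observed by no image")
```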

3. I did not find steps for evaluating custom collected data. Could you please walk me through them?

The provided codebase lets you train and evaluate on the supported benchmark datasets. If you want to work with another dataset, you will have to do the coding yourself; a starting point is sketched below.
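
To give you a starting point, a custom dataset usually begins with wrapping your scenes in the torch_geometric dataset conventions this codebase builds on. The skeleton below is purely hypothetical (`MyScenes` and `load_my_scene` are placeholders you would implement) and does not cover the multimodal point-image mappings DeepViewAgg needs; for those, study the dataset modules shipped with the repo:

```python
import torch
from torch_geometric.data import Data, InMemoryDataset

def load_my_scene(path):
    # Placeholder: parse your own file format and return (pos, labels) tensors.
    raise NotImplementedError

class MyScenes(InMemoryDataset):
    """Hypothetical skeleton following torch_geometric conventions.
    The image mappings required by DeepViewAgg are dataset-specific
    and must be added on top of this (see the repo's dataset modules)."""

    def __init__(self, root, transform=None, pre_transform=None):
        super().__init__(root, transform, pre_transform)
        self.data, self.slices = torch.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        return ["scene_0.ply"]  # your raw scans

    @property
    def processed_file_names(self):
        return ["data.pt"]

    def process(self):
        # Convert each raw scan into a Data object, then collate and cache.
        data_list = []
        for path in self.raw_paths:
            pos, y = load_my_scene(path)
            data_list.append(Data(pos=pos, y=y))
        data, slices = self.collate(data_list)
        torch.save((data, slices), self.processed_paths[0])
```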