Jumpat / SegmentAnythingin3D

Segment Anything in 3D with NeRFs (NeurIPS 2023)
Apache License 2.0
873 stars, 54 forks

How much memory is required to train SA3D, and how to make own dataset? #36

Closed: Lizhinwafu closed this issue 10 months ago

Lizhinwafu commented 11 months ago

How much memory is required to train SA3D, and how do I make my own dataset?

Jumpat commented 11 months ago

The GPU memory cost depends on the scene. A 24GB GPU (e.g., an RTX 3090) is usually enough. For training on a custom dataset, the DirectVoxGO repository may help. You need to process the data with COLMAP and arrange it in the directory structure the code expects.

Lizhinwafu commented 10 months ago

Thanks for your reply.

I have some questions about SA3D:

(1) My images are 4096*2160. At this resolution, training is easily interrupted.

(2) I found the reason for the error. LLFF is a forward-facing dataset format, but I captured my data over 360 degrees. I converted it to the LLFF format and found that it cannot be trained. Maybe it needs to be converted to the LERF or mip-NeRF360 format.

(3) I see that you have implemented instance segmentation for multiple targets. How is this achieved, and which command is used? With the example command, I can only get one target in the scene. This instance segmentation is what I want.

Many thanks. [image attached]

Lizhinwafu commented 10 months ago

Another question, should these two places in the nerf_unbounded data set be the same? [two images attached]

Zanue commented 10 months ago

(1) To train on lower-resolution images, first make sure your data directory contains folders such as images, images_2, images_4, and images_8. These can be created by COLMAP. Then set the factor parameter in your config. factor is the downsample factor applied to the training images.
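To make the effect of factor concrete, here is a minimal sketch that maps a downsample factor to the folder name and the resulting resolution for the 4096*2160 images mentioned earlier in the thread. The helper names `image_dir_for_factor` and `downsampled_size` are hypothetical and are not part of SA3D or DirectVoxGO, and the integer-division rounding is an assumption; the actual loader may round differently.

```python
def image_dir_for_factor(factor: int) -> str:
    """Folder name for a downsample factor, following the
    images / images_2 / images_4 / images_8 layout described above."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return "images" if factor == 1 else f"images_{factor}"


def downsampled_size(width: int, height: int, factor: int) -> tuple[int, int]:
    """Approximate image size after downsampling by `factor`.

    Assumption: plain integer division; the real data loader may round
    in a different way.
    """
    return width // factor, height // factor


if __name__ == "__main__":
    # Source resolution taken from the question in this thread.
    for factor in (1, 2, 4, 8):
        w, h = downsampled_size(4096, 2160, factor)
        megapixels = w * h / 1e6
        print(f"{image_dir_for_factor(factor):>9}: {w} x {h} ({megapixels:.2f} MP)")
```

Since the pixel count shrinks with the square of the factor, factor 4 turns a 4096*2160 image into roughly 1024*540, about one sixteenth of the original pixels.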

(2)

Maybe it needs to be made into LERF format or mip-NeRF360.

That is right. LLFF scenes are forward-facing, while LERF and mip-NeRF360 scenes are 360°. Therefore, LLFF-format data is trained with DVGO using NDC (normalized device coordinates) processing, while LERF and mip-NeRF360 data is trained with DCVGO without NDC.
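The rule in point (2) can be summarized as a small lookup. This is only an illustrative sketch: the function `pick_setup`, the format keys, and the dictionary fields are hypothetical names chosen here, not options taken from the SA3D or DirectVoxGO configs.

```python
# Hypothetical summary of the rule above: forward-facing data uses DVGO with
# NDC, 360-degree data uses DCVGO without NDC. The keys and field names are
# illustrative only and do not mirror real config options.
SETUPS = {
    "llff": {"scene": "forward-facing", "model": "DVGO", "ndc": True},
    "lerf": {"scene": "360", "model": "DCVGO", "ndc": False},
    "mip-nerf360": {"scene": "360", "model": "DCVGO", "ndc": False},
}


def pick_setup(data_format: str) -> dict:
    """Return the model/NDC combination for a dataset format."""
    key = data_format.lower()
    if key not in SETUPS:
        raise ValueError(f"unknown data format: {data_format!r}")
    return SETUPS[key]


if __name__ == "__main__":
    for name in SETUPS:
        print(name, "->", pick_setup(name))
```

Under this rule, a 360-degree capture stored in the LLFF format ends up on the forward-facing path, which matches the training failure described in the question.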

(3) We have not published this part of the code yet. It can be implemented by setting the number of classes of seg_mask_grid to n+1 (where n is the number of objects and the extra class represents the unknown class), and modifying the training process and loss function accordingly.
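As a rough illustration of the n+1-class idea in point (3), the sketch below treats one sample as a vector of n+1 logits and scores it with a softmax cross-entropy loss. This is not the unpublished SA3D code: placing the unknown class at index 0, the choice of cross-entropy, and all function names are assumptions made here for illustration, and a real implementation would operate on the grid with PyTorch tensors.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over one sample's class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: list[float], target: int) -> float:
    """Negative log-probability assigned to the target class."""
    return -math.log(softmax(logits)[target])


def predict_label(logits: list[float]) -> int:
    """Index of the highest-scoring class."""
    return max(range(len(logits)), key=lambda c: logits[c])


n_objects = 3                  # n: number of target objects
num_classes = n_objects + 1    # n + 1: the objects plus one unknown class
UNKNOWN = 0                    # assumption: index 0 is the unknown class

# With all-zero logits every class is equally likely, so the loss is
# log(num_classes) regardless of the target.
uniform_loss = cross_entropy([0.0] * num_classes, target=2)

if __name__ == "__main__":
    sample = [0.1, 2.0, -1.0, 0.5]  # made-up logits for one sample
    print("predicted class:", predict_label(sample))
    print("loss if target is object 1:", cross_entropy(sample, target=1))
    print("loss for uniform logits:", uniform_loss)
```

The single-object case corresponds to a binary mask; moving to n+1 output channels with a multi-class loss is one straightforward way to let a single grid hold several objects at once.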

(4)

Another question, should these two places in the nerf_unbounded data set be the same?

Yes, they should be.

Lizhinwafu commented 10 months ago

When I train a NeRF model on data captured over 360 degrees, it gives the following error. [image attached]