Jumpat / SegmentAnythingin3D

Segment Anything in 3D with NeRFs (NeurIPS 2023)
Apache License 2.0
831 stars 52 forks

A couple of questions about nerfstudio-version #65

Closed VolkaJ closed 2 months ago

VolkaJ commented 3 months ago

Hi, thank you for sharing your excellent work. I've successfully installed everything as required (using the nerfstudio-version branch and my own dataset), but I have a few questions:

  1. Is the "ns-train sa3d ~" command meant to be used only within the SegmentAnything3D directory, or is there a way to execute this command globally?
  2. After completing the training, I found the following files and directories: outputs/processing/sa3d/2024-04-03-120202/, along with config.yaml, dataparser_transforms.json, nerfstudio_models, and trans_vis inside. Are these the only outputs generated from the training process? I noticed in Issue #47 that it's possible to obtain 3D segmentation results in a /logs folder, but I'm unable to locate this directory.
  3. Regarding the command below, what does the "--pipeline.network.num_prompts" option do?
ns-train sa3d --data {data-dir} \
  --load-dir {ckpt-dir} \
  --pipeline.text_prompt {text-prompt} \
  --pipeline.network.num_prompts {num-prompts} \

Any assistance you can provide would be greatly appreciated. Thank you.

Zanue commented 3 months ago

Hi,

is there a way to execute this command globally?

Sure, you can run pip install -e . under the SegmentAnything3D directory, and then you can execute ns-train sa3d ~ globally, as mentioned on this page.

Are these the only outputs generated from the training process?

Currently the nerfstudio-version SA3D does not support generating 3D masks. You can try the original SA3D code, which is based on DVGO.

what does the "--pipeline.network.num_prompts" option do?

It is a hyperparameter that controls the maximum number of point prompts used in the self-prompting stage. When you want to segment an object with a simple shape, setting it to 3 ~ 5 is fine; when the target object has a complex shape (like the fern scene), you may set it larger (10 ~ 20).

VolkaJ commented 3 months ago

Thank you so much for the explanation. I didn't know what num_prompts was for, so I set it to 1; that's probably why I got the weird video output. I'll try again.

By the way, I'm currently trying to run the segmentation on a remote server, and I can't use the GUI because I want to automate the process.

Is there any way I can automate the process using only a fixed text prompt and still get the segmentation result? I'm trying to get point clouds of multiple segmented objects.

If it's impossible with the current version, I'd very much appreciate any kind of tips or guides. Thank you so much.

Zanue commented 3 months ago
  1. I think you can modify the code to save the point cloud after the training stage, for example saving the point cloud here.

  2. To generate the point cloud, you can loop over the training set, use the depth (and mask) from the nerf model to back-project the target pixels into world coordinates, and finally merge all the points to obtain the point cloud (see the sketch after this list).

  3. Comment out lines 275 ~ 278 here to make the code end immediately without Ctrl+C.
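
For reference, here is a rough back-projection sketch of step 2. This is not the actual SA3D code; the variable names and the OpenCV-style pinhole camera convention are assumptions, so adapt it to the conventions nerfstudio uses in your setup:

```python
import numpy as np

def backproject_view(depth, mask, K, c2w):
    """Back-project the masked pixels of one training view into world-space points.

    depth: (H, W) depth map rendered by the nerf model
    mask:  (H, W) boolean segmentation mask for the target object
    K:     (3, 3) camera intrinsics
    c2w:   (4, 4) camera-to-world transform (OpenCV-style convention assumed)
    """
    v, u = np.nonzero(mask)                        # pixel coordinates inside the mask
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]                # pinhole unprojection
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1)   # homogeneous camera coords
    return (c2w @ pts_cam.T).T[:, :3]              # points in world coordinates

# loop the training set and merge everything into one point cloud
# (depths, masks, Ks, c2ws are hypothetical per-view lists collected from the model)
# cloud = np.concatenate([backproject_view(d, m, K, T) for d, m, K, T in zip(depths, masks, Ks, c2ws)])
```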

VolkaJ commented 2 months ago

Thanks for the advice. I'll work on it.

Probably one last question on this thread.

I fine-tuned the pre-trained GroundingDino model, and when I tested it on its own, it worked fine. But when I run 'ns-train sa3d ~' with the fine-tuned GroundingDino model, GroundingDino returns zero bounding boxes (Get box from GroundingDino: []).

I guess the input used by GroundingDino to make the initial mask is an image generated by Nerf, right? Does this mean the Nerf wasn't trained very well? I'm having trouble figuring out what the problem might be here.

Zanue commented 2 months ago

the input for GroundingDino to make initial mask was generated image by Nerf right?

Yes, we use the image rendered from Nerf; check this. You can save this image to see what is wrong.
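
For example, something like this (a quick sketch, assuming the rendered image is available as an (H, W, 3) float tensor in [0, 1], e.g. model_outputs["rgb"]):

```python
import imageio
import numpy as np

# rgb: the image rendered by the nerf, assumed to be an (H, W, 3) float tensor in [0, 1]
rgb = model_outputs["rgb"].detach().cpu().numpy()
imageio.imwrite("rendered_for_groundingdino.png", (np.clip(rgb, 0.0, 1.0) * 255).astype(np.uint8))
```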

VolkaJ commented 2 months ago

You can save this image to see what is wrong.

Yeah, it's definitely wrong. The image (rendered from Nerf) seems irrelevant.

[attached image: tmp]

Can you let me know where 'batch' is from? I wonder how the nerf model rendered that image... (see this)

Zanue commented 2 months ago

Can you let me know where 'batch' is from?

This is the 'batch'.

You might check whether your model correctly loads the pretrained nerf checkpoint.
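
For instance, you could peek at the checkpoint file directly (a rough sketch; the 'step'/'pipeline' keys are the usual nerfstudio checkpoint layout, but treat the exact structure as an assumption):

```python
import torch

# the path is only an example; point it at the .ckpt under your {ckpt-dir}/nerfstudio_models/
ckpt = torch.load("nerfstudio_models/step-000029999.ckpt", map_location="cpu")
print(list(ckpt.keys()))   # usually 'step', 'pipeline', plus optimizer state
print(ckpt.get("step"))    # the training step at which the nerf was saved
# comparing a few entries of ckpt["pipeline"] with model.state_dict() can confirm the weights were loaded
```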

VolkaJ commented 2 months ago

Hmm, the nerf checkpoint was loaded correctly. And I just found out that the image saved before 'get_outputs_for_camera_ray_bundle' (here)

imageio.imwrite(f"batch_image_{self.image_count}.png", batch["image"].squeeze().cpu().numpy())

looks totally fine, but the image saved after 'get_outputs_for_camera_ray_bundle'

imageio.imwrite(f"model_outputs_{self.image_count}.png", model_outputs["rgb"].squeeze().cpu().numpy())

looks all green.

Could there be something wrong with that function?

Zanue commented 2 months ago

get_outputs_for_camera_ray_bundle() receives the camera pose as input and outputs the rgb image, depth, mask, etc.

batch["image"] is the ground truth image, and model_outputs["rgb"] is the image generated by Nerf.

I have already tested this nerfstudio-version code and it seems to be ok. I suggest the following:

  1. Make sure your nerfstudio version is nerfstudio==0.2.0 (though I have tested it under 1.0.2);
  2. Use ns-viewer (like this script) to check if your pretrained nerf model is fine;
  3. Try more views and see if the generated images are all the same (see the sketch below).
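
For point 3, something along these lines should render a few training views to disk (a sketch based on the usual nerfstudio 0.2.x API; pipeline here is the loaded sa3d pipeline, and the attribute names may differ slightly in your setup):

```python
import imageio
import numpy as np
import torch

cameras = pipeline.datamanager.train_dataset.cameras
num_views = min(5, cameras.camera_to_worlds.shape[0])
with torch.no_grad():
    for i in range(num_views):
        ray_bundle = cameras.generate_rays(camera_indices=i).to(pipeline.device)
        outputs = pipeline.model.get_outputs_for_camera_ray_bundle(ray_bundle)
        rgb = outputs["rgb"].cpu().numpy()
        imageio.imwrite(f"check_view_{i:03d}.png", (np.clip(rgb, 0.0, 1.0) * 255).astype(np.uint8))
```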
VolkaJ commented 2 months ago

Thank you for your patience. I've been helped a lot.

  1. Yeah, the version is 0.2.0.
  2. I checked the model in the viewer, and the pretrained nerf model looked fine.
  3. The generated images seem to have slightly different r,g,b values, but they are all very different from batch["image"]. Aren't they (batch["image"] and model_outputs["rgb"]) supposed to be similar to each other? They all look basically like the one I uploaded above.
Zanue commented 2 months ago

Very strange......

Could you provide more details, like the scripts you ran (to train the nerf and to segment), and samples of your dataset?

VolkaJ commented 2 months ago

Oh! I just realized that I installed sa3d with nerfstudio==0.2.0, but I trained the nerf model on a nerfstudio==1.0.0 docker image. I'll come back after finishing training on 0.2.0. Hopefully this solves the issue.

VolkaJ commented 2 months ago

Turns out it was a nerfstudio version mismatch problem. Thanks a lot for helping me track it down.

Still having a problem with my own dataset, but I think I'm almost there.

Probably an obvious and self-explanatory question, but is GroundingDino supposed to find the bounding box for all input images? If so, there shouldn't be any images in the nerf-training stage that don't contain the object I want to segment, right?

Zanue commented 2 months ago

We only use GroundingDino to find the mask in the first input image. After that, the SAM model will be used to automatically find the target object in the training set images and complete the segmentation. Therefore, you do not need to worry when there are images without the object you want.

VolkaJ commented 2 months ago
  1. Oh, then that could be a serious problem, because it seems like the input camera pose is decided by some camera-optimizer logic, so I can't guarantee which view is used first. If that first view of my training dataset doesn't contain the target object, the program dies, right? Do I understand that correctly?

  2. I tested two different datasets, but the very first model_output['rgb'] always looks sparse. Except for the first one, the rest of them look fine. Can you guess why that is? [Edited] The first 8-10 images look bad and then it gets better. I'll have to look into what's going on.