ayushjain1144 / odin

Code for the paper: "ODIN: A Single Model for 2D and 3D Segmentation" (CVPR 2024)
https://odin-seg.github.io
MIT License

Pred_mask, inference model (ScanNet val (Swin-B), ScanNet val (ResNet50) segmentation task) #19

Closed NikitaVasilevN closed 1 month ago

NikitaVasilevN commented 2 months ago

Dear authors, thank you for your work.

I am trying to reproduce your results with the ScanNet val (Swin-B) and ScanNet val (ResNet50) models on the semantic segmentation task. Unfortunately, during and after inference (--eval-only) I cannot get similar IoU values. Can you explain why I get negative pred_masks?

'pred_masks': tensor([[[-42.7500, -41.3750, -44.3750, ..., -36.5000, -36.4375, -34.8750], [-23.2344, -22.0312, -22.6250, ..., -17.9219, -17.4688, -17.5156], [-52.1250, -52.4688, -52.8438, ..., -51.2812, -50.3125, -53.9688], ..., [-39.7500, -38.7188, -39.5312, ..., -28.5156, -27.5781, -29.8438], [-39.5000, -39.0625, -39.4375, ..., -32.8438, -31.7500, -33.3438], [-67.1875, -67.6875, -71.9375, ..., -44.3125, -43.7812, -45.2500]]], device='cuda:0', dtype=torch.float16),

Additionally, all pred_scores are equal to 0.

'pred_scores': tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')}

Also, what do these arguments in scannet_swin.sh mean: DATASETS.TRAIN "('scannet_context_instance_train_20cls_single_highres_100k',)" \ DATASETS.TEST "('scannet_context_instance_val_20cls_single_highres_100k','scannet_context_instance_train_eval_20cls_single_highres_100k',)"? Especially the naming scannet_context_instance_train_20cls_single_highres_100k, and similar names in other files.

Thanks in advance

ayushjain1144 commented 2 months ago

Off the top of my head, it looks like there might be an issue with loading the checkpoint. Could you send me the wandb link you get when running the code? (Please make sure it is public and works for you in incognito mode.)
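
If it is easier, you can also sanity-check the weights locally first. A minimal sketch, assuming the downloaded Model Zoo file is a plain torch checkpoint (the path below is hypothetical):

```python
import torch

# Hypothetical path to the downloaded Model Zoo checkpoint.
ckpt = torch.load("checkpoints/odin_scannet_swin_b.pth", map_location="cpu")

# Detectron2-style checkpoints usually keep the weights under a "model" key.
state = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
print(len(state), "tensors in checkpoint")
print(list(state.keys())[:5])  # compare these key names against the model's state_dict
```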

In terms of naming, the key parts are the dataset name, like "scannet", and the dataset split, like "train", "val", "train_eval" (a random 10 scenes from the training set, used to track training accuracy), and "debug" (two scenes for debugging). The rest of the naming had legacy meaning for us during development and is irrelevant for you.
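
On the negative pred_masks themselves: assuming ODIN keeps a Mask2Former-style output convention, pred_masks are raw per-query mask logits, so they are only meaningful after a sigmoid. If the checkpoint did not load, the logits can come out uniformly very negative, every thresholded mask is empty, and the scores collapse to 0. A rough, self-contained sketch (the outputs dict below is faked just to illustrate the convention):

```python
import torch

# Fake a per-scene prediction dict; `outputs` is only a stand-in for the dict
# printed above, with uniformly negative mask logits as in the report.
outputs = {"pred_masks": torch.full((100, 240, 320), -40.0, dtype=torch.float16)}

pred_masks = outputs["pred_masks"]
mask_probs = pred_masks.float().sigmoid()   # sigmoid turns logits into probabilities
binary_masks = mask_probs > 0.5             # all False: every logit is far below 0
print(binary_masks.sum().item(), "foreground elements across all queries")  # -> 0
```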

NikitaVasilevN commented 2 months ago

I see, thank you for your support. It was a problem with a path in scannet_swin.sh: I changed the model from m2f_coco_swing.pkl to one of the models from the Model Zoo (ScanNet val (Swin-B) for the segmentation task, in my case). Still, it is really hard to understand where I need to change paths and parameters in the config files, .sh files, and .py files, because the repo and the config files describe them only briefly.

ayushjain1144 commented 2 months ago

Hi, thank you for your feedback, and glad you resolved the issue.

If you have a minute, could you tell me more about the things that were hard? I will try to push a fix to improve the documentation. (I pushed a recent commit with some changes, but I am not sure it addresses all of your concerns.) Thanks a lot!

NikitaVasilevN commented 1 month ago

Yes, of course. The structure of the .yaml files causes confusion, and some arguments in the .sh files are not clear.

Anyway, I also have a question about rendering 2D images from camera poses. Is it possible to render 2D images from camera poses with your code?

ayushjain1144 commented 1 month ago

render_scannet.txt

Not in the released code, but maybe the above file helps -- you might need to change some import locations, but it should be fairly self-explanatory.
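
The attached script is not reproduced here, but the core step it performs is a standard pinhole projection of the reconstructed points into each frame. A minimal sketch, assuming ScanNet-style camera-to-world pose matrices and 3x3 color intrinsics (the function and argument names are illustrative, not taken from the attached file):

```python
import numpy as np

def project_to_frame(points_world, pose_c2w, K, height, width):
    """Project world-space points (N, 3) into one camera frame.

    pose_c2w: 4x4 camera-to-world matrix (ScanNet pose files store this,
              so it is inverted to get world-to-camera).
    K:        3x3 color-camera intrinsics.
    Returns pixel coordinates (N, 2) and a validity mask for points that fall
    in front of the camera and inside the image bounds.
    """
    w2c = np.linalg.inv(pose_c2w)
    pts_h = np.concatenate([points_world, np.ones((len(points_world), 1))], axis=1)
    pts_cam = (w2c @ pts_h.T).T[:, :3]
    z = pts_cam[:, 2]
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)
    valid = (z > 1e-6) & (uv[:, 0] >= 0) & (uv[:, 0] < width) \
            & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    return uv, valid
```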