How to run this on custom dataset?

Monalsingh commented 1 year ago

I have annotated a few samples and am trying to generate COCO masks for them.

I have changed the values in pl_data_module.py to point to my custom annotation.json and images path.
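Before pointing the data module at a custom annotation file, it can help to confirm that the file is a well-formed COCO-style box annotation set. The sketch below is not part of this repository: `check_coco_boxes` and `check_coco_file` are hypothetical helper names, and the checks assume the standard COCO layout (top-level `images`, `annotations`, `categories`, and `bbox` given as `[x, y, width, height]`).

```python
import json
from typing import Any, Dict, List

REQUIRED_TOP_LEVEL = ("images", "annotations", "categories")


def check_coco_boxes(coco: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = [f"missing top-level key: {k}" for k in REQUIRED_TOP_LEVEL if k not in coco]
    if problems:
        return problems

    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}

    for ann in coco["annotations"]:
        ann_id = ann.get("id")
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {ann_id}: unknown image_id")
        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {ann_id}: unknown category_id")
        bbox = ann.get("bbox")
        # COCO boxes are [x, y, width, height]; width and height must be positive.
        if not (isinstance(bbox, (list, tuple)) and len(bbox) == 4 and bbox[2] > 0 and bbox[3] > 0):
            problems.append(f"annotation {ann_id}: bbox must be [x, y, width, height] with positive size")
    return problems


def check_coco_file(path: str) -> List[str]:
    """Load a JSON annotation file from disk and run the same checks."""
    with open(path) as f:
        return check_coco_boxes(json.load(f))


# Toy data (made up for illustration): annotation 2 is deliberately broken.
example = {
    "images": [{"id": 1, "file_name": "a.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 1, "name": "object"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 100, 50]},
        {"id": 2, "image_id": 99, "category_id": 1, "bbox": [5, 5, 0, 10]},
    ],
}
example_problems = check_coco_boxes(example)
for line in example_problems:
    print(line)
```

For a real file, `check_coco_file("/home/train/_annotations.coco.json")` would run the same checks on the path used in the command below; an empty list only means these basic checks passed, not that the file matches everything the data module expects.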

I am using this command to run inference (phase 1 only):

python main.py --resume /home/vit-mae-base_coco-final.ckpt --label_dump_path /home --not_eval_mask --box_inputs /home/train/_annotations.coco.json --val_only

I am getting this error:

Traceback (most recent call last):
  File "main.py", line 179, in <module>
    trainer.validate(model, ckpt_path=args.resume, dataloaders=data_loader.val_dataloader())
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 665, in validate
    return call._call_and_handle_interrupt(
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 36, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 90, in launch
    return function(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 714, in _validate_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1005, in _run
    self._restore_modules_and_callbacks(ckpt_path)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 961, in _restore_modules_and_callbacks
    self._checkpoint_connector.restore_model()
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 271, in restore_model
    self.trainer.strategy.load_model_state_dict(self._loaded_checkpoint)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 363, in load_model_state_dict
    self.lightning_module.load_state_dict(checkpoint["state_dict"])
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1667, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for MALPseudoLabels:
	Unexpected key(s) in state_dict: "student.roi_head.mlp2_cls.weight", "student.roi_head.mlp2_cls.bias", "teacher.roi_head.mlp2_cls.weight", "teacher.roi_head.mlp2_cls.bias".

Thanks in advance.

voidrank commented 1 year ago

Hi @Monalsingh

It is caused by mismatched weights. I will update the weights and test them again. Just give me a few hours to fix it.
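For anyone who wants to see exactly which keys disagree while waiting for new weights, here is a minimal sketch of comparing a checkpoint's `state_dict` against the keys a model expects. It is not the repository's own code: `split_state_dict` is a hypothetical helper, the `backbone` keys are made up, and only the four `mlp2_cls` keys come from the traceback above.

```python
from typing import Any, Dict, Iterable, List, Tuple


def split_state_dict(
    state_dict: Dict[str, Any], expected_keys: Iterable[str]
) -> Tuple[Dict[str, Any], List[str], List[str]]:
    """Split a checkpoint state_dict into (filtered, unexpected, missing).

    filtered   -- entries whose key the model expects
    unexpected -- checkpoint keys the model does not have
    missing    -- model keys the checkpoint does not provide
    """
    expected = set(expected_keys)
    filtered = {k: v for k, v in state_dict.items() if k in expected}
    unexpected = sorted(k for k in state_dict if k not in expected)
    missing = sorted(expected - set(state_dict))
    return filtered, unexpected, missing


# Toy stand-ins: values would be tensors in a real checkpoint.
ckpt_state = {
    "student.backbone.weight": 0,  # hypothetical key
    "teacher.backbone.weight": 0,  # hypothetical key
    "student.roi_head.mlp2_cls.weight": 0,
    "student.roi_head.mlp2_cls.bias": 0,
    "teacher.roi_head.mlp2_cls.weight": 0,
    "teacher.roi_head.mlp2_cls.bias": 0,
}
model_keys = ["student.backbone.weight", "teacher.backbone.weight"]  # hypothetical

filtered, unexpected, missing = split_state_dict(ckpt_state, model_keys)
print("unexpected:", unexpected)
print("missing:", missing)

# With PyTorch installed, the same helper could be applied to a real checkpoint:
#   ckpt = torch.load(path, map_location="cpu")
#   filtered, unexpected, missing = split_state_dict(
#       ckpt["state_dict"], model.state_dict().keys())
#   model.load_state_dict(filtered, strict=False)
```

Dropping the unexpected keys only makes loading succeed; it does not guarantee that the remaining weights match the architecture in the code, so results should still be checked on a validation set.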

Best,

WeiChihChern commented 1 year ago

@voidrank Looking forward to the update!

Best,

Monalsingh commented 1 year ago

@voidrank Looking forward to running some inference on the final model. Please update the links once you are done testing.

Thanks,

voidrank commented 1 year ago

Sorry, guys. I was super busy with ICML submissions. I will update it when I have time.

ameyparanjape commented 1 year ago

@voidrank are the weight links updated now?

voidrank commented 1 year ago

@ameyparanjape, I almost forgot this. I will work on this later this week. Thanks for reminding me of this issue.

tianyufang1958 commented 1 year ago

> @ameyparanjape, I almost forgot this. I will work on this later this week. Thanks for reminding me of this issue.

Sorry, but I still get the same error when using the pre-trained model.

ameyparanjape commented 1 year ago

@voidrank will these be updated before the TAO 5.0 release?

voidrank commented 1 year ago

@ameyparanjape well, unfortunately, no. Sorry about that, but I believe TAO 5.0 will be released very soon.

voidrank commented 1 year ago

Hi @Monalsingh @WeiChihChern @ameyparanjape

Can you try this link? It should work.

https://drive.google.com/file/d/1H952BJWS3QtslG3TqyS8kXJl-Zx3AI8H/view?usp=share_link

cy810557 commented 1 year ago

Hi, @voidrank

Thanks for updating; the new link works. However, when I evaluated on the COCO val set, this checkpoint gave a low mIoU (see the logs below). Are any changes needed when running with this checkpoint?

Validation DataLoader 0: 100%|███████████████████████████████████████████████████████████████| 1313/1313 [03:29<00:00, 6.27it/s]
val/mIoU: 0.3425881266593933
val/mIoU_things tensor(0.3570, device='cuda:0')
val/mIoU_semistuff tensor(0.2514, device='cuda:0')
/opt/conda/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: The compute method of metric MIoUMetrics was called before the update method which may lead to errors, as metric states have not yet been updated.
  warnings.warn(*args, **kwargs)
val/mIoU_small: 0.3508264720439911
val/mIoU_medium: 0.30977723002433777
val/mIoU_large: 0.31840550899505615
Validation DataLoader 0: 100%|███████████████████████████████████████████████████████████████| 1313/1313 [03:33<00:00, 6.16it/s]
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Validate metric DataLoader 0
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
val/mIoU 0.3425881564617157
val/mIoU_large 0.31840550899505615
val/mIoU_medium 0.30977723002433777
val/mIoU_small 0.3508264720439911
val/mIoU_stuff 0.24615158140659332
val/mIoU_things 0.35705024003982544

For comparison, here are the metrics computed with my checkpoint, trained for 6 epochs:

Validation DataLoader 0: 100%|███████████████████████████████████████████████████████████████| 1313/1313 [03:06<00:00, 7.02it/s]
val/mIoU: 0.7872997522354126
val/mIoU_things tensor(0.7871, device='cuda:0')
val/mIoU_semistuff tensor(0.7988, device='cuda:0')
/opt/conda/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: The compute method of metric MIoUMetrics was called before the update method which may lead to errors, as metric states have not yet been updated.
  warnings.warn(*args, **kwargs)
val/mIoU_small: 0.7310887575149536
val/mIoU_medium: 0.7878617644309998
val/mIoU_large: 0.7925001382827759
Validation DataLoader 0: 100%|███████████████████████████████████████████████████████████████| 1313/1313 [03:19<00:00, 6.59it/s]
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Validate metric DataLoader 0
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
val/mIoU 0.7872997522354126
val/mIoU_large 0.7925001382827759
val/mIoU_medium 0.7878617644309998
val/mIoU_small 0.7310887575149536
val/mIoU_stuff 0.8053987622261047
val/mIoU_things 0.7857838273048401
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

voidrank commented 1 year ago

@cy810557 Thanks for the updates!

Not sure what happens here....

Could you share your weights? I would like to replace mine with the one you share here, if you don't mind.

voidrank commented 1 year ago

I'm thinking of putting MAL on Hugging Face. Would that be helpful for you guys? @ameyparanjape @Monalsingh

cy810557 commented 1 year ago

> @cy810557 Thanks for the updates!
>
> Not sure what happens here....
>
> Could you share your weights? I would like to replace mine with the one you share here, if you don't mind.

Here is my checkpoint; I hope it helps. https://drive.google.com/file/d/1VEhlZV-McaizuPBQxNj6x-Ip0xemeGdB/view?usp=sharing

voidrank commented 1 year ago

Updated the weights in README. Thanks @cy810557 !

NVlabs / mask-auto-labeler

How to run this on custom dataset? #1