How can I get the model weights file defined in a config file with the key MODEL.WEIGHTS when I train OneFormer and set resume=False?

SHI-Labs / OneFormer

OneFormer: One Transformer to Rule Universal Image Segmentation, arxiv 2022 / CVPR 2023

https://praeclarumjj3.github.io/oneformer

MIT License


Issue #71

Closed qgq99 closed 1 year ago

qgq99 commented 1 year ago

I appreciate your research work on OneFormer. I hit an error when I started an independent training run (resume=False).

I found that it comes from the call trainer.resume_or_load(resume=args.resume) at line 424 of train_net.py, which tries to load a checkpoint from the MODEL.WEIGHTS entry of the config file. I don't know how to get that file; could someone help me? This may be a basic mistake. I'm a student, so please don't take offense.
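For readers following along, the decision described above can be sketched in plain Python. This is a simplified stand-in and not the library's code: `pick_checkpoint` is a hypothetical helper, and the `last_checkpoint` marker file name reflects my understanding of how the detectron2/fvcore checkpointer records the latest checkpoint, so treat it as an assumption.

```python
import os
import tempfile


def pick_checkpoint(output_dir, model_weights, resume):
    """Hypothetical helper: return the path the trainer would try to load."""
    # Assumption: the checkpointer records the latest checkpoint's file name
    # in a marker file called "last_checkpoint" inside the output directory.
    marker = os.path.join(output_dir, "last_checkpoint")
    if resume and os.path.isfile(marker):
        # Resuming: the marker file names the most recent checkpoint.
        with open(marker) as f:
            return os.path.join(output_dir, f.read().strip())
    # Fresh run (or nothing to resume from): fall back to MODEL.WEIGHTS.
    return model_weights


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        # With resume=False the configured weights path is always used.
        print(pick_checkpoint(out, "pretrained.pth", resume=False))
```

Under this reading, a run with resume=False always depends on MODEL.WEIGHTS pointing at something loadable.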

praeclarumjj3 commented 1 year ago

Hi, @qgq99, thanks for your interest in our work. Happy to answer your question; no offense taken.

Could you share your training command? When you execute the command, you may pass a custom path for the pretrained weights using something like the command below for Swin-L OneFormer:

python train_net.py --dist-url 'tcp://127.0.0.1:50163' \
    --num-gpus 8 \
    --config-file configs/ade20k/swin/oneformer_swin_large_bs16_160k.yaml \
    MODEL.WEIGHTS <PATH-TO-CHECKPOINT-HERE> \
    OUTPUT_DIR outputs/ade20k_swin_large WANDB.NAME ade20k_swin_large

You can get the pretrained weights using the instructions here.
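To illustrate why a trailing `MODEL.WEIGHTS <path>` pair on the command line takes precedence over the value in the YAML file, here is a minimal sketch of KEY VALUE override merging. It assumes, as in detectron2-style training scripts, that the leftover arguments are merged into the config as pairs. `parse_overrides` and `apply_overrides` are hypothetical helpers, the flat dict with dotted keys is a simplification of the real nested config, and the file names are placeholders.

```python
def parse_overrides(opts):
    """Hypothetical helper: turn ['KEY', 'value', ...] into a dict."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    return dict(zip(opts[0::2], opts[1::2]))


def apply_overrides(config, opts):
    """Return a copy of config with command-line overrides applied last."""
    merged = dict(config)
    merged.update(parse_overrides(opts))
    return merged


if __name__ == "__main__":
    # Placeholder values; the real config holds many more keys.
    from_yaml = {"MODEL.WEIGHTS": "from_config.pkl", "OUTPUT_DIR": "./output"}
    cli_opts = ["MODEL.WEIGHTS", "my_checkpoint.pth",
                "OUTPUT_DIR", "outputs/ade20k_swin_large"]
    print(apply_overrides(from_yaml, cli_opts)["MODEL.WEIGHTS"])
```

Because the overrides are applied after the file is read, the path given on the command line is the one the trainer ends up loading.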

qgq99 commented 1 year ago

Hi, @praeclarumjj3, thank you for your reply! The training command I used is in the same format as shown in the file GETTING_STARTED.md, specifically as follows:

python train_net.py --dist-url 'tcp://127.0.0.1:50163' \
    --num-gpus 2 \
    --config-file configs/ade20k/oneformer_swin_tiny_bs16_160k.yaml \
    OUTPUT_DIR outputs/ade20k_swin_large WANDB.NAME ade20k_swin_tiny

praeclarumjj3 commented 1 year ago

@qgq99, you need to specify MODEL.WEIGHTS, as I suggested in my previous comment. Please try the following command:

python train_net.py --dist-url 'tcp://127.0.0.1:50163' \
    --num-gpus 2 \
    --config-file configs/ade20k/oneformer_swin_tiny_bs16_160k.yaml \
    MODEL.WEIGHTS <PATH-TO-CHECKPOINT-HERE> \
    OUTPUT_DIR outputs/ade20k_swin_large WANDB.NAME ade20k_swin_tiny
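Before launching a multi-GPU job, it can help to confirm that the value substituted for `<PATH-TO-CHECKPOINT-HERE>` points to a real file. The sketch below is one way to do that. `check_weights_path` is a hypothetical helper that is not part of OneFormer or detectron2, and it only handles local paths (a URL-style value would need different handling).

```python
import os


def check_weights_path(path):
    """Hypothetical pre-flight check for the value passed as MODEL.WEIGHTS."""
    if path.startswith("<") and path.endswith(">"):
        # The placeholder from the example command was left in place.
        raise ValueError(f"replace the placeholder {path} with a real path")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"checkpoint not found: {path}")
    # Return an absolute path so the launch directory does not matter.
    return os.path.abspath(path)


if __name__ == "__main__":
    try:
        check_weights_path("<PATH-TO-CHECKPOINT-HERE>")
    except ValueError as err:
        print(err)
```

Running such a check first gives a clear message up front instead of a failure deep inside the checkpoint loader.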

qgq99 commented 1 year ago

@praeclarumjj3 I have successfully started training. Thanks again for your help and for this excellent work!