facebookresearch / Mask2Former

Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
MIT License

Error While using --eval-only options #110

Open KMuzzi opened 2 years ago

KMuzzi commented 2 years ago

Hello! I'm having trouble re-evaluating a trained model.

If I train with train_net_video.py and then reload the saved weights for evaluation with the --eval-only option, an error occurs because the parameter names differ.

How can I solve this problem?

I captured the error log.

[06/21 22:58:03] fvcore.common.checkpoint WARNING: Some model parameters or buffers are not found in the checkpoint:
  sem_seg_head.pixel_decoder.adapter_1.norm.{bias, weight}
  sem_seg_head.pixel_decoder.adapter_1.weight
  sem_seg_head.pixel_decoder.input_proj.0.0.{bias, weight}
  ...

[06/21 22:58:03] fvcore.common.checkpoint WARNING: The checkpoint state_dict contains keys that are not used by the model:
  sem_seg_head.pixel_decoder.pixel_decoder.adapter_1.norm.{bias, weight}
  sem_seg_head.pixel_decoder.pixel_decoder.adapter_1.weight
  sem_seg_head.pixel_decoder.pixel_decoder.input_proj.0.0.{bias, weight}
  ...
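Comparing the two warnings, the unused checkpoint keys differ from the missing model keys only by a doubled "pixel_decoder." segment. One possible workaround, sketched here as an assumption rather than a confirmed fix, is to collapse that doubled segment in the key names before loading. The helper name strip_duplicated_prefix and the demo dictionary are hypothetical; only the two adapter_1 key names come from the log above.

```python
def strip_duplicated_prefix(state_dict,
                            duplicated="pixel_decoder.pixel_decoder.",
                            fixed="pixel_decoder."):
    """Return a new dict in which the doubled module prefix is collapsed.

    Hypothetical helper: it only renames keys and never touches the values.
    """
    remapped = {}
    for key, value in state_dict.items():
        # Collapse the first occurrence of the doubled segment, if present.
        new_key = key.replace(duplicated, fixed, 1)
        if new_key in remapped:
            raise KeyError(f"renaming {key!r} would overwrite {new_key!r}")
        remapped[new_key] = value
    return remapped


# Stand-in for a checkpoint's parameter dict; the values would normally be tensors.
demo = {
    "sem_seg_head.pixel_decoder.pixel_decoder.adapter_1.weight": "w",
    "sem_seg_head.pixel_decoder.pixel_decoder.adapter_1.norm.bias": "b",
    "some.other.key": "x",  # hypothetical key that should pass through unchanged
}
fixed_demo = strip_duplicated_prefix(demo)
```

With a real file, the same function would be applied to the parameter dict obtained from torch.load(path, map_location="cpu") and the result saved to a new file. That the parameters sit under a "model" entry is an assumption to check against the actual checkpoint.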

Jana-Z commented 2 years ago

+1. I can run --eval-only on the .pkl files from MODEL_ZOO.md. But when I train my own model, the weights are saved to a .pth file, and evaluating with those weights produces the following warnings:

WARNING [07/16 15:49:00 mask2former.modeling.meta_arch.mask_former_head]: Weight format of MaskFormerHead have changed! Please upgrade your models. Applying automatic conversion now ...
WARNING [07/16 15:49:00 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
  sem_seg_head.pixel_decoder.adapter_1.norm.{bias, weight}
  sem_seg_head.pixel_decoder.adapter_1.weight
  [...]
WARNING [07/16 15:49:00 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model:
  sem_seg_head.pixel_decoder.pixel_decoder.adapter_1.norm.{bias, weight}
  sem_seg_head.pixel_decoder.pixel_decoder.adapter_1.weight

Moreover, loading the saved model_final.pth in a separate Jupyter notebook fails with the error "PytorchStreamReader failed reading zip archive: failed finding central directory".

Same error as #95

GuHuangAI commented 2 years ago

Have you solved this problem? I'm running into the same issue. Could you please share any suggestions?

sushilkhadkaanon commented 1 year ago

@KMuzzi @Jana-Z @GuHuangAI Have you solved the issue? I'm getting "DefaultCPUAllocator: can't allocate memory: you tried to allocate 3077222400 bytes. Error code 12 (Cannot allocate memory)". Could you please share your config file?

ShijieVVu commented 7 months ago

The .pth file is likely corrupted. The weights may not have been fully saved, so try an earlier checkpoint.
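A rough way to test that suggestion without loading any weights: recent PyTorch versions write checkpoints as zip archives by default, so a truncated file usually fails a plain zip integrity check. The sketch below uses only the Python standard library; the function names are hypothetical, and it assumes the checkpoints are zip-format .pth files sitting in a single output directory.

```python
import zipfile
from pathlib import Path


def is_intact_checkpoint(path):
    """Return True if `path` is a zip archive whose members all pass CRC checks."""
    try:
        if not zipfile.is_zipfile(path):
            # No end-of-central-directory record: typical of a truncated save.
            return False
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError):
        return False


def newest_intact_checkpoint(directory, pattern="*.pth"):
    """Return the most recently modified checkpoint that passes the check, or None."""
    candidates = sorted(Path(directory).glob(pattern),
                        key=lambda p: p.stat().st_mtime,
                        reverse=True)
    for candidate in candidates:
        if is_intact_checkpoint(candidate):
            return candidate
    return None
```

If is_intact_checkpoint returns False for model_final.pth, passing whatever newest_intact_checkpoint finds in the same directory to --eval-only would be the next thing to try. A file in the older non-zip save format would also be reported as not intact, so a False result is a hint rather than proof of corruption.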