dongbo811 / AFFormer


mmseg - WARNING - The model and loaded state dict do not match exactly #15

Open OpenAI-chn opened 1 year ago

OpenAI-chn commented 1 year ago

mmseg - WARNING - The model and loaded state dict do not match exactly
size mismatch for stem.0.conv.weight: copying a param with shape torch.Size([32, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([16, 3, 3, 3]).
size mismatch for stem.0.bn.weight: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for stem.0.bn.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for stem.0.bn.running_mean: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).

It seems the pre-trained weight file does not match the model. How can this be fixed?
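Editor's note: a minimal diagnostic sketch (not part of this repo) for listing exactly which parameters disagree in shape between a downloaded checkpoint and the model built from a config. The config and checkpoint paths are placeholders; substitute the ones you are actually using.

```python
import torch
from mmcv import Config
from mmseg.models import build_segmentor

# Placeholder paths -- replace with your own config and downloaded weights.
cfg = Config.fromfile('local_configs/afformer_base_512x512_ade20k.py')
ckpt = torch.load('pretrained/afformer_base.pth', map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)  # mmcv checkpoints usually wrap weights in 'state_dict'

model = build_segmentor(cfg.model)
for name, param in model.state_dict().items():
    if name in state_dict and state_dict[name].shape != param.shape:
        print(f'{name}: checkpoint {tuple(state_dict[name].shape)} -> model {tuple(param.shape)}')
```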

dongbo1998 commented 1 year ago

Please modify the path of the corresponding pre-training weights in the config.
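Editor's note: for reference, a minimal sketch of where that path usually lives in an MMSegmentation-style config. The backbone registry name and checkpoint filename below are assumptions, not necessarily what this repo's configs use.

```python
model = dict(
    type='EncoderDecoder',
    # Point this at the pre-training weights that match the variant being trained;
    # weights for a variant with different channel widths would produce shape
    # mismatches like the 32-vs-16 one reported above.
    pretrained='pretrained/afformer_base.pth',   # placeholder path
    backbone=dict(type='afformer_base'),         # assumed backbone registry name
)
```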

dongbo1998 commented 1 year ago

Sorry, I have re-updated the pre-training weights. Please download and try again. If you have more questions, please contact me.

OpenAI-chn commented 11 months ago

> Sorry, I have re-updated the pre-training weights. Please download and try again. If you have more questions, please contact me.

Hello, thank you very much for your help. This time the pre-trained weight file loaded without any errors and clearly improved performance on my own dataset B. I'd also like to ask you a question: after training AFFormer-base on my self-constructed dataset A and using the result as pre-training weights, why is there no significant improvement when fine-tuning on the small-scale dataset B?

OpenAI-chn commented 11 months ago

My dataset A has around 20,000 images, while dataset B has a few hundred images. Is it because my dataset A is too small in scale? Were your pre-training weight files trained on ImageNet?
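Editor's note, not an answer from the maintainer: in MMSegmentation-style training, initializing a fine-tuning run from an earlier checkpoint is usually wired up with `load_from`, often together with a reduced learning rate so a few hundred images do not wash out the pre-training. A hedged sketch with placeholder file names and values:

```python
_base_ = ['./afformer_base_datasetB.py']   # hypothetical base config for dataset B

# Initialize the whole segmentor from the checkpoint trained on dataset A.
load_from = 'work_dirs/afformer_base_datasetA/latest.pth'

# A lower learning rate than training from scratch is a common choice when
# fine-tuning on a small dataset; the exact value here is only illustrative.
optimizer = dict(type='AdamW', lr=6e-5, weight_decay=0.01)
```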

agfwhf commented 1 month ago

How can this problem be solved?

Traceback (most recent call last):
  File "tools/train.py", line 250, in <module>
    main()
  File "tools/train.py", line 239, in main
    train_segmentor(
  File "/home/AFFormer/mmseg/apis/train.py", line 178, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/iter_based_runner.py", line 134, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/iter_based_runner.py", line 59, in train
    data_batch = next(data_loader)
  File "/usr/local/lib/python3.8/dist-packages/mmcv/runner/iter_based_runner.py", line 39, in __next__
    data = next(self.iter_loader)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1176, in _next_data
    raise StopIteration
StopIteration
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 25137) of binary: /usr/bin/python3.8
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 719, in main
    run(args)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 710, in run
    elastic_launch(
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
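Editor's note on the traceback: the StopIteration escapes even after mmcv's IterLoader tries to rebuild the iterator, which often means the training DataLoader produced no batches at all, most commonly because the dataset resolved to zero samples (for example a wrong data_root, img_dir, or split in the config). A hedged sketch, with a placeholder config path, for checking that the configured training set actually yields samples:

```python
from mmcv import Config
from mmseg.datasets import build_dataset

cfg = Config.fromfile('local_configs/afformer_base_datasetB.py')  # placeholder config path
dataset = build_dataset(cfg.data.train)

print('number of training samples:', len(dataset))
if len(dataset) > 0:
    sample = dataset[0]
    print('first sample keys:', list(sample.keys()))
```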