AttributeError in Model Testing

yuanczx commented 1 month ago

I ran python train.py configs/tqdm/tqdm_clip_vit-l_1e-5_20k-g2c-512.py to train the model, but an error occurred during evaluation.

Error Traceback:

[                                                  ] 0/500, elapsed: 0s, ETA:Traceback (most recent call last):
  File "/mnt/csip-090/tqdm/train.py", line 201, in <module>
    main()
  File "/mnt/csip-090/tqdm/train.py", line 190, in main
    train_segmentor(
  File "/mnt/csip-090/tqdm/mmseg/apis/train.py", line 135, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.10/site-packages/mmcv/runner/iter_based_runner.py", line 138, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/mmcv/runner/iter_based_runner.py", line 68, in train
    self.call_hook('after_train_iter')
  File "/opt/conda/lib/python3.10/site-packages/mmcv/runner/base_runner.py", line 309, in call_hook
    getattr(hook, fn_name)(self)
  File "/opt/conda/lib/python3.10/site-packages/mmcv/runner/hooks/evaluation.py", line 262, in after_train_iter
    self._do_evaluate(runner)
  File "/mnt/csip-090/tqdm/mmseg/core/evaluation/eval_hooks.py", line 36, in _do_evaluate
    results = single_gpu_test(
  File "/mnt/csip-090/tqdm/mmseg/apis/test.py", line 65, in single_gpu_test
    result = model(return_loss=False, **data)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/mmcv/parallel/data_parallel.py", line 50, in forward
    return super().forward(*inputs, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func
    return old_func(*args, **kwargs)
  File "/mnt/csip-090/tqdm/mmseg/models/segmentors/base.py", line 117, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/mnt/csip-090/tqdm/mmseg/models/segmentors/base.py", line 89, in forward_test
    ori_shapes = [_[0]['ori_shape'] for _ in img_meta._data]
AttributeError: 'list' object has no attribute '_data'

I also tried testing the model with python test.py configs/tqdm/tqdm_clip_vit-l_1e-5_20k-g2c-512.py work_dirs_d/tqdm_clip_vit-l_1e-5_20k-g2c-512/iter_20000.pth --eval mIoU and got the same error.

ByeongHyunPak commented 1 month ago

Could you try editing mmseg/models/segmentors/base.py at line 87 like this?

# Check that every meta dict within each augmentation reports the same shapes.
for img_meta in img_metas:
    # was: ori_shapes = [_[0]['ori_shape'] for _ in img_meta._data]
    ori_shapes = [_['ori_shape'] for _ in img_meta]
    assert all(shape == ori_shapes[0] for shape in ori_shapes)
    # was: img_shapes = [_[0]['img_shape'] for _ in img_meta._data]
    img_shapes = [_['img_shape'] for _ in img_meta]
    assert all(shape == img_shapes[0] for shape in img_shapes)
    # was: pad_shapes = [_[0]['pad_shape'] for _ in img_meta._data]
    pad_shapes = [_['pad_shape'] for _ in img_meta]
    assert all(shape == pad_shapes[0] for shape in pad_shapes)
if num_augs == 1:
    # was: return self.simple_test(imgs[0], img_metas[0]._data[0], **kwargs)
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
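
For reference, the AttributeError suggests that img_metas arrives as plain lists of dicts when the model is run without the distributed launcher, while the original code expected a wrapper object with a _data attribute. A minimal sketch of a helper that tolerates both forms is shown below. The names FakeDataContainer, unwrap_img_meta, and check_same_shape are hypothetical and not part of the repository, and the assumed layout of _data (one list of meta dicts per GPU) is a guess based on the commented-out lines above.

```python
class FakeDataContainer:
    """Hypothetical stand-in that only mimics the `_data` attribute of a wrapper."""

    def __init__(self, data):
        self._data = data


def unwrap_img_meta(img_meta):
    """Return a plain list of meta dicts, whether or not img_meta is wrapped."""
    if hasattr(img_meta, "_data"):
        # Wrapped case: assume `_data` holds one list of meta dicts per GPU.
        return img_meta._data[0]
    # Unwrapped case: already a plain list of dicts, as in the traceback.
    return img_meta


def check_same_shape(img_metas, key):
    """Assert that every meta dict in each augmentation agrees on `key`."""
    for img_meta in img_metas:
        shapes = [meta[key] for meta in unwrap_img_meta(img_meta)]
        assert all(shape == shapes[0] for shape in shapes)


# Illustrative meta dict; the shape values are made up for this example.
meta = {
    "ori_shape": (512, 1024, 3),
    "img_shape": (512, 1024, 3),
    "pad_shape": (512, 1024, 3),
}
plain = [[meta]]                          # list of dicts per augmentation
wrapped = [FakeDataContainer([[meta]])]   # same data behind a `_data` wrapper

for key in ("ori_shape", "img_shape", "pad_shape"):
    check_same_shape(plain, key)
    check_same_shape(wrapped, key)
```

With a helper like this, the same forward_test body could serve both launch modes, at the cost of hiding which form the data loader actually produced.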

yuanczx commented 1 month ago

Now I get a different error:

load checkpoint from local path: work_dirs_d/tqdm_clip_vit-l_1e-5_20k-g2c-512/iter_20000.pth
not distributed
[                                                  ] 0/500, elapsed: 0s, ETA:Traceback (most recent call last):
  File "/mnt/csip-090/tqdm/test.py", line 193, in <module>
    main()
  File "/mnt/csip-090/tqdm/test.py", line 170, in main
    outputs = single_gpu_test(model, data_loader, args.show, args.show_dir,
  File "/mnt/csip-090/tqdm/mmseg/apis/test.py", line 65, in single_gpu_test
    result = model(return_loss=False, **data)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/mmcv/parallel/data_parallel.py", line 50, in forward
    return super().forward(*inputs, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func
    return old_func(*args, **kwargs)
  File "/mnt/csip-090/tqdm/mmseg/models/segmentors/base.py", line 117, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/mnt/csip-090/tqdm/mmseg/models/segmentors/base.py", line 98, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/mnt/csip-090/tqdm/models/segmentors/tqdm_clip.py", line 272, in simple_test
    seg_logit = self.inference(img, img_meta, rescale)
  File "/mnt/csip-090/tqdm/models/segmentors/tqdm_clip.py", line 255, in inference
    seg_logit = self.slide_inference(img, img_meta, rescale)
  File "/mnt/csip-090/tqdm/models/segmentors/tqdm_clip.py", line 194, in slide_inference
    crop_seg_logit = self.encode_decode(crop_img, img_meta)
  File "/mnt/csip-090/tqdm/models/segmentors/tqdm_clip.py", line 156, in encode_decode
    x = self.extract_feat(img)
  File "/mnt/csip-090/tqdm/models/segmentors/tqdm_clip.py", line 96, in extract_feat
    x = self.backbone(img)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/csip-090/tqdm/models/backbones/clip/models.py", line 227, in forward
    x = self.conv1(x)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: GET was unable to find an engine to execute this computation

ByeongHyunPak commented 1 month ago

I think this is a PyTorch and CUDA version issue. Please reinstall PyTorch and CUDA to match your environment. If you share your environment details, I can help.
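
As a rough sketch of the environment details that are useful here, the snippet below collects the PyTorch version, the CUDA version PyTorch was built against, and the cuDNN version it can see. The function name collect_env is hypothetical; the torch attributes used are standard ones. The import is guarded so the script still runs where PyTorch is missing.

```python
import platform


def collect_env():
    """Gather version details that matter for PyTorch/CUDA mismatches."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this interpreter
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built with (None for CPU-only builds).
    info["torch_cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    # cuDNN version visible to PyTorch (None if cuDNN is unavailable).
    info["cudnn"] = torch.backends.cudnn.version()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for name, value in collect_env().items():
        print(f"{name}: {value}")
```

Comparing torch_cuda_build with the release reported by nvcc --version is one quick way to spot a mismatch.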

ByeongHyunPak commented 1 month ago

Also, I think you need to run the training code with the following bash script:

bash dist_train.sh configs/tqdm/tqdm_clip_vit-l_1e-5_20k-g2c-512.py 1

yuanczx commented 1 month ago

> I think this is a PyTorch and CUDA version issue. Please reinstall PyTorch and CUDA to match your environment. If you share your environment details, I can help.

Thank you. This is my environment.

# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

# pip show torch
Name: torch
Version: 2.0.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /opt/conda/lib/python3.10/site-packages
Requires: filelock, jinja2, networkx, sympy, typing-extensions
Required-by: timm, torchaudio, torchdata, torchelastic, torchtext, torchvision, triton, xformers

ByeongHyunPak commented 1 month ago

Please try:

conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
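
After reinstalling, one way to check whether the failing operation now works is a small convolution on the GPU, since the traceback ends in F.conv2d. This is only a sketch: the function name conv_smoke_test and the tensor sizes are arbitrary choices, not anything from the repository.

```python
def conv_smoke_test():
    """Run one small Conv2d on the GPU and report what happened as a string."""
    try:
        import torch
    except ImportError:
        return "skipped: torch is not installed"
    if not torch.cuda.is_available():
        return "skipped: CUDA is not available"

    try:
        conv = torch.nn.Conv2d(3, 8, kernel_size=3).cuda()
        x = torch.randn(1, 3, 32, 32, device="cuda")
        with torch.no_grad():
            y = conv(x)
        # Wait for the kernel to finish so asynchronous errors surface here.
        torch.cuda.synchronize()
    except RuntimeError as err:
        return f"failed: {err}"
    return f"ok: output shape {tuple(y.shape)}"


if __name__ == "__main__":
    print(conv_smoke_test())
```

If this reports a failure with the same message as the traceback, the problem is likely in the PyTorch/CUDA/cuDNN installation rather than in the model code.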

yuanczx commented 1 month ago

> Please try:
>
> conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

I ran this command, but it didn't help. The result is the same whether or not mmseg/models/segmentors/base.py has been changed.

yuanczx commented 1 month ago

> And I think you need to run the training code with the following bash script.
> bash dist_train.sh configs/tqdm/tqdm_clip_vit-l_1e-5_20k-g2c-512.py 1

Thank you. I'll try it.

ByeongHyunPak / tqdm

AttributeError in Model Testing #2