Open yuanczx opened 1 month ago
Could you try editing mmseg/models/segmentors/base.py at line 87 like this?
# shape
for img_meta in img_metas:
    ori_shapes = [_['ori_shape'] for _ in img_meta]
    # ori_shapes = [_[0]['ori_shape'] for _ in img_meta._data]
    assert all(shape == ori_shapes[0] for shape in ori_shapes)
    img_shapes = [_['img_shape'] for _ in img_meta]
    # img_shapes = [_[0]['img_shape'] for _ in img_meta._data]
    assert all(shape == img_shapes[0] for shape in img_shapes)
    pad_shapes = [_['pad_shape'] for _ in img_meta]
    # pad_shapes = [_[0]['pad_shape'] for _ in img_meta._data]
    assert all(shape == pad_shapes[0] for shape in pad_shapes)
if num_augs == 1:
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
    # return self.simple_test(imgs[0], img_metas[0]._data[0], **kwargs)
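The commented-out lines suggest that img_metas can arrive in two layouts: a plain list of dicts, or an object whose `_data` attribute holds a list of per-device lists of dicts. Rather than switching by hand, the two cases could be normalized before the shape checks. This is only a sketch under that assumption; `unwrap_img_meta`, `check_same_shape`, and `FakeContainer` are hypothetical names made up for illustration and are not part of mmseg or mmcv.

```python
class FakeContainer:
    """Stand-in for a wrapper object that stores its payload in `_data`."""

    def __init__(self, data):
        self._data = data


def unwrap_img_meta(img_meta):
    """Return a flat list of meta dicts from either layout."""
    data = getattr(img_meta, "_data", None)
    if data is None:
        # Plain layout: already a list of dicts.
        return list(img_meta)
    # Wrapped layout (assumed): a list of per-device lists of dicts.
    return [meta for group in data for meta in group]


def check_same_shape(img_meta, key):
    """Assert every meta dict agrees on `key` and return the shared value.

    Assumes at least one meta dict is present.
    """
    shapes = [meta[key] for meta in unwrap_img_meta(img_meta)]
    assert all(shape == shapes[0] for shape in shapes), f"inconsistent {key}"
    return shapes[0]


plain = [{"ori_shape": (512, 1024, 3)}, {"ori_shape": (512, 1024, 3)}]
wrapped = FakeContainer([plain])

plain_shape = check_same_shape(plain, "ori_shape")
wrapped_shape = check_same_shape(wrapped, "ori_shape")
print(plain_shape, wrapped_shape)
```

With a helper like this, the loop body would not need a commented-out twin for each line, although the actual layout passed by the data loader should be confirmed first.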
I get another error:
load checkpoint from local path: work_dirs_d/tqdm_clip_vit-l_1e-5_20k-g2c-512/iter_20000.pth
not distributed
[ ] 0/500, elapsed: 0s, ETA:Traceback (most recent call last):
File "/mnt/csip-090/tqdm/test.py", line 193, in <module>
main()
File "/mnt/csip-090/tqdm/test.py", line 170, in main
outputs = single_gpu_test(model, data_loader, args.show, args.show_dir,
File "/mnt/csip-090/tqdm/mmseg/apis/test.py", line 65, in single_gpu_test
result = model(return_loss=False, **data)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/mmcv/parallel/data_parallel.py", line 50, in forward
return super().forward(*inputs, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
return self.module(*inputs[0], **kwargs[0])
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func
return old_func(*args, **kwargs)
File "/mnt/csip-090/tqdm/mmseg/models/segmentors/base.py", line 117, in forward
return self.forward_test(img, img_metas, **kwargs)
File "/mnt/csip-090/tqdm/mmseg/models/segmentors/base.py", line 98, in forward_test
return self.simple_test(imgs[0], img_metas[0], **kwargs)
File "/mnt/csip-090/tqdm/models/segmentors/tqdm_clip.py", line 272, in simple_test
seg_logit = self.inference(img, img_meta, rescale)
File "/mnt/csip-090/tqdm/models/segmentors/tqdm_clip.py", line 255, in inference
seg_logit = self.slide_inference(img, img_meta, rescale)
File "/mnt/csip-090/tqdm/models/segmentors/tqdm_clip.py", line 194, in slide_inference
crop_seg_logit = self.encode_decode(crop_img, img_meta)
File "/mnt/csip-090/tqdm/models/segmentors/tqdm_clip.py", line 156, in encode_decode
x = self.extract_feat(img)
File "/mnt/csip-090/tqdm/models/segmentors/tqdm_clip.py", line 96, in extract_feat
x = self.backbone(img)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/csip-090/tqdm/models/backbones/clip/models.py", line 227, in forward
x = self.conv1(x)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: FIND was unable to find an engine to execute this computation
I think this is a PyTorch and CUDA version issue. Please reinstall PyTorch and CUDA to match your environment. If you share your environment details, I can help.
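Since the traceback ends inside F.conv2d, one way to check whether the problem is in the environment rather than in the model is to run a single tiny convolution on the GPU outside the project. The snippet below is a rough sketch, not a definitive diagnosis; `conv2d_smoke_test` is a hypothetical helper name, and the tensor sizes are arbitrary.

```python
def conv2d_smoke_test():
    """Run one small conv2d on the GPU and describe the outcome as a string."""
    try:
        import torch
        import torch.nn.functional as F
    except ImportError:
        return "torch is not installed"

    versions = (
        f"torch {torch.__version__}, "
        f"built for CUDA {torch.version.cuda}, "
        f"cuDNN {torch.backends.cudnn.version()}"
    )
    if not torch.cuda.is_available():
        return f"CUDA is not available to torch ({versions})"

    # Arbitrary small input and weight, just enough to exercise cuDNN.
    x = torch.randn(1, 3, 32, 32, device="cuda")
    w = torch.randn(8, 3, 3, 3, device="cuda")
    try:
        y = F.conv2d(x, w)
        torch.cuda.synchronize()  # surface asynchronous CUDA errors here
    except RuntimeError as exc:
        return f"conv2d failed ({versions}): {exc}"
    return f"conv2d ok, output shape {tuple(y.shape)} ({versions})"


status = conv2d_smoke_test()
print(status)
```

If this also fails with the same "unable to find an engine" message, the installation is the more likely culprit; if it succeeds, the input reaching the backbone (shape, dtype, memory use) is worth a closer look.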
Also, I think you need to run the training code with the following bash script:
bash dist_train.sh configs/tqdm/tqdm_clip_vit-l_1e-5_20k-g2c-512.py 1
Thank you. This is my environment.
# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
# pip show torch
Name: torch
Version: 2.0.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /opt/conda/lib/python3.10/site-packages
Requires: filelock, jinja2, networkx, sympy, typing-extensions
Required-by: timm, torchaudio, torchdata, torchelastic, torchtext, torchvision, triton, xformers
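Note that `pip show torch` does not say which CUDA version the torch binary was built against, and prebuilt torch packages typically ship their own CUDA runtime, so the nvcc release and torch's CUDA build can differ. A minimal sketch for putting the two side by side; `nvcc_release` and `torch_cuda_build` are hypothetical helper names, and the sample string is copied from the nvcc output above.

```python
import re


def nvcc_release(nvcc_output):
    """Extract the 'release X.Y' version from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def torch_cuda_build():
    """Return the CUDA version torch was built for, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


sample = "Cuda compilation tools, release 11.7, V11.7.99"
toolkit = nvcc_release(sample)
print("nvcc toolkit release:", toolkit)
print("torch built for CUDA:", torch_cuda_build())
```

Running `python -m torch.utils.collect_env` gives a fuller report (driver, cuDNN, GPU model) that is convenient to paste into an issue.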
Please try:
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
I ran this command, but it doesn't work, whether or not mmseg/models/segmentors/base.py has been changed.
Thank you. I'll try it.
I use
python train.py configs/tqdm/tqdm_clip_vit-l_1e-5_20k-g2c-512.py
to train the model, but an error occurs during evaluation. Error traceback:
I try to test the model using
python test.py configs/tqdm/tqdm_clip_vit-l_1e-5_20k-g2c-512.py work_dirs_d/tqdm_clip_vit-l_1e-5_20k-g2c-512/iter_20000.pth --eval mIoU
then get the same error.