Open draym28 opened 7 months ago
Hi,
The code in this repo is very outdated and is indeed not up to date with current CUDA standards. I fixed all of those issues in my implementation; you can probably copy the csrc folder into your local path and compile without any issues (I tested it with CUDA version 11+):
https://github.com/Maelic/SGG-Benchmark/tree/main/sgg_benchmark/csrc
Best
Thanks for your help!
But after using your csrc, when I run SGDet on custom images following the instructions in README.md, other errors still come up:
D:\App\Anaconda3\envs\sgg\lib\site-packages\torch\utils\cpp_extension.py:358: UserWarning: Error checking compiler version for cl: 'cp1' codec can't decode bytes in position 0--1: No mapping for the Unicode character exists in the target code page.
warnings.warn(f'Error checking compiler version for {compiler}: {error}')
D:\App\Anaconda3\envs\sgg\lib\site-packages\apex\__init__.py:68: DeprecatedFeatureWarning: apex.amp is deprecated and will be removed by the end of February 2023. Use [PyTorch AMP](https://pytorch.org/docs/stable/amp.html)
warnings.warn(msg, DeprecatedFeatureWarning)
Traceback (most recent call last):
  File "tools/relation_test_net.py", line 11, in <module>
    from maskrcnn_benchmark.data import make_data_loader
  File "d:\code\new_proj\v2t\sgg\scenegraphbenchmark\maskrcnn_benchmark\data\__init__.py", line 2, in <module>
    from .build import make_data_loader, get_dataset_statistics
  File "d:\code\new_proj\v2t\sgg\scenegraphbenchmark\maskrcnn_benchmark\data\build.py", line 14, in <module>
    from . import datasets as D
  File "d:\code\new_proj\v2t\sgg\scenegraphbenchmark\maskrcnn_benchmark\data\datasets\__init__.py", line 2, in <module>
    from .coco import COCODataset
  File "d:\code\new_proj\v2t\sgg\scenegraphbenchmark\maskrcnn_benchmark\data\datasets\coco.py", line 39, in <module>
    class COCODataset(torchvision.datasets.coco.CocoDetection):
AttributeError: module 'torchvision' has no attribute 'datasets'
I am still stuck on this step. It is driving me crazy.
Which version of torchvision are you using?
It works for me with torchvision 0.17 for cuda 12.1
I am using pytorch=1.13 and torchvision=0.14.
I can import torchvision.datasets as you did, but when I run the scripts for SGDet on custom images, the error comes up.
It is confusing.
Then you may be running your code in another conda env or something like that. You can also try to clean and re-build the package with something like rm -rf ./build/ && python setup.py build develop
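One way to check the "another conda env" hypothesis is to ask the interpreter itself where it lives and where it would load torchvision from. The snippet below is only a minimal sketch using the standard library; the helper name locate_module is made up for illustration and is not part of this repo.

```python
import importlib.util
import sys


def locate_module(name):
    """Report the running interpreter and where `name` would be imported from."""
    # For a top-level name, find_spec locates the module without importing it.
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        "origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    # Run this with the same interpreter that launches tools/relation_test_net.py.
    for key, value in locate_module("torchvision").items():
        print(f"{key}: {value}")
```

If the reported origin is not inside the site-packages directory of the env you expect (for example, if a local directory shadows the package), that would be worth ruling out before rebuilding again.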
I have cleaned up and created a new env many times, but the error still comes up.
I also ran python setup.py build develop every time.
Many people also have this problem, see this.
Can you post the outputs of pip freeze | grep torchvision and conda list | grep torchvision? You may have different versions of torchvision installed at the same time.
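The same check can be done from inside Python, which avoids grep (not available in a plain Windows prompt) and sees exactly what the running interpreter sees. This is a rough sketch built on importlib.metadata; the function name installed_copies is hypothetical.

```python
from importlib import metadata


def installed_copies(package):
    """Return (version, location) for every installed distribution named `package`."""
    wanted = package.lower().replace("_", "-")
    copies = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and name.lower().replace("_", "-") == wanted:
            # locate_file("") gives the directory the distribution is installed into.
            copies.append((dist.version, str(dist.locate_file(""))))
    return copies


if __name__ == "__main__":
    for version, location in installed_copies("torchvision"):
        print(version, location)
```

More than one line of output, or a location outside the active env, would suggest that several copies are visible at the same time.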
Output of pip freeze | grep torchvision:
torchvision==0.14.1
Output of conda list | grep torchvision:
torchvision 0.14.1 py38_cu117 pytorch
Hmm, I don't know. From your outputs I assume that you installed torchvision with conda; maybe try removing it and installing it with pip instead. On my machine, I installed it with the following command (for CUDA 12.1):
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121
It still doesn't work.
This time I created a new env and installed with:
pip install torch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 --index-url https://download.pytorch.org/whl/cu117
But the error still comes up.
I'm afraid I can't help you more here, sorry. I don't recall having this error ever, even when I was working with previous versions of pytorch for this codebase.
It is OK, thanks for your help. I will keep looking for a solution.
Hi @Maelic, thank you for sharing your implementation. I'm encountering an issue with installing Apex due to CUDA compatibility. I was wondering if you could provide guidance on how to resolve this. Thanks!
You don't need to use APEX anymore: it is deprecated, and mixed precision is built into recent versions of torch. Please consider removing all references to apex, along with this line: https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch/blob/4b6b71a90d4198d9dae574d42b062a5e534da291/tools/relation_train_net.py#L159
And add this a little above:
with torch.autocast(device_type='cuda', dtype=torch.float16, enabled=use_amp):
    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())
And it should work, see:
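As a minimal sketch of that replacement, not taken from either repository, the step below pairs torch.autocast with torch.cuda.amp.GradScaler, which takes over the loss scaling that apex.amp used to do. ToyModel, train_step, and the tensor shapes are assumptions made up so that the example runs on its own; the real model and data loader would take their place.

```python
import torch
from torch import nn


def train_step(model, optimizer, scaler, images, targets, use_amp, device_type):
    """One optimisation step using torch.autocast and GradScaler instead of apex.amp."""
    optimizer.zero_grad()
    # bfloat16 is chosen on CPU only so that this sketch also runs without a GPU.
    dtype = torch.float16 if device_type == "cuda" else torch.bfloat16
    with torch.autocast(device_type=device_type, dtype=dtype, enabled=use_amp):
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
    # Stands in for the removed apex loss-scaling step.
    scaler.scale(losses).backward()
    scaler.step(optimizer)
    scaler.update()
    return losses.item()


class ToyModel(nn.Module):
    """Hypothetical stand-in that returns a dict of losses, as the detector does."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 1)

    def forward(self, images, targets):
        return {"loss_toy": nn.functional.mse_loss(self.linear(images), targets)}


device_type = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device_type == "cuda"  # mixed precision only when a GPU is present
model = ToyModel().to(device_type)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
images = torch.randn(8, 4, device=device_type)
targets = torch.randn(8, 1, device=device_type)
loss_value = train_step(model, optimizer, scaler, images, targets, use_amp, device_type)
```

With enabled=False both autocast and the scaler become pass-throughs, so the same code path can be kept for full-precision runs.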
Thank you for the prompt response. In the step-by-step installation (https://github.com/Maelic/SGG-Benchmark/blob/main/INSTALL.md) I have an error. My CUDA version is 11.5 but 11.5 is not available in the nvidia channels. How can I solve this issue?
RuntimeError: The detected CUDA version (11.5) mismatches the version that was used to compile PyTorch (12.1). Please make sure to use the same CUDA versions.
Try upgrading your CUDA version or building torch from source. By the way, this issue is not directly related to this work; you will probably have more success asking on the dedicated PyTorch forum.
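To see the mismatch before starting a long build, the toolkit version printed by nvcc --version can be compared with the CUDA version PyTorch was built for (available as torch.version.cuda). The sketch below only does the comparison on strings; the function names and the sample nvcc line are illustrative assumptions.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def describe_mismatch(toolkit, torch_cuda):
    """Compare the local toolkit version with the CUDA version PyTorch was built for."""
    built = tuple(int(part) for part in torch_cuda.split(".")[:2])
    if toolkit == built:
        return "match"
    if toolkit[0] != built[0]:
        return "major mismatch"
    return "minor mismatch"


# Illustrative nvcc line for the 11.5 toolkit; "12.1" plays the role of torch.version.cuda.
sample = "Cuda compilation tools, release 11.5, V11.5.119"
status = describe_mismatch(parse_nvcc_release(sample), "12.1")
print(status)
```

For the versions in the error above (11.5 against 12.1), this reports a major mismatch, which is the case the RuntimeError complains about.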
Some common problems & solutions when installing maskrcnn_benchmark.
1. THC.h: No such file or directory / THCeilDiv undefined: see this
2. identifier "THCudaCheck" is undefined: see this
3. torch.utils.cpp_extension.load stuck: see this
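Before compiling, it can help to find where the legacy THC names from the first two items still appear in a local csrc folder. The scanner below is a rough sketch; find_legacy_thc and the default file suffixes are assumptions, and the pattern accepts both the THCeilDiv spelling used above and THCCeilDiv.

```python
import re
from pathlib import Path

# Legacy THC names taken from the error messages in the list above.
LEGACY_PATTERNS = (r"THC/THC\.h", r"THCC?eilDiv", r"THCudaCheck")


def find_legacy_thc(root, suffixes=(".cu", ".cuh", ".cpp", ".h")):
    """Return (path, line_number, text) for each line that mentions a legacy THC name."""
    pattern = re.compile("|".join(LEGACY_PATTERNS))
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            lines = path.read_text(errors="ignore").splitlines()
            for number, text in enumerate(lines, start=1):
                if pattern.search(text):
                    hits.append((str(path), number, text.strip()))
    return hits
```

Calling find_legacy_thc("maskrcnn_benchmark/csrc") would list the lines to compare against an updated csrc folder; the path here is assumed from the traceback earlier in this thread.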