Error when installing maskrcnn_benchmark

draym28 commented 7 months ago

Some common problems & solutions when installing maskrcnn_benchmark.

1. THC.h: No such file or directory / THCCeilDiv undefined: see this

2. identifier "THCudaCheck" is undefined see this

3. torch.utils.cpp_extension.load stuck see this

Maelic commented 7 months ago

Hi,

The code in this repo is very outdated and is indeed not up to date with current CUDA standards. I fixed all of those issues in my implementation. You can probably copy the csrc folder into your local path and compile without any issues (I tested it with CUDA 11+): https://github.com/Maelic/SGG-Benchmark/tree/main/sgg_benchmark/csrc

Best

draym28 commented 7 months ago

Thanks for your help! But after using your csrc, when I run SGDet on custom images following the instructions in README.md, another error comes up:

D:\App\Anaconda3\envs\sgg\lib\site-packages\torch\utils\cpp_extension.py:358: UserWarning: Error checking compiler version for cl: 'cp1' codec can't decode bytes in position 0--1: No mapping for the Unicode character exists in the target code page.
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')
D:\App\Anaconda3\envs\sgg\lib\site-packages\apex\__init__.py:68: DeprecatedFeatureWarning: apex.amp is deprecated and will be removed by the end of February 2023. Use [PyTorch AMP](https://pytorch.org/docs/stable/amp.html)
  warnings.warn(msg, DeprecatedFeatureWarning)
Traceback (most recent call last):
  File "tools/relation_test_net.py", line 11, in <module>
    from maskrcnn_benchmark.data import make_data_loader
  File "d:\code\new_proj\v2t\sgg\scenegraphbenchmark\maskrcnn_benchmark\data\__init__.py", line 2, in <module>
    from .build import make_data_loader, get_dataset_statistics
  File "d:\code\new_proj\v2t\sgg\scenegraphbenchmark\maskrcnn_benchmark\data\build.py", line 14, in <module>
    from . import datasets as D
  File "d:\code\new_proj\v2t\sgg\scenegraphbenchmark\maskrcnn_benchmark\data\datasets\__init__.py", line 2, in <module>
    from .coco import COCODataset
  File "d:\code\new_proj\v2t\sgg\scenegraphbenchmark\maskrcnn_benchmark\data\datasets\coco.py", line 39, in <module>
    class COCODataset(torchvision.datasets.coco.CocoDetection):
AttributeError: module 'torchvision' has no attribute 'datasets'

I am still stuck on this step. It is driving me crazy.

Maelic commented 7 months ago

Which version of torchvision are you using?

Maelic commented 7 months ago

It works for me with torchvision 0.17 for cuda 12.1

draym28 commented 7 months ago

I am using pytorch=1.13 and torchvision=0.14. I can import torchvision.datasets as you did, but when I run the script for SGDet on custom images, the error comes up. It is confusing.

Maelic commented 7 months ago

Then you may be running your code in another conda env or something like that. You can also try to clean and re-build the package with something like rm -rf ./build/ && python setup.py build develop
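Since rm -rf is not available in a plain Windows command prompt (the traceback above shows Windows paths), the clean step can also be done from Python. This is a minimal sketch, assuming the stale artifacts live in a build directory at the project root; the helper name clean_build is hypothetical and not part of the repository.

```python
import shutil
from pathlib import Path


def clean_build(project_root="."):
    """Delete <project_root>/build if it exists.

    Returns True if a directory was removed, False if there was nothing to remove.
    """
    build_dir = Path(project_root) / "build"
    if build_dir.is_dir():
        shutil.rmtree(build_dir)
        return True
    return False
```

Calling clean_build() from the repository root and then running python setup.py build develop in the same environment should be equivalent to the one-line shell command above.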

draym28 commented 7 months ago

I have cleaned up and created a new env many times, but the error still comes up. I also ran python setup.py build develop every time. Many people have this problem too, see this.

Maelic commented 7 months ago

Can you post the outputs of pip freeze | grep torchvision and conda list | grep torchvision ? You may have different versions of torchvision installed at the same time.
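Besides the package lists, it can help to ask Python itself which copy it would load. The following is a rough diagnostic sketch, not a confirmed fix for this error; the helper name describe_module is hypothetical. It uses only the standard library, so it runs even when torch is missing.

```python
import importlib.util
import sys


def describe_module(name):
    """Report where Python would load `name` from.

    For a dotted name such as "torchvision.datasets", find_spec imports
    the parent package first, so any failure there is recorded in "error".
    """
    info = {"name": name, "found": False, "origin": None, "error": None}
    try:
        spec = importlib.util.find_spec(name)
    except Exception as exc:  # broad on purpose: this is only a diagnostic
        info["error"] = repr(exc)
        return info
    if spec is not None:
        info["found"] = True
        info["origin"] = spec.origin
    return info


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for mod in ("torch", "torchvision", "torchvision.datasets"):
        print(describe_module(mod))
```

Run it with the same interpreter and from the same working directory as tools/relation_test_net.py. An interpreter path outside the expected env, an origin outside that env's site-packages, or an error for torchvision.datasets would each suggest that a different or broken copy is being picked up.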

draym28 commented 7 months ago

Output of pip freeze | grep torchvision:
torchvision==0.14.1

Output of conda list | grep torchvision:
torchvision 0.14.1 py38_cu117 pytorch

Maelic commented 7 months ago

Hmm, I don't know. From your outputs I assume that you installed torchvision with conda; maybe try removing it and installing it with pip instead. On my machine, I installed it with the following command (for CUDA 12.1): pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121

draym28 commented 7 months ago

It still doesn't work. This time I created a new env and used pip install torch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 --index-url https://download.pytorch.org/whl/cu117. But the error still comes up.

Maelic commented 7 months ago

I'm afraid I can't help you more here, sorry. I don't recall ever having this error, even when I was working with previous versions of PyTorch for this codebase.

draym28 commented 7 months ago

It is OK, thanks for your help. I will keep looking for a solution.

Ali-Hatami commented 7 months ago

Hi @Maelic, thank you for sharing your implementation. I'm encountering an issue with installing Apex due to CUDA compatibility. I was wondering if you could provide guidance on how to resolve this. Thanks!

Maelic commented 7 months ago

You don't need to use APEX anymore: it is deprecated, and mixed precision is built into recent versions of torch. Please consider removing all references to apex, as well as this line: https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch/blob/4b6b71a90d4198d9dae574d42b062a5e534da291/tools/relation_train_net.py#L159

And add this a little above:

with torch.autocast(device_type='cuda', dtype=torch.float16, enabled=use_amp):
    loss_dict = model(images, targets)
    losses = sum(loss for loss in loss_dict.values())

And it should work, see:

https://github.com/Maelic/SGG-Benchmark/blob/cecf1bbe46f3d862704d9cf0ffccf2282fb00cfe/tools/relation_train_net.py#L51

Ali-Hatami commented 7 months ago

Thank you for the prompt response. I get an error during the step-by-step installation (https://github.com/Maelic/SGG-Benchmark/blob/main/INSTALL.md). My CUDA version is 11.5, but 11.5 is not available in the nvidia channels. How can I solve this issue?

RuntimeError: The detected CUDA version (11.5) mismatches the version that was used to compile PyTorch (12.1). Please make sure to use the same CUDA versions.

Maelic commented 7 months ago

Try upgrading your CUDA version or building torch from source. By the way, this issue is not directly related to this work; you will probably have more success asking on the dedicated PyTorch forum.
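To see which two versions are being compared before rebuilding anything, the check can be reproduced by hand. This is a small sketch under stated assumptions: the helper names are hypothetical, and the sample string only imitates the release line that nvcc --version prints. The toolkit version comes from nvcc, and the version PyTorch was compiled with is available as torch.version.cuda.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract 'major.minor' from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def same_cuda_version(toolkit, torch_build):
    """Compare two 'major.minor' strings; None on either side counts as a mismatch."""
    if toolkit is None or torch_build is None:
        return False
    return toolkit.split(".")[:2] == torch_build.split(".")[:2]


# Illustrative sample line, not captured from a real machine.
sample = "Cuda compilation tools, release 11.5, V11.5.119"
toolkit = parse_nvcc_release(sample)
print(toolkit)                             # 11.5
print(same_cuda_version(toolkit, "12.1"))  # False
```

In practice, the first argument would be parsed from the real nvcc --version output and the second would be torch.version.cuda. If they differ, either the toolkit or the installed PyTorch wheel has to change so that both agree.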

jzzzzh commented 4 days ago

mark

KaihuaTang / Scene-Graph-Benchmark.pytorch

Error when installing maskrcnn_benchmark #209
