Frequently Asked Questions

rentainhe commented 2 years ago

We keep this issue open to collect frequently asked questions and their solutions from the users.

Feel free to leave your comment here if you find any frequent issues and have ways to help others to solve them.

Notes

If you run into convergence problems when training with fewer GPUs, it's better to set a larger batch size (e.g. 8 or 16) via `dataloader.train.total_batch_size`, as mentioned in this issue: https://github.com/IDEA-Research/detrex/issues/219
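As a rough sketch, the batch size can be passed as a `key=value` override on the training command line. The helper `batch_size_override` below is hypothetical and not part of detrex; it assumes the total batch is split evenly across GPUs and only builds the override string.

```python
def batch_size_override(total_batch_size: int, num_gpus: int) -> str:
    """Build a `key=value` override for the training batch size."""
    if total_batch_size <= 0 or num_gpus <= 0:
        raise ValueError("total_batch_size and num_gpus must be positive")
    # Assumption: the total batch is split into one equal share per GPU.
    if total_batch_size % num_gpus != 0:
        raise ValueError(
            f"total_batch_size={total_batch_size} is not divisible by num_gpus={num_gpus}"
        )
    return f"dataloader.train.total_batch_size={total_batch_size}"


print(batch_size_override(16, 2))  # dataloader.train.total_batch_size=16
```

The returned string would be appended to the `python tools/train_net.py ...` command after the other arguments.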

FAQs

1. ImportError: Cannot import 'detrex._C', therefore 'MultiScaleDeformableAttention' is not available.

detrex needs the **CUDA runtime** to build the `MultiScaleDeformableAttention` operator. In most cases you do not need to set `CUDA_HOME` yourself if CUDA is installed correctly; the default path of the CUDA runtime is `/usr/local/cuda`. If you find that your `CUDA_HOME` is `None`, you can solve it as follows:

- If you've already installed the **CUDA runtime** in your environment, specify the environment variable (here we take cuda-11.3 as an example):

```bash
export CUDA_HOME=/path/to/cuda-11.3/
```

- If you cannot find the CUDA runtime in your environment, install it following the [CUDA Toolkit Installation](https://developer.nvidia.com/cuda-toolkit) guide, then specify the environment variable `CUDA_HOME`.
- After setting `CUDA_HOME`, rebuild detrex by running `pip install -e .`

You can also refer to these issues for more details: https://github.com/IDEA-Research/detrex/issues/98, https://github.com/IDEA-Research/detrex/issues/85
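To check what a build is likely to see before rebuilding, something like the following stdlib-only sketch may help. `guess_cuda_home` is a hypothetical helper, not the detection logic of PyTorch or detrex; it simply looks at `CUDA_HOME`, then at `nvcc` on `PATH`, then at the default location mentioned above.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional


def guess_cuda_home(env: Mapping[str, str] = os.environ,
                    default: str = "/usr/local/cuda") -> Optional[str]:
    """Return a likely CUDA root directory, or None if nothing is found."""
    # 1. An explicitly set CUDA_HOME wins.
    cuda_home = env.get("CUDA_HOME")
    if cuda_home:
        return cuda_home
    # 2. Otherwise derive the root from nvcc on PATH: <root>/bin/nvcc -> <root>.
    nvcc = shutil.which("nvcc", path=env.get("PATH", ""))
    if nvcc:
        return str(Path(nvcc).resolve().parent.parent)
    # 3. Fall back to the default install location if it exists.
    if Path(default).is_dir():
        return default
    return None


if __name__ == "__main__":
    found = guess_cuda_home()
    if found is None:
        print("No CUDA installation found; set CUDA_HOME before building.")
    else:
        print(f"Likely CUDA root: {found}")
```

If this reports nothing, exporting `CUDA_HOME` as shown above and rebuilding is the next step.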

2. How to not filter empty annotations during training.

There are three ways to keep empty annotations during training.

1. Modify the config in [configs/common/data/coco_detr.py](https://github.com/IDEA-Research/detrex/blob/5d866bd115b6e0e6a0eac253761855196615e5c4/configs/common/data/coco_detr.py#L17) as follows:

```python
dataloader.train = L(build_detection_train_loader)(
    dataset=L(get_detection_dataset_dicts)(names="coco_2017_train", filter_empty=False),
    ...,
)
```

2. Modify the config in your project, as in [dino_r50_4scale_24ep.py](https://github.com/IDEA-Research/detrex/blob/5d866bd115b6e0e6a0eac253761855196615e5c4/projects/dino/configs/dino_r50_4scale_24ep.py#L48):

```python
# your config.py
from detrex.config import get_config

dataloader = get_config("common/data/coco_detr.py").dataloader

# modify dataloader config
# do not filter empty annotations during training
dataloader.train.dataset.filter_empty = False
```

3. Override the config on the command line when launching training:

```bash
cd detrex
python tools/train_net.py --config-file projects/dino/configs/path/to/config.py --num-gpus 8 dataloader.train.dataset.filter_empty=False
```

You can also refer to this issue for more details: https://github.com/IDEA-Research/detrex/issues/78#issuecomment-1284054108
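To get a feel for how many images this setting affects, a small sketch like the one below can list images that have no annotation entries in a COCO-style annotation dict. The helper name is hypothetical, and the exact criteria used by `filter_empty` may differ (for example in how crowd annotations are treated), so treat the result as an estimate.

```python
from typing import Any, Dict, List


def images_without_annotations(coco: Dict[str, Any]) -> List[int]:
    """Return ids of images that have no annotation entries at all."""
    annotated = {ann["image_id"] for ann in coco.get("annotations", [])}
    return [img["id"] for img in coco.get("images", []) if img["id"] not in annotated]


if __name__ == "__main__":
    # Tiny in-memory stand-in for a COCO-style annotation file.
    example = {
        "images": [{"id": 1}, {"id": 2}],
        "annotations": [{"id": 10, "image_id": 1, "category_id": 3}],
    }
    print(images_without_annotations(example))  # [2]
```

For a real dataset, the dict would come from loading the annotation file with `json.load`.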

3. RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:54980 (errno: 98 - Address already in use).

This means that a process you started earlier did not exit correctly and is still holding the port. There are two solutions:

1. Kill the earlier process completely.
2. Change the port by setting `--dist-url`:

```bash
python tools/train_net.py \
    --config-file path/to/config.py \
    --num-gpus 8 \
    --dist-url tcp://127.0.0.1:12345
```
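If you are unsure which port to pick, a short stdlib sketch like this can ask the operating system for an unused one. Both helper names are illustrative and not part of detrex. Note that the port could still be taken by another process between this check and the start of training.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port on `host`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS choose any free port
        return sock.getsockname()[1]


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if `port` can be bound on `host` right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    print(f"--dist-url tcp://127.0.0.1:{find_free_port()}")
```

The printed argument can be pasted into the `train_net.py` command in place of the fixed port.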

4. DINO CPU inference

Please refer to PR #157 for more details.

5. Training coco-like custom dataset

Please refer to PR #186 for more details.

ichitaka commented 1 year ago

This should be added to the FAQ in the installation docs.

rentainhe commented 1 year ago

Thanks for your advice~ we will update the document later~

hg6185 commented 1 year ago

Hello, I'm trying to install detrex on an HPC cluster with an Nvidia V100. I managed to set `CUDA_HOME` to path/CUDA/11.8.0.

When I run `pip install -e .` again, I'm getting the following warning and error:

warning: nvcc warning : incompatible redefinition for option 'std', the last value of this option was used (I think this relates to one argument -std=c++17)

error: /.../miniconda3/envs/fps-bm/lib/python3.10/site-packages/torch/include/c10/util/Half.h(73): error: identifier "_castu32_f32" is undefined

/.../miniconda3/envs/fps-bm/lib/python3.10/site-packages/torch/include/c10/util/Half.h(89): error: identifier "_castf32_u32" is undefined

2 errors detected in the compilation of "/.../detrex/detrex/layers/csrc/DCNv3/dcnv3_cuda.cu".
error: command '.../software/CUDA/11.8.0/bin/nvcc' failed with exit code 2

Did you ever encounter this, and do you know a fix? My gcc is 11.3 and supports C++17. Thanks in advance.

rentainhe commented 1 year ago

Hello @hg6185

It seems the dcn_v3 operator is not suitable for this environment. You can try these two ways:

1. Search for related issues in the InternImage repo to see whether others have hit the same problem.
2. Remove this operator if you do not need to benchmark your model on the InternImage backbone, then re-compile detrex.

This is InternImage's official repo: https://github.com/OpenGVLab/InternImage

It seems they already provide a Python package for this operator: https://github.com/OpenGVLab/InternImage/releases/tag/whl_files

We will update detrex soon to remove the compilation step for this operator.

hg6185 commented 1 year ago

Thanks for the quick reply @rentainhe! Unfortunately, that's not the thing. I removed and reinstalled everything including detectron2 which now cannot be installed due to the same issue. It seems to be a problem with c++ imports in PyTorch.

rentainhe commented 1 year ago

I'm sorry to hear that. I suggest trying a lower PyTorch version to see whether it bypasses this issue. @hg6185

hg6185 commented 1 year ago

Hi again @rentainhe, I found the problem. The GCC version was incompatible with CUDA. Note that you should have a GCC version < 10. In my case, everything works fine with CUDA 11.3.1 and GCC 9.4.0. Thanks again for the quick support!

rentainhe commented 1 year ago

Would you like to add this situation to our FAQs here: https://github.com/IDEA-Research/detrex/issues/109#issue-1414444469

hg6185 commented 1 year ago

Hi @rentainhe ,

I can add this, but what do you mean? :D Do you want me to write a comment that makes a little summary, so you can delete the rest?

rentainhe commented 1 year ago

Yes, I was wondering whether it's better to add it somewhere or just keep our conversation here to help others who run into the same problem.

hg6185 commented 1 year ago

Hi @rentainhe, a summary of what fixed issue 1 for me: the 'latest' Detectron2 release requires a gcc version lower than 10.0.0. I am working on an HPC cluster where I can load different CUDA and GCC versions, which is practical in this case.

In order to build Detectron2 and detrex, I used a miniconda env with CUDA 11.3.1 and gcc 9.4.0. I use PyTorch 3.8, which can be installed with this command (I post it here because you will have to search for it since it's older): `conda install pytorch torchvision torchaudio pytorch-cuda=11.3 -c pytorch -c nvidia`

Don't forget the Nvidia Toolkit matching your version. Note that some libs like matplotlib needed to be downgraded to match an older gcc and Python version. In general, you will probably encounter some issues on the way, but I managed to find a solution to all of them.

For instance, if you get an error with pycocotools, `pip uninstall` it and install it with conda instead (from conda-forge).
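A quick way to check the compiler before building is sketched below. The helper names are illustrative, and the threshold of 10 is simply the value reported in this thread for this CUDA setup, not a general rule for every CUDA release.

```python
import subprocess


def gcc_major(dumpversion_output: str) -> int:
    """Parse the major version from the output of `gcc -dumpversion`."""
    return int(dumpversion_output.strip().split(".")[0])


def gcc_is_older_than(dumpversion_output: str, limit: int = 10) -> bool:
    """True if the GCC major version is strictly below `limit`."""
    return gcc_major(dumpversion_output) < limit


if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["gcc", "-dumpversion"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        print("gcc not found on PATH")
    else:
        verdict = "ok" if gcc_is_older_than(out) else "too new for the setup described here"
        print(f"gcc {out.strip()}: {verdict}")
```

On an HPC system this can be run after loading the compiler module and before `pip install -e .`.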

rentainhe commented 1 year ago

Thank you so much for summarizing this! It's really useful!

IDEA-Research / detrex

Frequently Asked Questions #109
