# I am unable to reproduce the results in the DETR model zoo: no improvement in train_class_error or any other metric when training DETR on COCO 2017 without any changes to the original code #527
## Expected behavior:
If there are no obvious errors in "what you observed" provided above, please tell us the expected behavior.
- train_class_error (and the other metrics) should improve over the course of training when using DETR out of the box; instead, no improvement is observed in train_class_error or any other metric.
If you expect the model to converge / work better, note that we do not give suggestions on how to train a new model. We will only help in one of two cases:
(1) You are unable to reproduce the results in the DETR model zoo.
(2) It indicates a DETR bug.
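One common way to tell the two cases apart is to check whether the model can overfit a single fixed batch: if the losses do not drop even then, the problem is in the training pipeline itself rather than in the data scale or batch size. Below is a minimal, generic PyTorch sketch of that check; it is not part of the DETR codebase, and `model`, `criterion`, `samples`, `targets`, and `weight_dict` are placeholders for whatever your training script already builds.

```python
import torch

def overfit_sanity_check(model, criterion, samples, targets,
                         weight_dict=None, steps=200, lr=1e-4):
    """Repeatedly fit one fixed batch. If the total loss does not drop,
    the training pipeline itself (not the batch size) is suspect."""
    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for step in range(steps):
        loss_dict = criterion(model(samples), targets)
        if weight_dict is not None:
            # DETR-style criteria return a dict of loss terms,
            # each scaled by a per-term weight before summation.
            loss = sum(loss_dict[k] * weight_dict[k]
                       for k in loss_dict if k in weight_dict)
        else:
            loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step % 20 == 0:
            print(f"step {step}: total loss = {loss.item():.4f}")
```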
## Environment:
Environment information collected with `python -m torch.utils.collect_env`:
```
Python version: 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19044-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.12.0
[pip3] torchaudio==0.12.0
[pip3] torchvision==0.13.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h59b6b97_2
[conda] mkl 2021.4.0 haa95532_640
[conda] mkl-service 2.4.0 py38h2bbff1b_0
[conda] mkl_fft 1.3.1 py38h277e83a_0
[conda] mkl_random 1.2.2 py38hf11a4ad_0
[conda] numpy 1.22.3 py38h7a0a035_0
[conda] numpy-base 1.22.3 py38hca35cd5_0
[conda] pytorch 1.12.0 py3.8_cuda11.3_cudnn8_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 0.12.0 py38_cu113 pytorch
[conda] torchvision 0.13.0 py38_cu113 pytorch
```
We trained DETR with a batch size of 64. If you are training on only one GPU, I doubt that you are able to use such a large batch size, which explains the difference in performance.
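If only one GPU is available, one way to approximate a larger effective batch size is gradient accumulation: accumulate gradients over several small batches and only then take an optimizer step. A minimal sketch of the idea in plain PyTorch follows; it is not part of the DETR codebase, and `accum_steps`, `data_loader`, and the loss computation are illustrative placeholders.

```python
import torch

def train_one_epoch_accum(model, criterion, data_loader, optimizer, accum_steps=8):
    """Approximate an effective batch of (loader batch size * accum_steps)
    on a single GPU by delaying the optimizer step."""
    model.train()
    optimizer.zero_grad()
    for i, (samples, targets) in enumerate(data_loader):
        loss_dict = criterion(model(samples), targets)
        # Scale so the accumulated gradient averages over the larger batch.
        loss = sum(loss_dict.values()) / accum_steps
        loss.backward()
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

Note that this matches the gradient statistics of a larger batch but not batch-dependent layers such as the BatchNorm in the backbone, so it is an approximation rather than an exact equivalent.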
If you do not know the root cause of the problem and wish someone to help you, please post according to this template:
## Instructions To Reproduce the Issue:
1. What changes you made (`git diff`) or what code you wrote: none (training on one GPU).
2. What exact command you run: no command line; training was launched directly from PyCharm (see the launch-command sketch after this list).
3. What you observed (including full logs): no improvement in train_class_error or any other metric; environment information was collected with `python -m torch.utils.collect_env` as shown above.
4. Please simplify the steps as much as possible so they do not require additional resources to run, such as a private dataset: no private dataset was used.
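For reference, the DETR README launches training roughly as follows; treat the paths and flag values as placeholders for your own setup:

```
# single node, 8 GPUs (as in the DETR README)
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco

# single GPU (what this issue describes)
python main.py --coco_path /path/to/coco
```

Either way, the effective batch size is the per-GPU batch size times the number of GPUs, which is why a single-GPU run falls well short of the batch size the authors used.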