facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

Unable to train new baselines while filter_empty = false #4616

Open alrightkami opened 2 years ago

alrightkami commented 2 years ago

Hello there, I've noticed that training crashes as soon as I add the following line to my config.py: dataloader.train.dataset.filter_empty = False

The command I'm running: tools/lazyconfig_train_net.py --config-file configs/new_baselines/mask_rcnn_R_50_FPN_100ep.py

Logs:

[10/24 13:03:50 d2.engine.train_loop]: Starting training from iteration 0
ERROR [10/24 13:03:50 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/engine/train_loop.py", line 409, in run_step
    data = next(self._data_loader_iter)
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/data/common.py", line 234, in __iter__
    for d in self.dataset:
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.8/site-packages/torch/_utils.py", line 438, in reraise
    raise exception
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
    data.append(next(self.dataset_iter))
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/data/common.py", line 201, in __iter__
    yield self.dataset[idx]
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/data/common.py", line 90, in __getitem__
    data = self._map_func(self._dataset[cur_idx])
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/utils/serialize.py", line 26, in __call__
    return self._obj(*args, **kwargs)
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/data/dataset_mapper.py", line 189, in __call__
    self._transform_annotations(dataset_dict, transforms, image_shape)
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/data/dataset_mapper.py", line 141, in _transform_annotations
    instances.gt_boxes = instances.gt_masks.get_bounding_boxes()
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/structures/instances.py", line 66, in __getattr__
    raise AttributeError("Cannot find field '{}' in the given Instances!".format(name))
AttributeError: Cannot find field 'gt_masks' in the given Instances!

[10/24 13:03:50 d2.engine.hooks]: Total training time: 0:00:00 (0:00:00 on hooks)
[10/24 13:03:50 d2.utils.events]:  iter: 0    lr: N/A  max_mem: 181M
Traceback (most recent call last):
  File "tools/lazyconfig_train_net.py", line 159, in <module>
    launch(
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "tools/lazyconfig_train_net.py", line 154, in main
    do_train(args, cfg)
  File "tools/lazyconfig_train_net.py", line 139, in do_train
    trainer.train(start_iter, cfg.train.max_iter)
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/engine/train_loop.py", line 149, in train
    self.run_step()
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/engine/train_loop.py", line 409, in run_step
    data = next(self._data_loader_iter)
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/data/common.py", line 234, in __iter__
    for d in self.dataset:
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.8/site-packages/torch/_utils.py", line 438, in reraise
    raise exception
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
    data.append(next(self.dataset_iter))
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/data/common.py", line 201, in __iter__
    yield self.dataset[idx]
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/data/common.py", line 90, in __getitem__
    data = self._map_func(self._dataset[cur_idx])
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/utils/serialize.py", line 26, in __call__
    return self._obj(*args, **kwargs)
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/data/dataset_mapper.py", line 189, in __call__
    self._transform_annotations(dataset_dict, transforms, image_shape)
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/data/dataset_mapper.py", line 141, in _transform_annotations
    instances.gt_boxes = instances.gt_masks.get_bounding_boxes()
  File "/home/jovyan/data/kamila/detrex/detectron2/detectron2/structures/instances.py", line 66, in __getattr__
    raise AttributeError("Cannot find field '{}' in the given Instances!".format(name))
AttributeError: Cannot find field 'gt_masks' in the given Instances!
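Reading the traceback, the crash happens in `DatasetMapper._transform_annotations`, on the `recompute_boxes` branch (`instances.gt_boxes = instances.gt_masks.get_bounding_boxes()`). As far as I can tell from the detectron2 source, `annotations_to_instances` only attaches `gt_masks` when the annotation list is non-empty and contains `segmentation`. An image with zero annotations, which `filter_empty = False` now lets through, therefore yields an `Instances` without `gt_masks`, and reading that field raises the `AttributeError` above. The new-baseline (LSJ) configs set `dataloader.train.mapper.recompute_boxes = True` because of their crop augmentation, which is why this branch is taken. The sketch below reproduces that logic with tiny stand-ins so it runs without detectron2. `FakeInstances`, `to_instances`, `boxes_from_polygons` and `transform_annotations` are hypothetical simplifications, not detectron2 APIs, and the `has(...)` guard is one possible workaround, not necessarily the change made in the PR mentioned later in this thread.

```python
from typing import Any, Dict, List, Tuple


class FakeInstances:
    """Minimal stand-in for detectron2's Instances: unknown fields raise AttributeError."""

    def __init__(self, image_size: Tuple[int, int]):
        self._image_size = image_size
        self._fields: Dict[str, Any] = {}

    def set(self, name: str, value: Any) -> None:
        self._fields[name] = value

    def has(self, name: str) -> bool:
        return name in self._fields

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails, i.e. for stored fields.
        fields = self.__dict__.get("_fields", {})
        if name not in fields:
            raise AttributeError(f"Cannot find field '{name}' in the given Instances!")
        return fields[name]


def to_instances(annos: List[Dict[str, Any]], image_size: Tuple[int, int]) -> FakeInstances:
    # Simplified annotations_to_instances: boxes and classes are always set,
    # masks only if there is at least one annotation carrying a segmentation.
    inst = FakeInstances(image_size)
    inst.set("gt_boxes", [a["bbox"] for a in annos])
    inst.set("gt_classes", [a["category_id"] for a in annos])
    if len(annos) and "segmentation" in annos[0]:
        inst.set("gt_masks", [a["segmentation"] for a in annos])
    return inst


def boxes_from_polygons(polys: List[List[float]]) -> List[List[float]]:
    # Tight XYXY box around each flat polygon [x0, y0, x1, y1, ...]
    # (simplification: one polygon per object).
    boxes = []
    for p in polys:
        xs, ys = p[0::2], p[1::2]
        boxes.append([min(xs), min(ys), max(xs), max(ys)])
    return boxes


def transform_annotations(annos, image_size, recompute_boxes: bool, guarded: bool) -> FakeInstances:
    inst = to_instances(annos, image_size)
    if recompute_boxes:
        if guarded and not inst.has("gt_masks"):
            return inst  # empty image: no masks, nothing to recompute
        inst.set("gt_boxes", boxes_from_polygons(inst.gt_masks))
    return inst


empty_image: List[Dict[str, Any]] = []  # what filter_empty=False lets through
try:
    transform_annotations(empty_image, (800, 800), recompute_boxes=True, guarded=False)
except AttributeError as e:
    print("crash:", e)

print(transform_annotations(empty_image, (800, 800), recompute_boxes=True, guarded=True).gt_boxes)
```

In the real mapper, the equivalent guard would be checking `instances.has("gt_masks")` before recomputing boxes. Setting `dataloader.train.mapper.recompute_boxes = False` should also avoid this branch, at the cost of boxes that may no longer tightly fit objects after cropping.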

Environment:

2022-10-31 10:18:15 URL:https://raw.githubusercontent.com/facebookresearch/detectron2/main/detectron2/utils/collect_env.py [8391/8391] -> "collect_env.py" [1]
----------------------  -----------------------------------------------------------------------------
sys.platform            linux
Python                  3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:25:59) [GCC 10.3.0]
numpy                   1.22.4
detectron2              0.6 @/home/jovyan/data/kamila/detrex/detectron2/detectron2
Compiler                GCC 10.3
CUDA compiler           not available
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.12.1+cu102 @/opt/conda/lib/python3.9/site-packages/torch
PyTorch debug build     False
GPU available           Yes
GPU 0,1                 NVIDIA Quadro RTX 8000 (arch=7.5)
Driver version          465.19.01
CUDA_HOME               None - invalid!
Pillow                  9.1.1
torchvision             0.13.1+cu102 @/opt/conda/lib/python3.9/site-packages/torchvision
torchvision arch flags  /opt/conda/lib/python3.9/site-packages/torchvision/_C.so
fvcore                  0.1.5.post20220512
iopath                  0.1.9
cv2                     4.6.0
----------------------  -----------------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

Testing NCCL connectivity ... this should not hang.
NCCL succeeded.

github-actions[bot] commented 2 years ago

You've chosen to report an unexpected problem or bug. Unless you already know the root cause of it, please include details about it by filling the issue template. The following information is missing: "Instructions To Reproduce the Issue and Full Logs"; "Your Environment";

dcy0577 commented 1 year ago

Hi, did you solve this problem? Do you know how to set up empty-annotation filtering correctly in a lazy config, like cfg.DATALOADER.FILTER_EMPTY_ANNOTATIONS in the old-style config?
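For reference, in a LazyConfig the counterpart of `cfg.DATALOADER.FILTER_EMPTY_ANNOTATIONS` appears to be the `filter_empty` argument of `get_detection_dataset_dicts`, which is what `dataloader.train.dataset` is built from in the COCO lazy configs; the yacs-based loader forwards `FILTER_EMPTY_ANNOTATIONS` to that same argument. So the `dataloader.train.dataset.filter_empty = False` line from the original report is the right knob, and passing `dataloader.train.dataset.filter_empty=False` as a trailing override to `tools/lazyconfig_train_net.py` should be equivalent. When the flag is `True`, detectron2 (as far as I can tell) drops every image that has no non-crowd annotation. The plain-Python sketch below mimics that rule so it runs without detectron2; `keep_image`, `filter_dataset` and the toy records are hypothetical, not detectron2 APIs.

```python
from typing import Any, Dict, List


def keep_image(record: Dict[str, Any]) -> bool:
    """True if the image has at least one non-crowd annotation.

    Assumed to mirror the rule detectron2 applies when filter_empty=True.
    """
    return any(ann.get("iscrowd", 0) == 0 for ann in record.get("annotations", []))


def filter_dataset(records: List[Dict[str, Any]], filter_empty: bool) -> List[Dict[str, Any]]:
    # With filter_empty=False every record is kept, including images without
    # annotations; the mapper then has to cope with those empty images.
    if not filter_empty:
        return list(records)
    return [r for r in records if keep_image(r)]


# Toy dataset dicts (hypothetical file names)
records = [
    {"file_name": "a.jpg", "annotations": [{"iscrowd": 0}]},
    {"file_name": "b.jpg", "annotations": []},                # no annotations
    {"file_name": "c.jpg", "annotations": [{"iscrowd": 1}]},  # crowd only
]

print([r["file_name"] for r in filter_dataset(records, filter_empty=True)])   # ['a.jpg']
print([r["file_name"] for r in filter_dataset(records, filter_empty=False)])  # ['a.jpg', 'b.jpg', 'c.jpg']
```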

alrightkami commented 1 year ago

@dcy0577 Hi, no, I'm still struggling with this issue.

alrightkami commented 1 year ago

@dcy0577 FYI, I fixed it in my PR, if it's still relevant for you.