facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

Manual seed does not work as expected #4438

Open collinmccarthy opened 2 years ago

collinmccarthy commented 2 years ago

Hello,

This is related to issues #2121 and #2615, but neither of these addressed my concerns.

When using an explicit seed with DDP, e.g. cfg.SEED=0, I expected the following:

  1. That seed will be used directly on each GPU to initialize model parameters in the same way.
  2. That seed will be used directly by each sampler (e.g. TrainingSampler) to shuffle indices in the same way.
  3. That seed will be offset by the rank and worker id such that all DataLoader workers across all GPUs perform unique data augmentations.

Instead, what I see is:

  1. The seed is offset by the rank in default_setup, which causes each GPU to initialize its model parameters differently.
  2. The seed is not passed into the samplers in _train_loader_from_config, so the samplers (e.g. TrainingSampler) fall back to a randomly generated seed (which is at least the same across all GPUs).
  3. The seed is offset by the worker id only (which assumes it has already been offset by the rank) in worker_init_reset_seed.

I think issue (1) above is particularly concerning, and issue (2) breaks deterministic behavior (which wasn't discussed in issue #2121). Issue (3) is fine as long as the seed already takes the rank into account.
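For reference, the relevant code paths (condensed and paraphrased from detectron2 v0.6; the names match the library, but this is a simplified sketch, not verbatim source):

```python
import torch
from detectron2.utils import comm
from detectron2.utils.env import seed_all_rng

def default_setup_seeding(cfg, rank):
    # (1) In default_setup(): the configured seed is offset by the process
    # rank before seeding, so each GPU seeds its RNGs -- and therefore
    # initializes its model weights -- differently.
    seed = cfg.SEED
    seed_all_rng(None if seed < 0 else seed + rank)

def training_sampler_seed(seed=None):
    # (2) In TrainingSampler.__init__(): cfg.SEED is never forwarded by
    # _train_loader_from_config, so the sampler falls back to a shared
    # random seed (identical across processes, but different every run).
    if seed is None:
        seed = comm.shared_random_seed()
    return int(seed)

def worker_init_reset_seed(worker_id):
    # (3) In data/build.py: each DataLoader worker re-seeds from the process
    # seed plus its worker id; this is only unique across GPUs because of
    # the rank offset applied in (1).
    initial_seed = torch.initial_seed() % 2**31
    seed_all_rng(initial_seed + worker_id)
```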

My workaround is as follows:
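(The exact snippet is omitted above; the following is an illustrative sketch of a workaround along these lines, not the original code: override DefaultTrainer.build_train_loader so that TrainingSampler receives cfg.SEED explicitly instead of a randomly generated shared seed.)

```python
from detectron2.data import build_detection_train_loader, get_detection_dataset_dicts
from detectron2.data.samplers import TrainingSampler
from detectron2.engine import DefaultTrainer

class SeededTrainer(DefaultTrainer):
    @classmethod
    def build_train_loader(cls, cfg):
        dataset = get_detection_dataset_dicts(cfg.DATASETS.TRAIN)
        # Pass the manual seed directly so index shuffling is reproducible.
        sampler = TrainingSampler(len(dataset), seed=cfg.SEED)
        return build_detection_train_loader(cfg, dataset=dataset, sampler=sampler)
```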

Am I missing something about the underlying issues or the current workflow with respect to setting the seed? Is there an easier or better workaround than what I'm proposing? Is this going to be "fixed" at some point, or is this documented, expected behavior that users should be aware of?

Thank you, -Collin

Environment:

----------------------  ---------------------------------------------------------------------------------------------
sys.platform            linux
Python                  3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:06:46) [GCC 10.3.0]
numpy                   1.23.1
detectron2              0.6 @/home/cmccarth/.conda/envs/sparse_act2/lib/python3.10/site-packages/detectron2
Compiler                GCC 9.4
CUDA compiler           CUDA 11.5
detectron2 arch flags   5.2
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.11.0+cu115 @/home/cmccarth/.conda/envs/sparse_act2/lib/python3.10/site-packages/torch
PyTorch debug build     False
GPU available           Yes
GPU 0,1                 NVIDIA GeForce GTX TITAN X (arch=5.2)
Driver version          510.73.08
CUDA_HOME               /usr/local/cuda-11.5
Pillow                  9.2.0
torchvision             0.12.0+cu115 @/home/cmccarth/.conda/envs/sparse_act2/lib/python3.10/site-packages/torchvision
torchvision arch flags  3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                  0.1.5.post20220512
iopath                  0.1.9
cv2                     4.6.0
----------------------  ---------------------------------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.5
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.3
    - Built with CuDNN 8.3.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.5, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

Testing NCCL connectivity ... this should not hang.
NCCL succeeded.

Instructions To Reproduce the Issue:

  1. Full runnable code or full changes you made:

Any project that uses default_setup and an explicit seed will reproduce these issues. Example: DeepLab

  2. What exact command you run:

```
cd /path/to/detectron2/projects/DeepLab
python train_net.py --config-file configs/Cityscapes-SemanticSegmentation/deeplab_v3_plus_R_103_os16_mg124_poly_90k_bs16.yaml --num-gpus 8 --opts SEED 0
```
  3. Full logs or other relevant observations:

Observed by setting a breakpoint and inspecting weights on rank 0 and rank 1.

github-actions[bot] commented 2 years ago

You've chosen to report an unexpected problem or bug. Unless you already know the root cause of it, please include details about it by filling the issue template. The following information is missing: "Instructions To Reproduce the Issue and Full Logs";

collinmccarthy commented 2 years ago

Apologies, I was inspecting model weights before create_ddp_model(), which correctly syncs the parameters and buffers with rank 0. That means default_setup() works and I do not need a separate call to seed_all_rng(). I do, however, still need to pass the seed into TrainingSampler for deterministic runs to work (in addition to setting the cuDNN deterministic flag to True).
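For reference, the determinism flags mentioned here are the standard PyTorch ones:

```python
import torch

# Force cuDNN to select deterministic kernels; disabling benchmark mode
# avoids nondeterministic algorithm auto-selection, at some speed cost.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```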

If _train_loader_from_config() and build_detection_train_loader() simply took an optional seed argument and passed it through to the samplers, I could pass in my manual seed and that would solve the remaining issue.
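Under that proposal, a caller could then write something like the following (hypothetical; no such seed argument exists in detectron2 v0.6):

```python
# Hypothetical usage if a `seed` argument were added to
# build_detection_train_loader (not a real parameter in detectron2 v0.6):
data_loader = build_detection_train_loader(cfg, seed=cfg.SEED)
```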