facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
Apache License 2.0

T.ResizeShortestEdge breaks semantic segmentation labels #682

Closed: jaywonchung closed this issue 4 years ago

jaywonchung commented 4 years ago


The following snippet is from the last part of detectron2.data.DatasetMapper.

# USER: Remove if you don't do semantic/panoptic segmentation.
if "sem_seg_file_name" in dataset_dict:
    with PathManager.open(dataset_dict.pop("sem_seg_file_name"), "rb") as f:
        sem_seg_gt = Image.open(f)
        sem_seg_gt = np.asarray(sem_seg_gt, dtype="uint8")
    sem_seg_gt = transforms.apply_segmentation(sem_seg_gt)
    sem_seg_gt = torch.as_tensor(sem_seg_gt.astype("long"))
    dataset_dict["sem_seg"] = sem_seg_gt
return dataset_dict

As the snippet shows, transforms is applied directly to the semantic segmentation ground truth array at this line:

sem_seg_gt = transforms.apply_segmentation(sem_seg_gt)

However, I encountered a problem caused by the T.ResizeShortestEdge transform included in transforms. By default, T.ResizeShortestEdge resizes images with bilinear interpolation, and this broke the integer labels in my semantic segmentation ground truth arrays.
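To illustrate the general concern (not Detectron2's actual code path), here is a minimal pure-Python sketch that resizes a 1-D row of class IDs with nearest-neighbor and with linear interpolation. The helper names `resize_nearest` and `resize_linear` are hypothetical and exist only for this example.

```python
def resize_nearest(row, new_len):
    """Resize a 1-D label row by copying the nearest source sample."""
    scale = len(row) / new_len
    return [row[min(int((i + 0.5) * scale), len(row) - 1)] for i in range(new_len)]


def resize_linear(row, new_len):
    """Resize a 1-D row by linear interpolation between neighbouring samples."""
    scale = len(row) / new_len
    out = []
    for i in range(new_len):
        # Map the output pixel centre back into source coordinates and clamp.
        x = min(max((i + 0.5) * scale - 0.5, 0.0), len(row) - 1)
        lo = int(x)
        hi = min(lo + 1, len(row) - 1)
        frac = x - lo
        out.append(round((1 - frac) * row[lo] + frac * row[hi]))
    return out


labels = [0, 0, 7, 7]  # two classes: 0 and 7
nearest = resize_nearest(labels, 8)
linear = resize_linear(labels, 8)

# Class IDs that appear after resizing but were never in the ground truth.
invented_by_nearest = set(nearest) - set(labels)
invented_by_linear = set(linear) - set(labels)
```

Nearest-neighbor only ever copies existing labels, while interpolation blends the two classes at the boundary and produces IDs that belong to neither. That is why label maps are normally resized with nearest-neighbor sampling.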


------------------------  -------------------------------------------------------------------
sys.platform              linux
Python                    3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) [GCC 7.3.0]
Numpy                     1.17.4
Detectron2 Compiler       GCC 7.4
Detectron2 CUDA Compiler  10.1
PyTorch                   1.3.0a0+50c90a2
PyTorch Debug Build       False
torchvision               0.4.2
CUDA available            True
GPU 0                     GeForce RTX 2080 Ti
CUDA_HOME                 /usr/local/cuda
NVCC                      Cuda compilation tools, release 10.1, V10.1.168
Pillow                    6.2.2
cv2                       4.1.0
------------------------  -------------------------------------------------------------------
PyTorch built with:
  - GCC 7.4
  - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_75,code=sm_75
  - CuDNN 7.6.4
  - Magma 2.5.1
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
jaywonchung commented 4 years ago

I temporarily resolved this issue by writing my own mapper. The following snippet is enough:

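# Apply self.tfm_gens to image. Build transform list for sem_seg_gt.
transforms_gt = []
for g in self.tfm_gens:
    tfm = g.get_transform(image)
    if isinstance(g, T.ResizeShortestEdge):
        if self.is_train:
            min_size = self.cfg.INPUT.MIN_SIZE_TRAIN
            max_size = self.cfg.INPUT.MAX_SIZE_TRAIN
            sample_style = self.cfg.INPUT.MIN_SIZE_TRAIN_SAMPLING
        else:
            min_size = self.cfg.INPUT.MIN_SIZE_TEST
            max_size = self.cfg.INPUT.MAX_SIZE_TEST
            sample_style = "choice"
        # NOTE: the rest of this call was cut off in the source. The completion
        # below is an assumption: rebuild the resize with nearest-neighbor
        # interpolation so the integer labels are preserved.
        tfm_gt = T.ResizeShortestEdge(
            min_size, max_size, sample_style, interp=Image.NEAREST
        ).get_transform(image)
    else:
        tfm_gt = tfm
    transforms_gt.append(tfm_gt)
    image = tfm.apply_image(image)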

Although this issue can be worked around, it would be great to improve the default behavior of DatasetMapper for semantic segmentation.

ppwwyyxx commented 4 years ago

By default T.ResizeShortestEdge resizes images with bilinear interpolation,

I don't think that will happen to segmentation: https://github.com/facebookresearch/detectron2/blob/3e994e353b0513c8994c3964b4585191f04ea14f/detectron2/data/transforms/transform.py#L92-L94

Let us know if that contradicts what you observed.
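The linked lines are not reproduced here. As a general illustration of how a transform can treat labels differently from images, the sketch below uses a hypothetical `ToyResize` class (an assumption for this example, not Detectron2's implementation) whose `apply_segmentation` always forces nearest-neighbor sampling, whatever interpolation is configured for images.

```python
class ToyResize:
    """Hypothetical 1-D resize transform; not Detectron2's implementation."""

    def __init__(self, new_len, interp="linear"):
        self.new_len = new_len
        self.interp = interp  # interpolation used for image data

    def _resize(self, row, interp):
        scale = len(row) / self.new_len
        out = []
        for i in range(self.new_len):
            if interp == "nearest":
                # Copy the nearest source sample unchanged.
                out.append(row[min(int((i + 0.5) * scale), len(row) - 1)])
            else:
                # Blend the two neighbouring source samples.
                x = min(max((i + 0.5) * scale - 0.5, 0.0), len(row) - 1)
                lo = int(x)
                hi = min(lo + 1, len(row) - 1)
                frac = x - lo
                out.append((1 - frac) * row[lo] + frac * row[hi])
        return out

    def apply_image(self, row):
        # Images use whatever interpolation the transform was built with.
        return self._resize(row, self.interp)

    def apply_segmentation(self, row):
        # Labels are categorical, so the image setting is ignored here.
        return self._resize(row, "nearest")


tfm = ToyResize(new_len=8, interp="linear")
image_out = tfm.apply_image([0.0, 0.0, 1.0, 1.0])
labels_out = tfm.apply_segmentation([0, 0, 7, 7])
```

With this split, the same transform object can be applied to both the image and its label map: the image is smoothed at the boundary, while the label row contains only the original class IDs.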