facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

T.ResizeShortestEdge breaks semantic segmentation labels #682

Closed: jaywonchung closed this issue 4 years ago.

jaywonchung commented 4 years ago

Description

The following snippet is from the last part of detectron2.data.DatasetMapper.

# USER: Remove if you don't do semantic/panoptic segmentation.
if "sem_seg_file_name" in dataset_dict:
    with PathManager.open(dataset_dict.pop("sem_seg_file_name"), "rb") as f:
        sem_seg_gt = Image.open(f)
        sem_seg_gt = np.asarray(sem_seg_gt, dtype="uint8")
    sem_seg_gt = transforms.apply_segmentation(sem_seg_gt)
    sem_seg_gt = torch.as_tensor(sem_seg_gt.astype("long"))
    dataset_dict["sem_seg"] = sem_seg_gt
return dataset_dict

Here, transforms is applied directly to the semantic segmentation ground-truth array:

sem_seg_gt = transforms.apply_segmentation(sem_seg_gt)

However, I ran into a problem caused by the T.ResizeShortestEdge transform included in transforms. By default, T.ResizeShortestEdge resizes images with bilinear interpolation, and this broke the integer labels in my semantic segmentation ground-truth arrays.
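To illustrate why bilinear resizing corrupts label maps, here is a minimal numpy-only sketch (not detectron2 code). It upsamples one row of a label map containing class ids 0 and 10. It compares linear interpolation, which is the 1-D case of bilinear, with nearest-neighbor sampling. The half-pixel coordinate mapping is an assumption chosen for simplicity.

```python
import numpy as np

# One row of a label map with two classes: 0 and 10.
row = np.array([0, 0, 10, 10], dtype=np.uint8)

# Map 8 output pixel centres back to input coordinates (half-pixel convention),
# clamped to the valid input range [0, 3].
xs = np.clip((np.arange(8) + 0.5) * 4 / 8 - 0.5, 0, 3)

# Linear interpolation blends neighbouring labels, then rounds to integers.
linear = np.rint(np.interp(xs, np.arange(4), row)).astype(np.uint8)

# Nearest-neighbour sampling only copies labels that already exist.
nearest = row[np.minimum(np.floor(xs + 0.5).astype(int), 3)]

print(linear.tolist())   # [0, 0, 0, 2, 8, 10, 10, 10]
print(nearest.tolist())  # [0, 0, 0, 0, 10, 10, 10, 10]
```

Linear interpolation invents class ids 2 and 8, which never occur in the dataset. Nearest-neighbor sampling keeps the label set intact.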

Environment:

------------------------  -------------------------------------------------------------------
sys.platform              linux
Python                    3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31) [GCC 7.3.0]
Numpy                     1.17.4
Detectron2 Compiler       GCC 7.4
Detectron2 CUDA Compiler  10.1
DETECTRON2_ENV_MODULE     <not set>
PyTorch                   1.3.0a0+50c90a2
PyTorch Debug Build       False
torchvision               0.4.2
CUDA available            True
GPU 0                     GeForce RTX 2080 Ti
CUDA_HOME                 /usr/local/cuda
NVCC                      Cuda compilation tools, release 10.1, V10.1.168
Pillow                    6.2.2
cv2                       4.1.0
------------------------  -------------------------------------------------------------------
PyTorch built with:
  - GCC 7.4
  - Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.20.5 (Git Hash 0125f28c61c1f822fd48570b4c1066f96fcb9b2e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_75,code=sm_75
  - CuDNN 7.6.4
  - Magma 2.5.1
  - Build settings: BLAS=MKL, BUILD_NAMEDTENSOR=OFF, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow, DISABLE_NUMA=1, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=True, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,
jaywonchung commented 4 years ago

I temporarily worked around this issue by writing my own mapper. The following snippet should be enough:

import detectron2.data.transforms as T
from PIL import Image

# Inside a custom mapper's __call__: apply self.tfm_gens to the image and
# build a parallel transform list for sem_seg_gt that resizes with
# nearest-neighbor interpolation.
transforms_gt = []
for g in self.tfm_gens:
    tfm = g.get_transform(image)
    if isinstance(tfm, T.ResizeTransform):
        # Reuse the size already sampled for the image. Building a new
        # ResizeShortestEdge and calling get_transform() again would draw a
        # new random short-edge length, so the label map could end up with a
        # different size than the image.
        tfm_gt = T.ResizeTransform(tfm.h, tfm.w, tfm.new_h, tfm.new_w, Image.NEAREST)
        transforms_gt.append(tfm_gt)
    else:
        transforms_gt.append(tfm)
    image = tfm.apply_image(image)
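The label transform must reuse the image's sampled size because ResizeShortestEdge draws a random short-edge length each time it builds a transform. The sketch below shows that size computation. The hypothetical helper shortest_edge_size mirrors, as far as I can tell, the rule ResizeShortestEdge used at the time: scale the short side, then cap the long side at max_size and round. This is an assumption, so check the source for your version. The key point is to sample once and then use the result for both the image and sem_seg_gt.

```python
import random


def shortest_edge_size(h, w, short_edge, max_size):
    """Hypothetical helper: (new_h, new_w) after scaling the short side to
    short_edge, then shrinking so the long side is at most max_size."""
    scale = short_edge / min(h, w)
    if h < w:
        new_h, new_w = short_edge, scale * w
    else:
        new_h, new_w = scale * h, short_edge
    if max(new_h, new_w) > max_size:
        scale = max_size / max(new_h, new_w)
        new_h, new_w = new_h * scale, new_w * scale
    # Round to the nearest integer pixel count.
    return int(new_h + 0.5), int(new_w + 0.5)


rng = random.Random(0)
# "choice"-style sampling: draw the short-edge length ONCE per image ...
short_edge = rng.choice((640, 672, 704, 736, 768, 800))
size = shortest_edge_size(480, 640, short_edge, 1333)
# ... and resize both the image and its label map to that same size.
image_size = label_size = size
```

Drawing the size a second time for the labels would be harmless only when a single fixed size is configured.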

Although this issue can be circumvented, improving the default behavior of DatasetMapper for semantic segmentation would be great.

ppwwyyxx commented 4 years ago

> By default T.ResizeShortestEdge resizes images with bilinear interpolation,

I don't think that happens to segmentation labels: ResizeTransform.apply_segmentation resizes with Image.NEAREST. See https://github.com/facebookresearch/detectron2/blob/3e994e353b0513c8994c3964b4585191f04ea14f/detectron2/data/transforms/transform.py#L92-L94

Let us know if that contradicts what you observed.
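The design described in that reply can be sketched with a toy, numpy-only transform. The class ToyResizeTransform and its helper functions are hypothetical and are not detectron2's ResizeTransform. The class stores one interpolation mode for images, but apply_segmentation always overrides it with nearest-neighbor, so a bilinear default never reaches the label map. Interpolation modes are plain strings here instead of PIL constants, which is an assumption made for brevity.

```python
import numpy as np


def _src_coords(n_out, n_in):
    # Map output pixel centres back to input coordinates (half-pixel convention).
    return np.clip((np.arange(n_out) + 0.5) * n_in / n_out - 0.5, 0, n_in - 1)


def _resize_nearest(a, new_h, new_w):
    h, w = a.shape
    rows = np.minimum(np.floor(_src_coords(new_h, h) + 0.5).astype(int), h - 1)
    cols = np.minimum(np.floor(_src_coords(new_w, w) + 0.5).astype(int), w - 1)
    return a[rows[:, None], cols[None, :]]


def _resize_bilinear(a, new_h, new_w):
    h, w = a.shape
    ys, xs = _src_coords(new_h, h), _src_coords(new_w, w)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    a = a.astype(np.float64)
    # Blend horizontally on the two neighbouring rows, then vertically.
    top = a[y0][:, x0] * (1 - wx) + a[y0][:, x1] * wx
    bottom = a[y1][:, x0] * (1 - wx) + a[y1][:, x1] * wx
    return np.rint(top * (1 - wy) + bottom * wy).astype(np.uint8)


class ToyResizeTransform:
    """Hypothetical, simplified resize transform (not detectron2's class)."""

    def __init__(self, new_h, new_w, interp="bilinear"):
        self.new_h, self.new_w, self.interp = new_h, new_w, interp

    def apply_image(self, img, interp=None):
        # Images use the configured interpolation unless the caller overrides it.
        interp = interp or self.interp
        resize = _resize_bilinear if interp == "bilinear" else _resize_nearest
        return resize(img, self.new_h, self.new_w)

    def apply_segmentation(self, seg):
        # Label maps always use nearest neighbour, so no new class ids appear.
        return self.apply_image(seg, interp="nearest")


labels = np.zeros((4, 4), dtype=np.uint8)
labels[:, 2:] = 10  # two classes: 0 and 10
tfm = ToyResizeTransform(8, 8, interp="bilinear")
print(np.unique(tfm.apply_image(labels)).tolist())         # [0, 2, 8, 10]
print(np.unique(tfm.apply_segmentation(labels)).tolist())  # [0, 10]
```

Putting the override inside the transform means every caller of apply_segmentation gets label-safe resizing, whatever interpolation the generator was configured with for images.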