facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

Semantic Segmentation - Multiclass Classification - Implementation for per Class Accuracy mixed up with per Class Recall? #4862

Open biggeR-data opened 1 year ago

biggeR-data commented 1 year ago

Discussed in https://github.com/facebookresearch/detectron2/discussions/4861

Originally posted by **biggeR-data** March 16, 2023

Hey everyone,

I am using Detectron2 with a custom dataset for semantic segmentation. My dataset contains multiple classes, so it is not a binary classification problem. I checked the implementation details of the evaluation metrics in [detectron2/evaluation/sem_seg_evaluation.py](https://github.com/facebookresearch/detectron2/blob/main/detectron2/evaluation/sem_seg_evaluation.py). There I stumbled across the per-class accuracy implementation, which seems to be confused with per-class recall. Here are the [code parts in question](https://github.com/facebookresearch/detectron2/blob/main/detectron2/evaluation/sem_seg_evaluation.py#L186-L193):

```python
acc = np.full(self._num_classes, np.nan, dtype=np.float)
# ...
tp = self._conf_matrix.diagonal()[:-1].astype(np.float)
pos_gt = np.sum(self._conf_matrix[:-1, :-1], axis=0).astype(np.float)
# ...
acc_valid = pos_gt > 0
acc[acc_valid] = tp[acc_valid] / pos_gt[acc_valid]
```

Judging by the naming of the variables, `pos_gt` represents the ground truths / actuals. Since it is computed as a column sum (`axis=0`), the actuals are in the columns of the confusion matrix and the predictions are in the rows.

> Note: This orientation differs from the [Wikipedia definition of a confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix). If you want to calculate metrics according to the Wikipedia layout, you would need to transpose the confusion matrix used in the evaluation script. I will stick to the orientation used by detectron2 in the following examples to avoid confusion.

Looking at the code, the per-class accuracy is calculated by dividing TP by the actual positives (named `P` in Wikipedia's entry). This does not correspond to the definition of the accuracy measure. Accuracy is defined as:

```
(TP + TN) / n
```

equivalently written as:

```
(TP + TN) / (TP + FP + TN + FN)
```

Recall is defined as:

```
TP / (TP + FN) = TP / P
```

which is exactly the formula used to calculate the per-class "accuracy" in [Line 193](https://github.com/facebookresearch/detectron2/blob/main/detectron2/evaluation/sem_seg_evaluation.py#L193).

I have searched for articles covering multiclass classification where per-class accuracy and per-class recall are discussed, but sources are rather scarce. I did find a [comment on Stack Overflow](https://stackoverflow.com/questions/39770376/scikit-learn-get-accuracy-scores-for-each-class/65673016#comment118400712_50977153) claiming that per-class accuracy and per-class recall are the same for multiclass classification. On the other hand, I found an [example](http://rasbt.github.io/mlxtend/user_guide/evaluate/accuracy_score/) which, when I continued it, led me to the conclusion that per-class accuracy is in fact not the same as per-class recall. Please refer to this screenshot of the continuation of the example:

![Screenshot 2023-03-16 at 12 19 47](https://user-images.githubusercontent.com/75607150/225601537-04480d19-e8e0-417c-902b-dd33c8dd25f9.png)

So I guess my question is: is the current implementation of per-class accuracy in `sem_seg_evaluation.py` mixed up with per-class recall, given a multiclass classification problem?
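To make the distinction concrete, below is a small numpy sketch using a made-up 3×3 confusion matrix (the numbers are purely illustrative, not from any real run), in detectron2's orientation of predictions in rows and ground truth in columns. The two metrics clearly disagree:

```python
import numpy as np

# Illustrative only: a made-up 3-class confusion matrix in detectron2's
# orientation (rows = predictions, columns = ground truth).
conf = np.array([
    [5, 1, 0],
    [2, 6, 1],
    [0, 2, 7],
], dtype=np.float64)

n = conf.sum()                 # total number of samples (here: 24)
tp = conf.diagonal()           # true positives per class
pos_gt = conf.sum(axis=0)      # actual positives per class = TP + FN (column sums)
pos_pred = conf.sum(axis=1)    # predicted positives per class = TP + FP (row sums)

recall = tp / pos_gt           # TP / (TP + FN) -- what the evaluator reports as "acc"

fp = pos_pred - tp
fn = pos_gt - tp
tn = n - tp - fp - fn
accuracy = (tp + tn) / n       # (TP + TN) / n -- the textbook per-class accuracy

print(np.round(recall, 3))     # [0.714 0.667 0.875]
print(np.round(accuracy, 3))   # [0.875 0.75  0.875]
```

Note that per-class accuracy counts true negatives, so with many classes it tends to sit close to 1 for every class regardless of segmentation quality; segmentation literature often uses the name "accuracy" for the TP / (TP + FN) quantity anyway, which may be where the naming in the evaluator comes from.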
github-actions[bot] commented 1 year ago

You've chosen to report an unexpected problem or bug. Unless you already know the root cause of it, please include details about it by filling in the issue template. The following information is missing: "Instructions To Reproduce the Issue and Full Logs"; "Your Environment".

biggeR-data commented 1 year ago

Instructions To Reproduce the Issue:

  1. Full runnable code or full changes you made: No changes were made; the code section in question is identical in the version used (v0.5) and on the current main branch.

  2. What exact command you run: `python train_net.py --num-gpus 1 --config-file ./configs/fp-geom/cc5k/R50.yaml OUTPUT_DIR logs/runs/R50`

  3. Full logs or other relevant observations: Logs are not relevant in this case, as the potential mixup lies in the metric implementation itself rather than in any particular run.

Expected behavior:

Evaluation metrics should calculate the actual per-class accuracy and not per-class recall (see the sketch below).
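For illustration, here is a minimal sketch of how the textbook per-class accuracy could be computed alongside the existing recall-style computation, mirroring the evaluator's masking of absent classes. It reuses the toy matrix from above; `conf` and `num_classes` are stand-ins for `self._conf_matrix[:-1, :-1]` and `self._num_classes` in `sem_seg_evaluation.py`. This is a sketch under those assumptions, not a proposed patch:

```python
import numpy as np

# Sketch only: conf stands in for self._conf_matrix[:-1, :-1]
# (predictions in rows, ground truth in columns).
conf = np.array([[5, 1, 0],
                 [2, 6, 1],
                 [0, 2, 7]], dtype=np.float64)
num_classes = conf.shape[0]

n = conf.sum()
tp = conf.diagonal()
pos_gt = conf.sum(axis=0)       # TP + FN per class
pos_pred = conf.sum(axis=1)     # TP + FP per class
valid = pos_gt > 0              # mask classes absent from the ground truth

# What the evaluator currently reports as "acc" (i.e. per-class recall):
recall = np.full(num_classes, np.nan)
recall[valid] = tp[valid] / pos_gt[valid]

# Textbook per-class accuracy, (TP + TN) / n:
tn = n - pos_gt - pos_pred + tp
per_class_accuracy = np.full(num_classes, np.nan)
per_class_accuracy[valid] = (tp[valid] + tn[valid]) / n
```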

Environment:

----------------------  ---------------------------------------------------------------------------------------
sys.platform            linux
Python                  3.7.0 (default, Oct  9 2018, 10:31:47) [GCC 7.3.0]
numpy                   1.21.5
detectron2              0.5 @/home/stud2/patrick_bigge/experiment/setup/detectron2-0.5/detectron2
Compiler                GCC 9.4
CUDA compiler           CUDA 11.5
detectron2 arch flags   7.0
DETECTRON2_ENV_MODULE   <not set>
PyTorch                 1.7.1 @/home/stud2/miniconda3/envs/p37t17det2_2/lib/python3.7/site-packages/torch
PyTorch debug build     False
GPU available           Yes
GPU 0                   GRID V100S-16C (arch=7.0)
CUDA_HOME               /usr/local/cuda
Pillow                  9.3.0
torchvision             0.8.2 @/home/stud2/miniconda3/envs/p37t17det2_2/lib/python3.7/site-packages/torchvision
torchvision arch flags  3.5, 5.0, 6.0, 7.0, 7.5, 8.0
fvcore                  0.1.5.post20210915
iopath                  0.1.9
cv2                     4.1.2
----------------------  ---------------------------------------------------------------------------------------
PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.0
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.0.5
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
biggeR-data commented 1 year ago

Any news on this, @ppwwyyxx?