facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

Segmentation fault when importing DefaultTrainer #1979

Closed: SebJak closed this issue 4 years ago

SebJak commented 4 years ago

Hi everyone,

I am trying to run detectron2 training in a Docker container on CPU. For a few days I have been struggling with a segmentation fault, which occurs when I import DefaultTrainer.

On the other hand, when I create a virtual env on my local machine and install all the dependencies, everything works well.

I have attached the necessary files below.

My scripts:

1. Dockerfile

    FROM centos/python-36-centos7
    USER root

    WORKDIR /root

    RUN yum update -y
    RUN yum install git -y
    RUN yum install dos2unix -y

    RUN pip install --upgrade pip
    RUN pip install scikit-build
    RUN pip install opencv-python
    RUN pip install tensorboard
    RUN pip install cython
    RUN pip install -U matplotlib

    RUN pip install google-cloud-storage
    RUN pip install torch==1.6.0+cpu torchvision==0.7.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
    RUN pip install -U 'git+https://github.com/facebookresearch/fvcore'
    RUN pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cpu/torch1.6/index.html
    RUN pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
    RUN pip install pycocotools==2.0.1 --no-cache-dir

    # Clone detectron2
    RUN mkdir ./detectron2_repo
    RUN git clone https://github.com/facebookresearch/detectron2 ./detectron2_repo

    # Prepare directory structure
    RUN mkdir ./output
    RUN mkdir ./storage
    RUN mkdir ./eval_dir
    RUN mkdir ./src

    COPY src/ ./src/

    COPY entrypoint.sh ./entrypoint.sh
    RUN ["chmod", "+x", "./entrypoint.sh"]
    RUN dos2unix ./entrypoint.sh
    ENV PYTHONFAULTHANDLER=1

    # Sets up the entry point to invoke the trainer.
    ENTRYPOINT ["./entrypoint.sh"]

entrypoint.sh

    #!/bin/sh

    set -e

    echo "Starting training application with parameters $@"
    python --version
    python ./src/main.py $@

cocoTrainer.py

    from detectron2.engine import DefaultTrainer
    from detectron2.evaluation import COCOEvaluator


    class Trainer(DefaultTrainer):
        def __init__(self, cfg, eval_dir):
            self.eval_dir = eval_dir
            super().__init__(cfg)

        @classmethod
        def build_evaluator(cls, cfg, dataset_name, output_folder=None):
            # Note: eval_dir is set on the instance in __init__, not on the class,
            # so cls.eval_dir raises AttributeError unless it is also a class attribute.
            return COCOEvaluator(dataset_name, cfg, False, cls.eval_dir)
properties

    import argparse


    def get_args():
        args_parser = argparse.ArgumentParser()

        args_parser.add_argument(
            '--max_iteration',
            help='Max iteration.',
            nargs='?',
            type=int,
            default=300
        )

        args_parser.add_argument(
            '--base_lr',
            help='Base learning rate',
            nargs='?',
            type=float,
            default=0.02
        )

        args_parser.add_argument(
            '--validation_dataset_name',
            help='Validation data set name',
            nargs='?',
            default='validation'
        )

        args_parser.add_argument(
            '--test_dataset_name',
            help='Test data set name',
            nargs='?',
            default='test'
        )

        args_parser.add_argument(
            '--train_dataset_name',
            help='Train data set file path',
            nargs='?',
            default='train'
        )

        args_parser.add_argument(
            '--config',
            help='Model base configuration',
            nargs='?',
            default='./detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml'
        )

        args_parser.add_argument(
            '--model_weights',
            help='Model weights configuration',
            nargs='?',
            default='detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl'
        )

        args_parser.add_argument(
            '--device',
            help='Device used for training cpu/gpu',
            nargs='?',
            default='cpu'
        )

        args_parser.add_argument(
            '--resume',
            help='Resume training',
            nargs='?',
            # Caution: bool('False') is True, so any non-empty value enables resume.
            type=bool,
            default=False
        )

        args_parser.add_argument(
            '--warmup_iters',
            help='The learning rate starts from 0 and goes to the preset one for this number of iterations',
            nargs='?',
            type=int,
            default=100
        )

        args_parser.add_argument(
            '--steps',
            help='The checkpoints (number of iterations) at which the learning rate will be reduced by GAMMA value. Example usage: --steps 100 200 300',
            action='store',
            type=int,
            nargs='*',
            default=[100, 300]
        )

        args_parser.add_argument(
            '--gamma',
            help='Learning rate decay rate',
            nargs='?',
            type=float,
            default=0.05
        )

        args_parser.add_argument(
            '--eval_period',
            help='The period at which we will evaluate on the test set',
            nargs='?',
            type=int,
            default=100
        )

        return args_parser.parse_args()
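One caveat in the arguments above: argparse passes the raw command-line string to `type`, and `bool()` of any non-empty string (including "False") is `True`, so `--resume False` would still resume. A minimal sketch of a stricter converter, using a hypothetical helper named `str_to_bool` (it is not part of argparse or detectron2):

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Hypothetical helper: map common true/false spellings to a bool."""
    lowered = value.strip().lower()
    if lowered in ("1", "true", "yes", "y"):
        return True
    if lowered in ("0", "false", "no", "n"):
        return False
    # argparse turns ArgumentTypeError into a normal usage error message.
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


parser = argparse.ArgumentParser()
parser.add_argument(
    '--resume',
    help='Resume training',
    nargs='?',
    const=True,        # a bare "--resume" (no value) means True
    type=str_to_bool,  # "--resume False" is parsed as False
    default=False,     # flag absent means False
)

print(parser.parse_args(['--resume', 'False']).resume)  # False
print(parser.parse_args(['--resume']).resume)           # True
print(parser.parse_args([]).resume)                     # False
```

With `nargs='?'`, the `const` value is used as-is when the flag appears without a value, so it is not passed through `str_to_bool`.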

2. Logs:

    Starting training application with parameters
    Python 3.6.9
    Fatal Python error: Segmentation fault

    Thread 0x00007ff34afdf740 (most recent call first):
      File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pathlib.py", line 662 in _from_parts
      File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pathlib.py", line 1006 in __new__
      File "/opt/app-root/lib/python3.6/site-packages/matplotlib/font_manager.py", line 949 in default
      File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/encoder.py", line 438 in _iterencode
      File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/encoder.py", line 325 in _iterencode_list
      File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/encoder.py", line 404 in _iterencode_dict
      File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/encoder.py", line 430 in _iterencode
      File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/encoder.py", line 438 in _iterencode
      File "/opt/rh/rh-python36/root/usr/lib64/python3.6/json/__init__.py", line 180 in dump
      File "/opt/app-root/lib/python3.6/site-packages/matplotlib/font_manager.py", line 999 in json_dump
      File "/opt/app-root/lib/python3.6/site-packages/matplotlib/font_manager.py", line 1425 in _rebuild
      File "/opt/app-root/lib/python3.6/site-packages/matplotlib/font_manager.py", line 1431 in
      File "", line 219 in _call_with_frames_removed
      File "", line 678 in exec_module
      File "", line 665 in _load_unlocked
      File "", line 955 in _find_and_load_unlocked
      File "", line 971 in _find_and_load
      File "/opt/app-root/lib/python3.6/site-packages/matplotlib/contour.py", line 16 in
      File "", line 219 in _call_with_frames_removed
      File "", line 678 in exec_module
      File "", line 665 in _load_unlocked
      File "", line 955 in _find_and_load_unlocked
      File "", line 971 in _find_and_load
      File "/opt/app-root/lib/python3.6/site-packages/matplotlib/colorbar.py", line 44 in
      File "", line 219 in _call_with_frames_removed
      File "", line 678 in exec_module
      File "", line 665 in _load_unlocked
      File "", line 955 in _find_and_load_unlocked
      File "", line 971 in _find_and_load
      File "/opt/app-root/lib/python3.6/site-packages/matplotlib/pyplot.py", line 36 in
      File "", line 219 in _call_with_frames_removed
      File "", line 678 in exec_module
      File "", line 665 in _load_unlocked
      File "", line 955 in _find_and_load_unlocked
      File "", line 971 in _find_and_load
      File "/opt/app-root/lib/python3.6/site-packages/pycocotools/coco.py", line 49 in
      File "", line 219 in _call_with_frames_removed
      File "", line 678 in exec_module
      File "", line 665 in _load_unlocked
      File "", line 955 in _find_and_load_unlocked
      File "", line 971 in _find_and_load
      File "/opt/app-root/lib/python3.6/site-packages/detectron2/evaluation/coco_evaluation.py", line 15 in
      File "", line 219 in _call_with_frames_removed
      File "", line 678 in exec_module
      File "", line 665 in _load_unlocked
      File "", line 955 in _find_and_load_unlocked
      File "", line 971 in _find_and_load
      File "/opt/app-root/lib/python3.6/site-packages/detectron2/evaluation/__init__.py", line 3 in
      File "", line 219 in _call_with_frames_removed
      File "", line 678 in exec_module
      File "", line 665 in _load_unlocked
      File "", line 955 in _find_and_load_unlocked
      File "", line 971 in _find_and_load
      File "", line 219 in _call_with_frames_removed
      File "", line 941 in _find_and_load_unlocked
      File "", line 971 in _find_and_load
      File "/opt/app-root/lib/python3.6/site-packages/detectron2/engine/hooks.py", line 18 in
      File "", line 219 in _call_with_frames_removed
      File "", line 678 in exec_module
      File "", line 665 in _load_unlocked
      File "", line 955 in _find_and_load_unlocked
      File "", line 971 in _find_and_load
      File "/opt/app-root/lib/python3.6/site-packages/detectron2/engine/__init__.py", line 11 in
      File "", line 219 in _call_with_frames_removed
      File "", line 678 in exec_module
      File "", line 665 in _load_unlocked
      File "", line 955 in _find_and_load_unlocked
      File "", line 971 in _find_and_load
      File "/root/src/detectron.py", line 3 in
      File "", line 219 in _call_with_frames_removed
      File "", line 678 in exec_module
      File "", line 665 in _load_unlocked
      File "", line 955 in _find_and_load_unlocked
      File "", line 971 in _find_and_load
      File "./src/main.py", line 4 in
    ./entrypoint.sh: line 9: 7 Segmentation fault python ./src/main.py $@


## Environment:

Docker version 19.03.12, build 48a66213fe

    sys.platform            linux
    Python                  3.6.9 (default, Nov 11 2019, 11:24:16) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]
    numpy                   1.19.1
    detectron2              0.2.1 @/opt/app-root/lib/python3.6/site-packages/detectron2
    Compiler                GCC 7.3
    CUDA compiler           not available
    DETECTRON2_ENV_MODULE
    PyTorch                 1.6.0+cpu @/opt/app-root/lib/python3.6/site-packages/torch
    PyTorch debug build     False
    GPU available           False
    Pillow                  7.2.0
    torchvision             0.7.0+cpu @/opt/app-root/lib/python3.6/site-packages/torchvision
    fvcore                  0.1.2


PyTorch built with:



If you have any ideas or advice, I would be grateful.

ppwwyyxx commented 4 years ago

The following script reproduces the failure in the container, so the problem is unrelated to detectron2:

    import torch
    try:
        import cv2
    except ImportError:
        pass
    import matplotlib.pyplot as plt
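Following the same idea as the script above, one way to narrow down a crash like this is to run each candidate import sequence in a fresh interpreter, so the segfault kills only the child process. A minimal sketch, assuming a POSIX system (where `subprocess` reports a child killed by signal N as return code -N); the helper name `snippet_segfaults` is hypothetical:

```python
import signal
import subprocess
import sys


def snippet_segfaults(code: str, timeout: float = 300.0) -> bool:
    """Hypothetical helper: run `code` in a new Python process (POSIX only).

    Returns True if the child was killed by SIGSEGV.
    """
    proc = subprocess.run(
        # -X faulthandler dumps the Python traceback on a fatal signal,
        # the same effect as PYTHONFAULTHANDLER=1 in the Dockerfile.
        [sys.executable, "-X", "faulthandler", "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # works on Python 3.6 as well
        timeout=timeout,
    )
    # A child terminated by signal N has returncode == -N.
    return proc.returncode == -signal.SIGSEGV
```

For example, passing the three imports from the reproduction script as one snippet, and then subsets of them, shows which combination triggers the crash without taking down the calling process. An ordinary import error (such as a missing module) returns a positive exit code, so it is not reported as a segfault.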

You can try opencv-python-headless or, even better, install OpenCV with yum install. opencv-python is a problematic package that has many issues like this one.
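Before switching packages, it can help to check which OpenCV wheels pip has installed: opencv-python, opencv-python-headless and their contrib variants all provide the same `cv2` module, and the opencv-python project asks that only one of them be installed per environment. A minimal sketch, assuming Python 3.8+ for `importlib.metadata` (the Python 3.6 image above would need the `importlib_metadata` backport instead); the helper name is hypothetical:

```python
from importlib import metadata  # Python 3.8+; on 3.6 use the importlib_metadata backport

# PyPI distributions that all provide the top-level "cv2" module.
OPENCV_DISTS = {
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
}


def installed_opencv_wheels() -> dict:
    """Hypothetical helper: map each installed OpenCV pip distribution to its version."""
    found = {}
    for dist in metadata.distributions():
        # Normalize the project name so "opencv_python" also matches.
        name = (dist.metadata["Name"] or "").lower().replace("_", "-")
        if name in OPENCV_DISTS:
            found[name] = dist.version
    return found


if __name__ == "__main__":
    wheels = installed_opencv_wheels()
    if len(wheels) > 1:
        print("More than one OpenCV wheel installed:", wheels)
    else:
        print("OpenCV wheels:", wheels or "none")
```

If opencv-python shows up here, it still has to be uninstalled before the headless wheel or the system package can take its place.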

SebJak commented 4 years ago

Thank you for your help. I fixed my problem by running yum install python3-numpy opencv*
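To confirm which `cv2` the interpreter picks up after a change like this (the yum package or a leftover pip wheel), one option is to look up where a module would be loaded from without importing it, so the check itself cannot trigger the crash. A minimal sketch using `importlib.util.find_spec`; the helper name `module_origin` is hypothetical:

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Hypothetical helper: file a top-level module would load from, or None.

    For a top-level name, find_spec locates the module without executing it.
    """
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    for mod in ("cv2", "numpy", "matplotlib"):
        print(mod, "->", module_origin(mod))
```

A path under the pip site-packages directory (such as /opt/app-root/lib/python3.6/site-packages in the logs above) means a pip wheel is still shadowing the system package.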