facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

Dataset 'custom_dataset' not registered while using multiple gpus training. #625

Closed. grafiszti closed this issue 4 years ago

grafiszti commented 4 years ago

If you do not know the root cause of the problem / bug, and wish someone to help you, please post according to this template:

Instructions To Reproduce the Issue:

  1. what changes you made (git diff) or what code you wrote

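    # Signature: register_coco_instances(name, metadata, json_file, image_root)
    from detectron2.data.datasets import register_coco_instances

    # Training split
    register_coco_instances(
        "custom_dataset_name", {},
        "path/to/dataset/annotations/train.json",
        "path/to/dataset/train"
    )

    # Validation split
    register_coco_instances(
        "custom_dataset_name_val", {},
        "path/to/dataset/annotations/val.json",
        "path/to/dataset/val"
    )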

    as first lines of main function in detectron2/tools/train_net.py

  2. what exact command you run: python train_net.py --config-file=faster_rcnn_R_50_FPN_3x.yaml --num-gpus 2

  3. what you observed (including the full logs):

    Using a generated random seed 57609562
    Traceback (most recent call last):
      File "train_net.py", line 182, in <module>
        args=(args,),
      File "/media_hdd/my_username_data/project1/detectron2/detectron2/engine/launch.py", line 49, in launch
        daemon=False,
      File "/media_hdd/my_username_data/project1/venv/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
        while not spawn_context.join():
      File "/media_hdd/my_username_data/project1/venv/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 118, in join
        raise Exception(msg)
    Exception:

    -- Process 1 terminated with the following error:
    Traceback (most recent call last):
      File "/media_hdd/my_username_data/project1/detectron2/detectron2/data/catalog.py", line 55, in get
        f = DatasetCatalog._REGISTERED[name]
    KeyError: 'custom_dataset_name'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/media_hdd/my_username_data/project1/venv/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
        fn(i, args)
      File "/media_hdd/my_username_data/project1/detectron2/detectron2/engine/launch.py", line 84, in _distributed_worker
        main_func(args)
      File "/media_hdd/my_username_data/project1/train_net.py", line 145, in main
        trainer = Trainer(cfg)
      File "/media_hdd/my_username_data/project1/detectron2/detectron2/engine/defaults.py", line 246, in __init__
        data_loader = self.build_train_loader(cfg)
      File "/media_hdd/my_username_data/project1/detectron2/detectron2/engine/defaults.py", line 420, in build_train_loader
        return build_detection_train_loader(cfg)
      File "/media_hdd/my_username_data/project1/detectron2/detectron2/data/build.py", line 294, in build_detection_train_loader
        proposal_files=cfg.DATASETS.PROPOSAL_FILES_TRAIN if cfg.MODEL.LOAD_PROPOSALS else None,
      File "/media_hdd/my_username_data/project1/detectron2/detectron2/data/build.py", line 223, in get_detection_dataset_dicts
        dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names]
      File "/media_hdd/my_username_data/project1/detectron2/detectron2/data/build.py", line 223, in <listcomp>
        dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names]
      File "/media_hdd/my_username_data/project1/detectron2/detectron2/data/catalog.py", line 59, in get
        name, ", ".join(DatasetCatalog._REGISTERED.keys())
    KeyError: "Dataset 'custom_dataset_name' is not registered! Available datasets are: coco_2014_train, coco_2014_val, coco_2014_minival, coco_2014_minival_100, coco_2014_valminusminival, coco_2017_train, coco_2017_val, coco_2017_test, coco_2017_test-dev, coco_2017_val_100, keypoints_coco_2014_train, keypoints_coco_2014_val, keypoints_coco_2014_minival, keypoints_coco_2014_valminusminival, keypoints_coco_2014_minival_100, keypoints_coco_2017_train, keypoints_coco_2017_val, keypoints_coco_2017_val_100, coco_2017_train_panoptic_separated, coco_2017_train_panoptic_stuffonly, coco_2017_val_panoptic_separated, coco_2017_val_panoptic_stuffonly, coco_2017_val_100_panoptic_separated, coco_2017_val_100_panoptic_stuffonly, lvis_v0.5_train, lvis_v0.5_val, lvis_v0.5_val_rand_100, lvis_v0.5_test, cityscapes_fine_instance_seg_train, cityscapes_fine_sem_seg_train, cityscapes_fine_instance_seg_val, cityscapes_fine_sem_seg_val, cityscapes_fine_instance_seg_test, cityscapes_fine_sem_seg_test, voc_2007_trainval, voc_2007_train, voc_2007_val, voc_2007_test, voc_2012_trainval, voc_2012_train, voc_2012_val"

(venv) ➜ project1 git:(master) ✗ python train_net.py --config-file=faster_rcnn_R_50_FPN_3x.yaml --num-gpus 2

  1. please also simplify the steps as much as possible so they do not require additional resources to run, such as a private dataset.

  2. I registered my custom dataset.

  3. I tried to train the network with the train_net.py script.

  4. I ran train_net.py with the --num-gpus 2 argument to train on multiple GPUs.

  5. I got the error shown above.

  6. Running on a single GPU works well.

Environment:

Please paste the output of python -m detectron2.utils.collect_env.


    sys.platform              linux
    Python                    3.6.9 (default, Nov 7 2019, 10:44:02) [GCC 8.3.0]
    Numpy                     1.18.0
    Detectron2 Compiler       GCC 7.4
    Detectron2 CUDA Compiler  10.2
    DETECTRON2_ENV_MODULE
    PyTorch                   1.3.1
    PyTorch Debug Build       False
    torchvision               0.4.2
    CUDA available            True
    GPU 0,1                   Tesla K80
    CUDA_HOME                 /usr/local/cuda
    NVCC                      Cuda compilation tools, release 10.2, V10.2.89
    Pillow                    6.2.2
    cv2                       4.1.2


PyTorch built with:

ppwwyyxx commented 4 years ago

... as first lines of main function in detectron2/tools/train_net.py

From the description it is not clear where you put this code. More information would help.

Putting the calls inside def main(...): should work. If that is indeed what you did, please provide detailed steps so that others can reproduce the issue.
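One plausible reason the placement matters (an assumption, not something confirmed in this thread): the traceback goes through torch/multiprocessing/spawn.py, and each spawned worker runs main_func(args) with its own in-process dataset registry, so a registration that does not execute inside the worker is not visible there. The sketch below illustrates that idea with the standard library only. ToyCatalog, register_custom_datasets, worker_main and run_workers are hypothetical stand-ins, not Detectron2 APIs, and the workers are simulated sequentially rather than spawned.

```python
class ToyCatalog:
    """Hypothetical stand-in for a per-process dataset registry."""

    def __init__(self):
        self._registered = {}

    def register(self, name, loader):
        self._registered[name] = loader

    def get(self, name):
        if name not in self._registered:
            raise KeyError(f"Dataset '{name}' is not registered!")
        return self._registered[name]()


def register_custom_datasets(catalog):
    # Names mirror the ones in the report; the loaders return placeholder records.
    catalog.register("custom_dataset_name", lambda: ["train record"])
    catalog.register("custom_dataset_name_val", lambda: ["val record"])


def worker_main(register_inside_main):
    # Assumption: every worker process starts with its own empty catalog.
    catalog = ToyCatalog()
    if register_inside_main:
        # Registration runs as the first step of the worker's main function.
        register_custom_datasets(catalog)
    return catalog.get("custom_dataset_name")


def run_workers(num_workers, register_inside_main):
    # Simulates num_workers independent workers, collecting results or errors.
    results = []
    for _ in range(num_workers):
        try:
            results.append(worker_main(register_inside_main))
        except KeyError as err:
            results.append(err)
    return results
```

In this toy model, run_workers(2, True) returns the placeholder records for both workers, while run_workers(2, False) yields a KeyError per worker, which resembles the failure reported for process 1.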