facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

Training runs normally at first but raises a KeyError at iter 29979 #1153

Closed holyYodu closed 4 years ago

holyYodu commented 4 years ago

If you do not know the root cause of the problem / bug, and wish someone to help you, please post according to this template:

Instructions To Reproduce the Issue:

  1. what changes you made (git diff) or what code you wrote
    
    # register custom dataset to detectron
    train_json = '/train/coco2017/annotations/instances_train2017_obj.json'
    val_json = '/train/coco2017/annotations/instances_val2017_obj.json'
    train_img = '/train/coco2017/train2017'
    val_img = '/train/coco2017/val2017'

    register_coco_instances("coco_obj_train", {}, train_json, train_img)
    register_coco_instances("coco_obj_val", {}, val_json, val_img)

    def set_config():
        cfg = get_cfg()
        cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
        cfg.DATASETS.TRAIN = ("coco_obj_train",)
        cfg.DATASETS.TEST = ("coco_obj_val")
        cfg.DATALOADER.NUM_WORKERS = 8
        # Let training initialize from model zoo
        cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
        cfg.SOLVER.IMS_PER_BATCH = 8
        cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
        cfg.SOLVER.MAX_ITER = 30000   # 300 iterations seems good enough for this toy dataset; you may need to train longer for a practical dataset
        # cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128   # faster, and good enough for this toy dataset (default: 512)
        cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class
        os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
        return cfg

    # start train
    def train(cfg):
        trainer = DefaultTrainer(cfg)
        trainer.resume_or_load(resume=True)
        trainer.train()
        evaluator = COCOEvaluator("coco_obj_val", cfg, False, output_dir="./output/")
        val_loader = build_detection_test_loader(cfg, "coco_obj_val")
        inference_on_dataset(trainer.model, val_loader, evaluator)

2. what exact command you run:
I just run the above code to train on the modified coco2017 dataset. If I set `cfg.DATASETS.TEST = ()` as in the Colab tutorial, the code runs normally. If I set `cfg.DATASETS.TEST = ("coco_obj_val")`, an error is raised when the code finishes the training phase and prepares to evaluate.
3. what you observed (including __full logs__):

    [04/05 06:15:40 d2.utils.events]: eta: 0:00:20 iter: 29979 total_loss: 0.667 loss_cls: 0.129 loss_box_reg: 0.203 loss_mask: 0.264 loss_rpn_cls: 0.016 loss_rpn_loc: 0.052 time: 0.9709 data_time: 0.0188 lr: 0.000250 max_mem: 11079M
    ERROR [04/05 06:16:01 d2.engine.train_loop]: Exception during training:
    Traceback (most recent call last):
      File "/home/mhy/anaconda3/envs/detectron/lib/python3.8/site-packages/detectron2/data/catalog.py", line 55, in get
        f = DatasetCatalog._REGISTERED[name]
    KeyError: 'c'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/mhy/anaconda3/envs/detectron/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 133, in train
        self.after_step()
      File "/home/mhy/anaconda3/envs/detectron/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 153, in after_step
        h.after_step()
      File "/home/mhy/anaconda3/envs/detectron/lib/python3.8/site-packages/detectron2/engine/hooks.py", line 347, in after_step
        self._do_eval()
      File "/home/mhy/anaconda3/envs/detectron/lib/python3.8/site-packages/detectron2/engine/hooks.py", line 321, in _do_eval
        results = self._func()
      File "/home/mhy/anaconda3/envs/detectron/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 331, in test_and_save_results
        self._last_eval_results = self.test(self.cfg, self.model)
      File "/home/mhy/anaconda3/envs/detectron/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 479, in test
        data_loader = cls.build_test_loader(cfg, dataset_name)
      File "/home/mhy/anaconda3/envs/detectron/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 441, in build_test_loader
        return build_detection_test_loader(cfg, dataset_name)
      File "/home/mhy/anaconda3/envs/detectron/lib/python3.8/site-packages/detectron2/data/build.py", line 359, in build_detection_test_loader
        dataset_dicts = get_detection_dataset_dicts(
      File "/home/mhy/anaconda3/envs/detectron/lib/python3.8/site-packages/detectron2/data/build.py", line 223, in get_detection_dataset_dicts
        dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names]
      File "/home/mhy/anaconda3/envs/detectron/lib/python3.8/site-packages/detectron2/data/build.py", line 223, in <listcomp>
        dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names]
      File "/home/mhy/anaconda3/envs/detectron/lib/python3.8/site-packages/detectron2/data/catalog.py", line 57, in get
        raise KeyError(
    KeyError: "Dataset 'c' is not registered! Available datasets are: coco_2014_train, coco_2014_val, coco_2014_minival, coco_2014_minival_100, coco_2014_valminusminival, coco_2017_train, coco_2017_val, coco_2017_test, coco_2017_test-dev, coco_2017_val_100, keypoints_coco_2014_train, keypoints_coco_2014_val, keypoints_coco_2014_minival, keypoints_coco_2014_valminusminival, keypoints_coco_2014_minival_100, keypoints_coco_2017_train, keypoints_coco_2017_val, keypoints_coco_2017_val_100, coco_2017_train_panoptic_separated, coco_2017_train_panoptic_stuffonly, coco_2017_val_panoptic_separated, coco_2017_val_panoptic_stuffonly, coco_2017_val_100_panoptic_separated, coco_2017_val_100_panoptic_stuffonly, lvis_v0.5_train, lvis_v0.5_val, lvis_v0.5_val_rand_100, lvis_v0.5_test, cityscapes_fine_instance_seg_train, cityscapes_fine_sem_seg_train, cityscapes_fine_instance_seg_val, cityscapes_fine_sem_seg_val, cityscapes_fine_instance_seg_test, cityscapes_fine_sem_seg_test, voc_2007_trainval, voc_2007_train, voc_2007_val, voc_2007_test, voc_2012_trainval, voc_2012_train, voc_2012_val, coco_obj_train, coco_obj_val"

4. please also simplify the steps as much as possible so they do not require additional resources to
     run, such as a private dataset.

## Expected behavior:

If there are no obvious error in "what you observed" provided above,
please tell us the expected behavior.

If you expect the model to converge / work better, note that we do not give suggestions
on how to train a new model.
Only in one of the two conditions we will help with it:
(1) You're unable to reproduce the results in detectron2 model zoo.
(2) It indicates a detectron2 bug.

## Environment:

Provide your environment information using the following command:

    sys.platform              linux
    Python                    3.8.1 packaged by conda-forge (default, Jan 29 2020, 14:55:04) [GCC 7.3.0]
    numpy                     1.18.1
    detectron2                0.1 @/home/mhy/anaconda3/envs/detectron/lib/python3.8/site-packages/detectron2
    detectron2 compiler       GCC 7.3
    detectron2 CUDA compiler  10.1
    detectron2 arch flags     sm_35, sm_37, sm_50, sm_52, sm_60, sm_61, sm_70, sm_75
    DETECTRON2_ENV_MODULE
    PyTorch                   1.4.0 @/home/mhy/anaconda3/envs/detectron/lib/python3.8/site-packages/torch
    PyTorch debug build       False
    CUDA available            True
    GPU 0,1                   TITAN RTX
    CUDA_HOME                 /usr/local/cuda
    NVCC                      Cuda compilation tools, release 10.0, V10.0.326
    Pillow                    7.0.0
    torchvision               0.5.0 @/home/mhy/anaconda3/envs/detectron/lib/python3.8/site-packages/torchvision
    torchvision arch flags    sm_35, sm_50, sm_60, sm_70, sm_75
    cv2                       4.2.0

PyTorch built with:

If your issue looks like an installation issue / environment issue, please first try to solve it yourself with the instructions in https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md#common-installation-issues

ppwwyyxx commented 4 years ago
cfg.DATASETS.TEST = ("coco_obj_val")  

The value should be a list or a tuple. See reference in docs: https://detectron2.readthedocs.io/modules/config.html
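
To illustrate why the traceback complains about a dataset named `'c'`: in Python, parentheses alone do not create a tuple, so `("coco_obj_val")` is just a string, and iterating over it yields single characters. The sketch below uses plain Python only; the `registered` set and the `first_unregistered` helper are hypothetical stand-ins for illustration, not part of detectron2.

```python
# Parentheses alone do not make a tuple; the trailing comma does.
as_written = ("coco_obj_val")    # this is just the str "coco_obj_val"
as_intended = ("coco_obj_val",)  # this is a one-element tuple

# Iterating over the string yields single characters, starting with "c".
names_from_str = [name for name in as_written]
names_from_tuple = [name for name in as_intended]

# Hypothetical stand-in for the set of registered dataset names.
registered = {"coco_obj_train", "coco_obj_val"}


def first_unregistered(names, registered):
    """Return the first name that is not registered, or None if all are."""
    for name in names:
        if name not in registered:
            return name
    return None


print(type(as_written).__name__, type(as_intended).__name__)  # str tuple
print(first_unregistered(as_written, registered))             # c
print(first_unregistered(as_intended, registered))            # None
```

In the script from the report, this corresponds to writing `cfg.DATASETS.TEST = ("coco_obj_val",)` with a trailing comma, matching how `cfg.DATASETS.TRAIN` is already set. Because the test dataset is only looked up when evaluation runs, the mistake surfaces late in training rather than at startup.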