Closed grafiszti closed 4 years ago
... as first lines of main function in detectron2/tools/train_net.py
From the description it is not clear where you put this code. More information would help.
Putting it inside def main(...): should work. If that's indeed what you did, please provide detailed steps for others to reproduce the issue.
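To illustrate why the placement matters, here is a minimal stdlib-only sketch. It is not detectron2 code: Catalog, load_custom_dataset and this launch are hypothetical stand-ins, and the sketch assumes each worker process starts with its own empty registry, which is how processes started with the spawn method behave (code that ran only in the parent is not replayed in the children). Under that assumption, a registration made only in the parent is invisible to the workers, while a registration made inside main runs once per worker.

```python
class Catalog:
    """Hypothetical stand-in for a per-process dataset registry."""

    def __init__(self):
        self._registered = {}

    def register(self, name, loader):
        self._registered[name] = loader

    def get(self, name):
        if name not in self._registered:
            raise KeyError("Dataset '{}' is not registered!".format(name))
        return self._registered[name]()


def load_custom_dataset():
    # Placeholder loader; a real one would return the dataset dicts.
    return [{"file_name": "img_0.jpg", "annotations": []}]


def main(catalog, register_in_main):
    # Registering here means every worker that calls main() registers too.
    if register_in_main:
        catalog.register("custom_dataset_name", load_custom_dataset)
    return catalog.get("custom_dataset_name")


def launch(num_workers, register_in_main):
    # The parent registers in its own catalog only; workers never see it.
    parent_catalog = Catalog()
    parent_catalog.register("custom_dataset_name", load_custom_dataset)

    results = []
    for _ in range(num_workers):
        worker_catalog = Catalog()  # models a freshly spawned process
        try:
            main(worker_catalog, register_in_main)
            results.append("ok")
        except KeyError:
            results.append("KeyError")
    return results
```

With this model, launch(2, register_in_main=True) succeeds in both workers, while launch(2, register_in_main=False) raises KeyError in both, even though the parent registered the dataset.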
If you do not know the root cause of the problem / bug, and wish someone to help you, please post according to this template:
Instructions To Reproduce the Issue:
what changes you made (git diff) or what code you wrote: as first lines of main function in detectron2/tools/train_net.py
what exact command you run: python train_net.py --config-file=faster_rcnn_R_50_FPN_3x.yaml --num-gpus 2
what you observed (including the full logs):
Using a generated random seed 57609562
Traceback (most recent call last):
File "train_net.py", line 182, in
args=(args,),
File "/media_hdd/my_username_data/project1/detectron2/detectron2/engine/launch.py", line 49, in launch
daemon=False,
File "/media_hdd/my_username_data/project1/venv/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/media_hdd/my_username_data/project1/venv/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/media_hdd/my_username_data/project1/detectron2/detectron2/data/catalog.py", line 55, in get
f = DatasetCatalog._REGISTERED[name]
KeyError: 'custom_dataset_name'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/media_hdd/my_username_data/project1/venv/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, args)
File "/media_hdd/my_username_data/project1/detectron2/detectron2/engine/launch.py", line 84, in _distributed_worker
main_func(args)
File "/media_hdd/my_username_data/project1/train_net.py", line 145, in main
trainer = Trainer(cfg)
File "/media_hdd/my_username_data/project1/detectron2/detectron2/engine/defaults.py", line 246, in __init__
data_loader = self.build_train_loader(cfg)
File "/media_hdd/my_username_data/project1/detectron2/detectron2/engine/defaults.py", line 420, in build_train_loader
return build_detection_train_loader(cfg)
File "/media_hdd/my_username_data/project1/detectron2/detectron2/data/build.py", line 294, in build_detection_train_loader
proposal_files=cfg.DATASETS.PROPOSAL_FILES_TRAIN if cfg.MODEL.LOAD_PROPOSALS else None,
File "/media_hdd/my_username_data/project1/detectron2/detectron2/data/build.py", line 223, in get_detection_dataset_dicts
dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names]
File "/media_hdd/my_username_data/project1/detectron2/detectron2/data/build.py", line 223, in
dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names]
File "/media_hdd/my_username_data/project1/detectron2/detectron2/data/catalog.py", line 59, in get
name, ", ".join(DatasetCatalog._REGISTERED.keys())
KeyError: "Dataset 'custom_dataset_name' is not registered! Available datasets are: coco_2014_train, coco_2014_val, coco_2014_minival, coco_2014_minival_100, coco_2014_valminusminival, coco_2017_train, coco_2017_val, coco_2017_test, coco_2017_test-dev, coco_2017_val_100, keypoints_coco_2014_train, keypoints_coco_2014_val, keypoints_coco_2014_minival, keypoints_coco_2014_valminusminival, keypoints_coco_2014_minival_100, keypoints_coco_2017_train, keypoints_coco_2017_val, keypoints_coco_2017_val_100, coco_2017_train_panoptic_separated, coco_2017_train_panoptic_stuffonly, coco_2017_val_panoptic_separated, coco_2017_val_panoptic_stuffonly, coco_2017_val_100_panoptic_separated, coco_2017_val_100_panoptic_stuffonly, lvis_v0.5_train, lvis_v0.5_val, lvis_v0.5_val_rand_100, lvis_v0.5_test, cityscapes_fine_instance_seg_train, cityscapes_fine_sem_seg_train, cityscapes_fine_instance_seg_val, cityscapes_fine_sem_seg_val, cityscapes_fine_instance_seg_test, cityscapes_fine_sem_seg_test, voc_2007_trainval, voc_2007_train, voc_2007_val, voc_2007_test, voc_2012_trainval, voc_2012_train, voc_2012_val"
(venv) ➜ project1 git:(master) ✗ python train_net.py --config-file=faster_rcnn_R_50_FPN_3x.yaml --num-gpus 2
please also simplify the steps as much as possible so they do not require additional resources to run, such as a private dataset.
I registered my custom dataset.
Tried to use the train_net.py script to train the network.
Ran train_net.py with the option for training on multiple GPUs (--num-gpus 2).
Got the error shown above.
Running on a single GPU works well.
Environment:
Please paste the output of python -m detectron2.utils.collect_env.
sys.platform linux
Python 3.6.9 (default, Nov 7 2019, 10:44:02) [GCC 8.3.0]
Numpy 1.18.0
Detectron2 Compiler GCC 7.4
Detectron2 CUDA Compiler 10.2
DETECTRON2_ENV_MODULE
PyTorch 1.3.1
PyTorch Debug Build False
torchvision 0.4.2
CUDA available True
GPU 0,1 Tesla K80
CUDA_HOME /usr/local/cuda
NVCC Cuda compilation tools, release 10.2, V10.2.89
Pillow 6.2.2
cv2 4.1.2
PyTorch built with: