Detectron2 has builtin support for a few datasets.
The datasets are assumed to exist in a directory specified by the environment variable DETECTRON2_DATASETS.
Under this directory, detectron2 will look for datasets in the structure described below, if needed.
$DETECTRON2_DATASETS/
coco/
lvis/
cityscapes/
VOC20{07,12}/
You can set the location for builtin datasets by export DETECTRON2_DATASETS=/path/to/datasets. If left unset, the default is ./datasets relative to your current working directory.
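For Cityscapes specifically, a rough sketch of the layout detectron2 expects under that root, plus the ground-truth preprocessing commands from cityscapesscripts (directory names and script paths are from memory, so double-check detectron2's datasets/README):
$DETECTRON2_DATASETS/
  cityscapes/
    gtFine/
      train/
      val/
      test/
    leftImg8bit/
      train/
      val/
      test/
# generate *labelTrainIds.png and the panoptic annotations with cityscapesscripts
CITYSCAPES_DATASET=$DETECTRON2_DATASETS/cityscapes python cityscapesscripts/preparation/createTrainIdLabelImgs.py
CITYSCAPES_DATASET=$DETECTRON2_DATASETS/cityscapes python cityscapesscripts/preparation/createPanopticImgs.py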
Thank you so much, it works! But I ran into another problem. The error is like this:
Using training sampler TrainingSampler
[09/30 11:04:18 fvcore.common.checkpoint]: Loading checkpoint from x65.pkl
Traceback (most recent call last):
File "train_panoptic_deeplab.py", line 198, in <module>
args=(args,),
File "/mnt/tecmint/data/ypl_file/panoptic-deeplab/panoptic-deeplab-master/tools_d2/detectron2/detectron2/engine/launch.py", line 59, in launch
daemon=False,
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
raise Exception(msg)
Exception:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
fn(i, *args)
File "/mnt/tecmint/data/ypl_file/panoptic-deeplab/panoptic-deeplab-master/tools_d2/detectron2/detectron2/engine/launch.py", line 94, in _distributed_worker
main_func(*args)
File "/mnt/tecmint/data/ypl_file/panoptic-deeplab/panoptic-deeplab-master/tools_d2/train_panoptic_deeplab.py", line 185, in main
trainer.resume_or_load(resume=args.resume)
File "/mnt/tecmint/data/ypl_file/panoptic-deeplab/panoptic-deeplab-master/tools_d2/detectron2/detectron2/engine/defaults.py", line 314, in resume_or_load
checkpoint = self.checkpointer.resume_or_load(self.cfg.MODEL.WEIGHTS, resume=resume)
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/fvcore/common/checkpoint.py", line 192, in resume_or_load
return self.load(path, checkpointables=[])
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/fvcore/common/checkpoint.py", line 118, in load
assert os.path.isfile(path), "Checkpoint {} not found!".format(path)
AssertionError: Checkpoint x65.pkl not found!
I think this should be fixed in a similar way, so which environment variable should I modify? I don't know how to find a solution.
I did this step:
# download your pretrained model:
wget https://github.com/LikeLy-Journey/SegmenTron/releases/download/v0.1.0/tf-xception65-270e81cf.pth -O x65.pth
# run the conversion
python convert-pretrain-model-to-d2.py x65.pth x65.pkl
But it looks like detectron2 cannot find it.
Did you put x65.pkl under tools_d2 and run the code under tools_d2?
If not, try setting:
MODEL:
WEIGHTS: "/path/to/your/x65.pkl"
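Alternatively, assuming train_panoptic_deeplab.py uses detectron2's standard argument parser (so trailing key-value pairs override the config), you can pass the weight path on the command line without editing the YAML:
# the path below is a placeholder; point it at wherever x65.pkl actually lives
python train_panoptic_deeplab.py --config-file config/Cityscapes-PanopticSegmentation/panoptic_deeplab_X_65_os16_mg124_poly_90k_bs32_crop_512_1024.yaml --num-gpus 1 MODEL.WEIGHTS /path/to/your/x65.pkl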
OK, it works. But it seems I hit a big problem when I run:
python train_panoptic_deeplab.py --config-file config/Cityscapes-PanopticSegmentation/panoptic_deeplab_X_65_os16_mg124_poly_90k_bs32_crop_512_1024.yaml --num-gpus 1
[09/30 11:24:50 d2.engine.train_loop]: Starting training from iteration 0
ERROR [09/30 11:25:02 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
File "/mnt/tecmint/data/ypl_file/panoptic-deeplab/panoptic-deeplab-master/tools_d2/detectron2/detectron2/engine/train_loop.py", line 142, in train
self.run_step()
File "/mnt/tecmint/data/ypl_file/panoptic-deeplab/panoptic-deeplab-master/tools_d2/detectron2/detectron2/engine/train_loop.py", line 235, in run_step
loss_dict = self.model(data)
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/tecmint/data/ypl_file/panoptic-deeplab/panoptic-deeplab-master/tools_d2/detectron2/projects/Panoptic-DeepLab/panoptic_deeplab/panoptic_seg.py", line 86, in forward
features = self.backbone(images.tensor)
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/tecmint/data/ypl_file/panoptic-deeplab/panoptic-deeplab-master/tools_d2/d2/backbone.py", line 136, in forward
y = super().forward(x)
File "/mnt/tecmint/data/ypl_file/panoptic-deeplab/panoptic-deeplab-master/tools_d2/../segmentation/model/backbone/xception.py", line 190, in forward
x = self.bn1(x)
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 493, in forward
world_size = torch.distributed.get_world_size(process_group)
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 620, in get_world_size
return _get_group_size(group)
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 219, in _get_group_size
_check_default_pg()
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 210, in _check_default_pg
"Default process group is not initialized"
AssertionError: Default process group is not initialized
[09/30 11:25:04 d2.engine.hooks]: Total training time: 0:00:13 (0:00:00 on hooks)
[09/30 11:25:04 d2.utils.events]: iter: 0 lr: N/A max_mem: 991M
Traceback (most recent call last):
File "train_panoptic_deeplab.py", line 198, in <module>
args=(args,),
File "/mnt/tecmint/data/ypl_file/panoptic-deeplab/panoptic-deeplab-master/tools_d2/detectron2/detectron2/engine/launch.py", line 62, in launch
main_func(*args)
File "train_panoptic_deeplab.py", line 186, in main
return trainer.train()
File "/mnt/tecmint/data/ypl_file/panoptic-deeplab/panoptic-deeplab-master/tools_d2/detectron2/detectron2/engine/defaults.py", line 402, in train
super().train(self.start_iter, self.max_iter)
File "/mnt/tecmint/data/ypl_file/panoptic-deeplab/panoptic-deeplab-master/tools_d2/detectron2/detectron2/engine/train_loop.py", line 142, in train
self.run_step()
File "/mnt/tecmint/data/ypl_file/panoptic-deeplab/panoptic-deeplab-master/tools_d2/detectron2/detectron2/engine/train_loop.py", line 235, in run_step
loss_dict = self.model(data)
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/tecmint/data/ypl_file/panoptic-deeplab/panoptic-deeplab-master/tools_d2/detectron2/projects/Panoptic-DeepLab/panoptic_deeplab/panoptic_seg.py", line 86, in forward
features = self.backbone(images.tensor)
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/mnt/tecmint/data/ypl_file/panoptic-deeplab/panoptic-deeplab-master/tools_d2/d2/backbone.py", line 136, in forward
y = super().forward(x)
File "/mnt/tecmint/data/ypl_file/panoptic-deeplab/panoptic-deeplab-master/tools_d2/../segmentation/model/backbone/xception.py", line 190, in forward
x = self.bn1(x)
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 493, in forward
world_size = torch.distributed.get_world_size(process_group)
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 620, in get_world_size
return _get_group_size(group)
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 219, in _get_group_size
_check_default_pg()
File "/home/ypl/miniconda3/envs/solo/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 210, in _check_default_pg
"Default process group is not initialized"
AssertionError: Default process group is not initialized
段错误 (Segmentation fault)
I am sorry to bother you. I think maybe I passed the wrong parameter, --num-gpus x, where x should be greater than 1. I solved this problem by running the command below and modifying Base-PanopticDeepLab-OS16.yaml as described in https://github.com/bowenc0221/panoptic-deeplab/issues/16.
# code
python train_panoptic_deeplab.py --config-file config/Cityscapes-PanopticSegmentation/panoptic_deeplab_X_65_os16_mg124_poly_90k_bs32_crop_512_1024.yaml --num-gpus 1
# modify
IMS_PER_BATCH 32 -> IMS_PER_BATCH 8
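For anyone else making this change, the edit should land in the SOLVER section of Base-PanopticDeepLab-OS16.yaml (assuming the standard detectron2 config layout), roughly:
# sketch of the modified section only, not the full file
SOLVER:
  IMS_PER_BATCH: 8
When reducing the batch size like this, you may also want to scale SOLVER.BASE_LR down proportionally, otherwise training can become unstable.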
Now it begins to train. For reference, for anyone who runs into this problem: I have 4 GPUs; with this parameter set to 8, my machine can train with these settings.
Hey @bowenc0221 @xyry, I am also facing a similar issue while running in Colab:
python train_panoptic_deeplab.py --config-file configs/Cityscapes-PanopticSegmentation/panoptic_deeplab_X_65_os16_mg124_poly_90k_bs32_crop_512_1024.yaml --num-gpus 1
Error message:
[01/01 06:08:28 d2.engine.train_loop]: Starting training from iteration 0
ERROR [01/01 06:08:32 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/detectron2/engine/train_loop.py", line 134, in train
self.run_step()
File "/usr/local/lib/python3.6/dist-packages/detectron2/engine/defaults.py", line 423, in run_step
self._trainer.run_step()
File "/usr/local/lib/python3.6/dist-packages/detectron2/engine/train_loop.py", line 228, in run_step
loss_dict = self.model(data)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/detectron2/projects/panoptic_deeplab/panoptic_seg.py", line 87, in forward
features = self.backbone(images.tensor)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/panoptic-deeplab/tools_d2/d2/backbone.py", line 136, in forward
y = super().forward(x)
File "/content/panoptic-deeplab/tools_d2/../segmentation/model/backbone/xception.py", line 190, in forward
x = self.bn1(x)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/batchnorm.py", line 519, in forward
world_size = torch.distributed.get_world_size(process_group)
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 625, in get_world_size
return _get_group_size(group)
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 220, in _get_group_size
_check_default_pg()
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 211, in _check_default_pg
"Default process group is not initialized"
AssertionError: Default process group is not initialized
[01/01 06:08:32 d2.engine.hooks]: Total training time: 0:00:04 (0:00:00 on hooks)
[01/01 06:08:32 d2.utils.events]: iter: 0 lr: N/A max_mem: 463M
Traceback (most recent call last):
File "train_panoptic_deeplab.py", line 183, in <module>
args=(args,),
File "/usr/local/lib/python3.6/dist-packages/detectron2/engine/launch.py", line 62, in launch
main_func(*args)
File "train_panoptic_deeplab.py", line 171, in main
return trainer.train()
File "/usr/local/lib/python3.6/dist-packages/detectron2/engine/defaults.py", line 413, in train
super().train(self.start_iter, self.max_iter)
File "/usr/local/lib/python3.6/dist-packages/detectron2/engine/train_loop.py", line 134, in train
self.run_step()
File "/usr/local/lib/python3.6/dist-packages/detectron2/engine/defaults.py", line 423, in run_step
self._trainer.run_step()
File "/usr/local/lib/python3.6/dist-packages/detectron2/engine/train_loop.py", line 228, in run_step
loss_dict = self.model(data)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/detectron2/projects/panoptic_deeplab/panoptic_seg.py", line 87, in forward
features = self.backbone(images.tensor)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/panoptic-deeplab/tools_d2/d2/backbone.py", line 136, in forward
y = super().forward(x)
File "/content/panoptic-deeplab/tools_d2/../segmentation/model/backbone/xception.py", line 190, in forward
x = self.bn1(x)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/batchnorm.py", line 519, in forward
world_size = torch.distributed.get_world_size(process_group)
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 625, in get_world_size
return _get_group_size(group)
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 220, in _get_group_size
_check_default_pg()
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 211, in _check_default_pg
"Default process group is not initialized"
AssertionError: Default process group is not initialized
I tried modifying Base-PanopticDeepLab-OS16.yaml:
IMS_PER_BATCH 32 -> IMS_PER_BATCH 8
Also, in configs/panoptic_deeplab_R50_os32_cityscapes.yaml:
TRAIN:
IMS_PER_BATCH: 1
GPUS: (0, )
Can you please help with this issue?
I downloaded the Cityscapes dataset, and I ran these two commands,
so the dataset directory structure looks like this.
But when I run it,
the error is like this. I think maybe I should modify some .py file to tell it the location of Cityscapes, but I don't know which file to modify. Could you help me?