Open xiyangyang99 opened 1 year ago
Hi, can you provide your detailed training command?
Hello, did you solve it?
I'm running into this problem too.
I'm running into this problem too.
python train.py \
  --checkpoint ./pretrained_checkpoint/sam_vit_b_01ec64.pth \
  --model-type vit_b \
  --output .
`
--- create training dataloader ---
------------------------------ train --------------------------------
--->>> train dataset 0 / 7 DIS5K-TR <<<---
-im- DIS5K-TR ./data/DIS5K/DIS-TR/im : 3000
-gt- DIS5K-TR ./data/DIS5K/DIS-TR/gt : 3000
--->>> train dataset 1 / 7 ThinObject5k-TR <<<---
-im- ThinObject5k-TR ./data/thin_object_detection/ThinObject5K/images_train : 4748
-gt- ThinObject5k-TR ./data/thin_object_detection/ThinObject5K/masks_train : 4748
--->>> train dataset 2 / 7 FSS <<<---
-im- FSS ./data/cascade_psp/fss_all : 10000
-gt- FSS ./data/cascade_psp/fss_all : 10000
--->>> train dataset 3 / 7 DUTS-TR <<<---
-im- DUTS-TR ./data/cascade_psp/DUTS-TR : 10553
-gt- DUTS-TR ./data/cascade_psp/DUTS-TR : 10553
--->>> train dataset 4 / 7 DUTS-TE <<<---
-im- DUTS-TE ./data/cascade_psp/DUTS-TE : 5019
-gt- DUTS-TE ./data/cascade_psp/DUTS-TE : 5019
--->>> train dataset 5 / 7 ECSSD <<<---
-im- ECSSD ./data/cascade_psp/ecssd : 1000
-gt- ECSSD ./data/cascade_psp/ecssd : 1000
--->>> train dataset 6 / 7 MSRA10K <<<---
-im- MSRA10K ./data/cascade_psp/MSRA_10K : 10000
-gt- MSRA10K ./data/cascade_psp/MSRA_10K : 10000
Traceback (most recent call last):
  File "/home/geekplusa/ai/projects/shkj/projects/sam-hq/train/train.py", line 694, in <module>
`
Traceback (most recent call last):
  File "train.py", line 694, in <module>
    main(net, train_datasets, valid_datasets, args)
  File "train.py", line 327, in main
    train_dataloaders, train_datasets = create_dataloaders(train_im_gt_list,
  File "/home/quchunguang/datasets/sam-hq/train/utils/dataloader.py", line 71, in create_dataloaders
    sampler = DistributedSampler(gos_dataset)
  File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/utils/data/distributed.py", line 65, in __init__
    num_replicas = dist.get_world_size()
  File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 845, in get_world_size
    return _get_group_size(group)
  File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 306, in _get_group_size
    default_pg = _get_default_group()
  File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 410, in _get_default_group
    raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
How can I solve this?
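For anyone landing here with the same error: the traceback shows `DistributedSampler(gos_dataset)` calling `dist.get_world_size()` before any process group exists. That usually happens when the script is started with plain `python train.py` instead of through a distributed launcher such as `torchrun` or `python -m torch.distributed.launch`. Below is a minimal sketch of a guard, assuming PyTorch is installed. The helper names `launched_by_distributed_launcher` and `make_train_sampler` are hypothetical and are not part of sam-hq.

```python
import os


def launched_by_distributed_launcher(env=None):
    """Return True if the launcher-provided variables RANK and WORLD_SIZE are set.

    torchrun exports these for every worker; plain `python train.py` does not.
    """
    env = os.environ if env is None else env
    return "RANK" in env and "WORLD_SIZE" in env


def make_train_sampler(dataset):
    """Hypothetical helper: use DistributedSampler only when a process group exists."""
    # Imported lazily so the environment check above works without PyTorch.
    import torch.distributed as dist
    from torch.utils.data import RandomSampler
    from torch.utils.data.distributed import DistributedSampler

    if dist.is_available() and dist.is_initialized():
        # Safe: get_world_size() can resolve the default process group.
        return DistributedSampler(dataset)
    # Single-process fallback; avoids the RuntimeError shown in the traceback.
    return RandomSampler(dataset)


if __name__ == "__main__":
    if launched_by_distributed_launcher():
        print("Distributed environment detected (RANK/WORLD_SIZE set).")
    else:
        print("No launcher detected; DistributedSampler would fail "
              "without init_process_group.")
```

Another option that avoids code changes is to start the same command through a launcher, for example `torchrun --nproc_per_node=1 train.py` followed by the same arguments as above. This assumes the script's own setup reads the launcher's environment variables and calls `init_process_group`, which I have not verified for every version of the repository.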