SysCV / sam-hq

Segment Anything in High Quality [NeurIPS 2023]
https://arxiv.org/abs/2306.01567
Apache License 2.0

When fine-tuning one's own dataset, an error was reported as follows #52

Open xiyangyang99 opened 1 year ago

xiyangyang99 commented 1 year ago

Traceback (most recent call last):
  File "train.py", line 694, in <module>
    main(net, train_datasets, valid_datasets, args)
  File "train.py", line 327, in main
    train_dataloaders, train_datasets = create_dataloaders(train_im_gt_list,
  File "/home/quchunguang/datasets/sam-hq/train/utils/dataloader.py", line 71, in create_dataloaders
    sampler = DistributedSampler(gos_dataset)
  File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/utils/data/distributed.py", line 65, in __init__
    num_replicas = dist.get_world_size()
  File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 845, in get_world_size
    return _get_group_size(group)
  File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 306, in _get_group_size
    default_pg = _get_default_group()
  File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 410, in _get_default_group
    raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

How to solve it?

lkeab commented 1 year ago

Hi, can you provide the exact training command you ran?

xpzwzwz commented 9 months ago

Hello, did you solve it?

LTayfaker commented 8 months ago

I ran into this problem too.

geekplusaa commented 8 months ago

I ran into this problem too. Here are my command and the full output:


python train.py \
  --checkpoint ./pretrained_checkpoint/sam_vit_b_01ec64.pth \
  --model-type vit_b \
  --output .

`
--- create training dataloader ---
------------------------------ train --------------------------------
--->>> train dataset 0 / 7 DIS5K-TR <<<---
-im- DIS5K-TR ./data/DIS5K/DIS-TR/im : 3000
-gt- DIS5K-TR ./data/DIS5K/DIS-TR/gt : 3000
--->>> train dataset 1 / 7 ThinObject5k-TR <<<---
-im- ThinObject5k-TR ./data/thin_object_detection/ThinObject5K/images_train : 4748
-gt- ThinObject5k-TR ./data/thin_object_detection/ThinObject5K/masks_train : 4748
--->>> train dataset 2 / 7 FSS <<<---
-im- FSS ./data/cascade_psp/fss_all : 10000
-gt- FSS ./data/cascade_psp/fss_all : 10000
--->>> train dataset 3 / 7 DUTS-TR <<<---
-im- DUTS-TR ./data/cascade_psp/DUTS-TR : 10553
-gt- DUTS-TR ./data/cascade_psp/DUTS-TR : 10553
--->>> train dataset 4 / 7 DUTS-TE <<<---
-im- DUTS-TE ./data/cascade_psp/DUTS-TE : 5019
-gt- DUTS-TE ./data/cascade_psp/DUTS-TE : 5019
--->>> train dataset 5 / 7 ECSSD <<<---
-im- ECSSD ./data/cascade_psp/ecssd : 1000
-gt- ECSSD ./data/cascade_psp/ecssd : 1000
--->>> train dataset 6 / 7 MSRA10K <<<---
-im- MSRA10K ./data/cascade_psp/MSRA_10K : 10000
-gt- MSRA10K ./data/cascade_psp/MSRA_10K : 10000
Traceback (most recent call last):
  File "/home/geekplusa/ai/projects/shkj/projects/sam-hq/train/train.py", line 694, in <module>
    main(net, train_datasets, valid_datasets, args)
  File "/home/geekplusa/ai/projects/shkj/projects/sam-hq/train/train.py", line 327, in main
    train_dataloaders, train_datasets = create_dataloaders(train_im_gt_list,
  File "/home/geekplusa/ai/projects/shkj/projects/sam-hq/train/utils/dataloader.py", line 71, in create_dataloaders
    sampler = DistributedSampler(gos_dataset)
  File "/home/geekplusa/miniconda3/envs/ai-cv-segment/lib/python3.9/site-packages/torch/utils/data/distributed.py", line 68, in __init__
    num_replicas = dist.get_world_size()
  File "/home/geekplusa/miniconda3/envs/ai-cv-segment/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1196, in get_world_size
    return _get_group_size(group)
  File "/home/geekplusa/miniconda3/envs/ai-cv-segment/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 576, in _get_group_size
    default_pg = _get_default_group()
  File "/home/geekplusa/miniconda3/envs/ai-cv-segment/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 707, in _get_default_group
    raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

`
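
A note on what the traceback says: `DistributedSampler` calls `dist.get_world_size()`, which needs a default process group, and none exists when this line is reached. The command above starts the script with plain `python train.py`, so nothing on the command line sets up distributed training; a launcher such as `torchrun` normally exports the `RANK` and `WORLD_SIZE` variables that such setups rely on. The thread does not confirm a fix, so the following is only a minimal sketch, assuming a single-process run is wanted. The helper names `single_process_defaults` and `ensure_process_group` are hypothetical and are not part of sam-hq; where to call them in `train.py` is also an assumption.

```python
import os


def single_process_defaults(env):
    """Return a copy of env with rendezvous variables filled in for one process.

    Values that are already present (for example, set by a launcher) are kept.
    """
    filled = dict(env)
    filled.setdefault("MASTER_ADDR", "127.0.0.1")
    filled.setdefault("MASTER_PORT", "29500")
    filled.setdefault("RANK", "0")
    filled.setdefault("WORLD_SIZE", "1")
    return filled


def ensure_process_group():
    """Create a default process group if none exists yet (hypothetical helper)."""
    # Imported lazily so this module can be loaded without torch installed.
    import torch
    import torch.distributed as dist

    # Nothing to do if distributed support is missing or a group already exists.
    if not dist.is_available() or dist.is_initialized():
        return

    os.environ.update(single_process_defaults(os.environ))
    # NCCL needs a GPU; fall back to gloo for CPU-only runs.
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend=backend, init_method="env://")
```

Calling `ensure_process_group()` before the dataloaders are created would let `DistributedSampler(gos_dataset)` see a world size of 1. Starting the script through a distributed launcher instead is the other common route; which of the two the repository intends is not settled in this thread.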