UX-Decoder / Segment-Everything-Everywhere-All-At-Once

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
Apache License 2.0

Multi-gpu training #125

Open Jinjun58 opened 9 months ago

Jinjun58 commented 9 months ago

Thanks for your great work. I followed the multi-GPU training instructions in the documentation, but only the first GPU seems to be used in my project. Why?

CUDA_VISIBLE_DEVICES=1,2,3 mpirun -n 3 python entry.py train \
            --conf_files configs/seem/focall_unicl_lang_v1.yaml \

Beck-127 commented 6 months ago

I ran into the same problem. I checked the log output, and it looks like there is a problem with MPI. My command:

CUDA_VISIBLE_DEVICES=0,1,2,3 mpirun -n 4 python entry.py train \
            --conf_files configs/seem/focalt_unicl_lang_v1.yaml \
            --overrides \
            FP16 True \
            COCO.INPUT.IMAGE_SIZE 1024 \
            MODEL.DECODER.HIDDEN_DIM 512 \
            MODEL.ENCODER.CONVS_DIM 512 \
            MODEL.ENCODER.MASK_DIM 512 \
            TEST.BATCH_SIZE_TOTAL 8 \
            TRAIN.BATCH_SIZE_TOTAL 16 \
            TRAIN.BATCH_SIZE_PER_GPU 2 \
            SOLVER.MAX_NUM_EPOCHS 50 \
            SOLVER.BASE_LR 0.0001 \
            SOLVER.FIX_PARAM.backbone True \
            SOLVER.FIX_PARAM.lang_encoder True \
            SOLVER.FIX_PARAM.pixel_decoder True \
            MODEL.DECODER.COST_SPATIAL.CLASS_WEIGHT 5.0 \
            MODEL.DECODER.COST_SPATIAL.MASK_WEIGHT 2.0 \
            MODEL.DECODER.COST_SPATIAL.DICE_WEIGHT 2.0 \
            MODEL.DECODER.TOP_SPATIAL_LAYERS 10 \
            MODEL.DECODER.SPATIAL.ENABLED True \
            MODEL.DECODER.GROUNDING.ENABLED True \
            FIND_UNUSED_PARAMETERS True \
            ATTENTION_ARCH.SPATIAL_MEMORIES 32 \
            MODEL.DECODER.SPATIAL.MAX_ITER 5 \
            ATTENTION_ARCH.QUERY_NUMBER 3 \
            STROKE_SAMPLER.MAX_CANDIDATE 10 \
            WEIGHT True \
            RESUME_FROM ./xdecoder_data/pretrained/xdecoder_focalt_last.pt

Error log: [two attached images, not preserved in this copy]

Waiting for a response. Thanks!
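
Since both reports point at MPI, one way to narrow it down might be to check what each launched process actually sees. The following is only a minimal diagnostic sketch, not part of the repository: it assumes the launcher is Open MPI's mpirun, which exports OMPI_COMM_WORLD_RANK, OMPI_COMM_WORLD_LOCAL_RANK and OMPI_COMM_WORLD_SIZE to each process. The function name describe_mpi_env and the file name check_mpi_env.py are hypothetical, and I have not confirmed which variables entry.py itself reads.

```python
import os

# Variables that Open MPI's mpirun exports to every process it launches.
# Assumption: the launcher is Open MPI; other launchers use different names.
OMPI_VARS = (
    "OMPI_COMM_WORLD_RANK",
    "OMPI_COMM_WORLD_LOCAL_RANK",
    "OMPI_COMM_WORLD_SIZE",
)


def describe_mpi_env(env=None):
    """Summarize what the launcher told this process about its rank."""
    env = os.environ if env is None else env
    info = {name: env.get(name) for name in OMPI_VARS}
    info["CUDA_VISIBLE_DEVICES"] = env.get("CUDA_VISIBLE_DEVICES")
    # True only if all three rank variables are present.
    info["launched_by_open_mpi"] = all(env.get(name) is not None for name in OMPI_VARS)
    return info


if __name__ == "__main__":
    info = describe_mpi_env()
    print(info)
    if not info["launched_by_open_mpi"]:
        print("No Open MPI rank variables found in this process.")
```

Saved as check_mpi_env.py and started with the same prefix as the training command (CUDA_VISIBLE_DEVICES=0,1,2,3 mpirun -n 4 python check_mpi_env.py), each of the four processes would be expected to print a different rank and a world size of 4. If the rank variables are missing, the mpirun on the PATH may not be Open MPI, which would be worth ruling out before looking at the training code.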