HRNet / HigherHRNet-Human-Pose-Estimation

This is an official implementation of our CVPR 2020 paper "HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation" (https://arxiv.org/abs/1908.10357)

Train on Windows: maybe a PyTorch problem? How can I fix "RuntimeError: No rendezvous handler for tcp://"? #77

Closed happydog555 closed 3 years ago

happydog555 commented 3 years ago

```
=> creating output\coco_kpt\pose_higher_hrnet\w32_512_adam_lr1e-3
=> creating log\coco_kpt\pose_higher_hrnet\w32_512_adam_lr1e-3_2020-12-23-15-56
Namespace(cfg='experiments/coco/higher_hrnet/w32_512_adam_lr1e-3.yaml', dist_url='tcp://127.0.0.1:23456', gpu=None, opts=['FP16.ENABLED', 'True', 'FP16.DYNAMIC_LOSS_SCALE', 'True', 'MODEL.SYNC_BN', 'True'], rank=0, world_size=1)
AUTO_RESUME: True
CUDNN:
  BENCHMARK: True
  DETERMINISTIC: False
  ENABLED: True
DATASET:
  BASE_SIGMA: 2.0
  BASE_SIZE: 256.0
  DATASET: coco_kpt
  DATASET_TEST: coco
  DATA_FORMAT: jpg
  FLIP: 0.5
  INPUT_SIZE: 512
  INT_SIGMA: False
  MAX_NUM_PEOPLE: 30
  MAX_ROTATION: 30
  MAX_SCALE: 1.5
  MAX_TRANSLATE: 40
  MIN_SCALE: 0.75
  NUM_JOINTS: 17
  OUTPUT_SIZE: [128, 256]
  ROOT: data/coco
  SCALE_AWARE_SIGMA: False
  SCALE_TYPE: short
  SIGMA: 2
  TEST: val2017
  TRAIN: train2017
  WITH_CENTER: False
DATA_DIR:
DEBUG:
  DEBUG: True
  SAVE_BATCH_IMAGES_GT: False
  SAVE_BATCH_IMAGES_PRED: False
  SAVE_HEATMAPS_GT: True
  SAVE_HEATMAPS_PRED: True
  SAVE_TAGMAPS_PRED: True
DIST_BACKEND: nccl
FP16:
  DYNAMIC_LOSS_SCALE: True
  ENABLED: True
  STATIC_LOSS_SCALE: 1.0
GPUS: (0,)
LOG_DIR: log
LOSS:
  AE_LOSS_TYPE: exp
  HEATMAPS_LOSS_FACTOR: (1.0, 1.0)
  NUM_STAGES: 2
  PULL_LOSS_FACTOR: (0.001, 0.001)
  PUSH_LOSS_FACTOR: (0.001, 0.001)
  WITH_AE_LOSS: (True, False)
  WITH_HEATMAPS_LOSS: (True, True)
MODEL:
  EXTRA:
    DECONV:
      CAT_OUTPUT: [True]
      KERNEL_SIZE: [4]
      NUM_BASIC_BLOCKS: 4
      NUM_CHANNELS: [32]
      NUM_DECONVS: 1
    FINAL_CONV_KERNEL: 1
    PRETRAINED_LAYERS: ['*']
    STAGE2:
      BLOCK: BASIC
      FUSE_METHOD: SUM
      NUM_BLOCKS: [4, 4]
      NUM_BRANCHES: 2
      NUM_CHANNELS: [32, 64]
      NUM_MODULES: 1
    STAGE3:
      BLOCK: BASIC
      FUSE_METHOD: SUM
      NUM_BLOCKS: [4, 4, 4]
      NUM_BRANCHES: 3
      NUM_CHANNELS: [32, 64, 128]
      NUM_MODULES: 4
    STAGE4:
      BLOCK: BASIC
      FUSE_METHOD: SUM
      NUM_BLOCKS: [4, 4, 4, 4]
      NUM_BRANCHES: 4
      NUM_CHANNELS: [32, 64, 128, 256]
      NUM_MODULES: 3
    STEM_INPLANES: 64
  INIT_WEIGHTS: True
  NAME: pose_higher_hrnet
  NUM_JOINTS: 17
  PRETRAINED: models/pytorch/imagenet/hrnet_w32-36af842e.pth
  SYNC_BN: True
  TAG_PER_JOINT: True
MULTIPROCESSING_DISTRIBUTED: True
OUTPUT_DIR: output
PIN_MEMORY: True
PRINT_FREQ: 100
RANK: 0
TEST:
  ADJUST: True
  DETECTION_THRESHOLD: 0.1
  FLIP_TEST: True
  IGNORE_CENTER: True
  IGNORE_TOO_MUCH: False
  IMAGES_PER_GPU: 1
  LOG_PROGRESS: False
  MODEL_FILE:
  NMS_KERNEL: 5
  NMS_PADDING: 2
  PROJECT2IMAGE: True
  REFINE: True
  SCALE_FACTOR: [1]
  TAG_THRESHOLD: 1.0
  USE_DETECTION_VAL: True
  WITH_AE: (True, False)
  WITH_HEATMAPS: (True, True)
TRAIN:
  BEGIN_EPOCH: 0
  CHECKPOINT:
  END_EPOCH: 300
  GAMMA1: 0.99
  GAMMA2: 0.0
  IMAGES_PER_GPU: 12
  LR: 0.001
  LR_FACTOR: 0.1
  LR_STEP: [200, 260]
  MOMENTUM: 0.9
  NESTEROV: False
  OPTIMIZER: adam
  RESUME: False
  SHUFFLE: True
  WD: 0.0001
VERBOSE: True
WORKERS: 4
Use GPU: cuda for training
Init process group: dist_url: tcp://127.0.0.1:23456, world_size: 1, rank: 0
Traceback (most recent call last):
  File "tools/dist_train.py", line 319, in <module>
    main()
  File "tools/dist_train.py", line 114, in main
    args=(ngpus_per_node, args, final_output_dir, tb_log_dir)
  File "E:\anaconda\lib\site-packages\torch\multiprocessing\spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "E:\anaconda\lib\site-packages\torch\multiprocessing\spawn.py", line 157, in start_processes
    while not context.join():
  File "E:\anaconda\lib\site-packages\torch\multiprocessing\spawn.py", line 118, in join
    raise Exception(msg)
Exception:
```

```
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "E:\anaconda\lib\site-packages\torch\multiprocessing\spawn.py", line 19, in _wrap
    fn(i, *args)
  File "G:\BaiduNetdiskDownload\HigherHRNet-Human-Pose-Estimation-master\HigherHRNet-Human-Pose-Estimation-master\tools\dist_train.py", line 160, in main_worker
    rank=args.rank
  File "E:\anaconda\lib\site-packages\torch\distributed\distributed_c10d.py", line 434, in init_process_group
    init_method, rank, world_size, timeout=timeout
  File "E:\anaconda\lib\site-packages\torch\distributed\rendezvous.py", line 82, in rendezvous
    raise RuntimeError("No rendezvous handler for {}://".format(result.scheme))
RuntimeError: No rendezvous handler for tcp://
```
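For anyone debugging this: the stack shows `main_worker` calling `torch.distributed.init_process_group` with `init_method='tcp://127.0.0.1:23456'`. On the Windows PyTorch builds of that era, no rendezvous handler was registered for the `tcp://` scheme (Windows support for `torch.distributed` only arrived around PyTorch 1.7, and then only for the gloo backend), so the call fails before any training starts. Below is a minimal sketch for checking your own build; the `gloo` backend choice here is an assumption, since the repo's default `nccl` backend is Linux-only in any case.

```python
# Minimal check sketch, assuming a Windows box like the one in the
# traceback above. On affected builds the tcp:// rendezvous handler is
# simply not registered, so this fails even though torch.distributed
# imports fine.
import sys
import torch
import torch.distributed as dist

print(torch.__version__, sys.platform)

try:
    dist.init_process_group(
        backend="gloo",  # assumption; nccl (the repo default) is Linux-only
        init_method="tcp://127.0.0.1:23456",
        world_size=1,
        rank=0,
    )
    print("process group initialised")
    dist.destroy_process_group()
except RuntimeError as e:
    # Expected on affected Windows builds:
    # RuntimeError: No rendezvous handler for tcp://
    print(e)
```

If this prints the same `RuntimeError`, your build cannot do a `tcp://` rendezvous at all, and the workaround posted further down (disabling multiprocessing-distributed training) is the practical option.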

HyunJai commented 3 years ago

Me Too

Aliiiu commented 3 years ago

Please, were you able to open and look at the code for the pretrained model?

happydog555 commented 3 years ago

Me Too

```python
_C.MULTIPROCESSING_DISTRIBUTED = False
```
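For context: that one-line change presumably goes in the repo's default yacs config (the `_C` node, likely `lib/config/default.py`; the path is an assumption). Flipping `MULTIPROCESSING_DISTRIBUTED` to `False` makes `tools/dist_train.py` take its single-process branch, so `init_process_group` is never reached. Roughly, the dispatch looks like this simplified sketch (not the repo's exact code):

```python
# Simplified sketch of the launch logic in tools/dist_train.py (not verbatim).
# With MULTIPROCESSING_DISTRIBUTED = False and world_size == 1, main_worker
# runs in the current process and the distributed branch (mp.spawn plus
# dist.init_process_group, which fails on Windows) is skipped entirely.
import torch
import torch.multiprocessing as mp

def launch(cfg, args, main_worker):
    ngpus_per_node = torch.cuda.device_count()
    args.distributed = args.world_size > 1 or cfg.MULTIPROCESSING_DISTRIBUTED

    if cfg.MULTIPROCESSING_DISTRIBUTED:
        # One process per GPU; each worker does a tcp:// rendezvous,
        # which is what raises "No rendezvous handler for tcp://" here.
        args.world_size = ngpus_per_node * args.world_size
        mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
    else:
        # Single-process path: no rendezvous, no process group.
        main_worker(args.gpu, ngpus_per_node, args)
```

Since the launcher already merges extra command-line opts into the config (see the `opts=[...]` in the Namespace dump above), appending `MULTIPROCESSING_DISTRIBUTED False` to the training command may achieve the same thing without editing the source, assuming top-level keys are accepted there.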

happydog555 commented 3 years ago

I succeeded. Do you need help?


unbeliveyu commented 2 years ago

Yes, can you tell me what you did to succeed?
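For readers on newer PyTorch: 1.7 added prototype Windows support for `torch.distributed`, but only with the gloo backend and a file store. A hedged alternative to disabling distributed training entirely is to keep it and move the rendezvous off `tcp://`; this would also require changing the repo's `DIST_BACKEND` from `nccl` to `gloo`. A sketch with a hypothetical shared-file path, untested on this codebase:

```python
# Hedged sketch for PyTorch >= 1.7 on Windows: gloo backend plus a
# file:// rendezvous, since nccl and (on older builds) tcp:// are
# unavailable there. The file path below is hypothetical; it must be
# reachable by every worker process.
import torch.distributed as dist

dist.init_process_group(
    backend="gloo",
    init_method="file:///C:/tmp/higherhrnet_rendezvous",
    world_size=1,
    rank=0,
)
```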

wusaisa commented 1 year ago

I had the same problem, can you help me?