ThunderVVV / RCLSTR

Official PyTorch implementation of `[ACMMM 2023] Relational Contrastive Learning for Scene Text Recognition`

RuntimeError: No rendezvous handler for :// #4

Open ggxxer opened 7 months ago

ggxxer commented 7 months ago

When main_moco.py reaches line 262, "mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))", it reports an error: "raise RuntimeError("No rendezvous handler for {}://".format(result.scheme)) RuntimeError: No rendezvous handler for ://". Can you give me any advice on how to solve this error?

ThunderVVV commented 7 months ago

main_moco.py uses PyTorch multi-GPU distributed training. Please verify that your CUDA environment is set up correctly for distributed training.
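For what it's worth, this particular error usually means the `--dist-url` value (the `init_method` passed to `torch.distributed`) is empty or malformed: PyTorch parses it as a URL and dispatches on the scheme, so an empty string yields the bare `://` in the message. A minimal sketch of the parsing behind the error (stdlib only, no torch required; `rendezvous_scheme` is a hypothetical helper for illustration):

```python
from urllib.parse import urlparse

# PyTorch's rendezvous machinery dispatches on the URL scheme of
# init_method (the --dist-url argument). An empty or malformed value
# parses to an empty scheme, producing "No rendezvous handler for ://".
def rendezvous_scheme(dist_url):
    return urlparse(dist_url).scheme

print(rendezvous_scheme(""))                       # empty -> no handler registered
print(rendezvous_scheme("tcp://localhost:10001"))  # 'tcp' -> a handler exists
print(rendezvous_scheme("env://"))                 # 'env' -> a handler exists
```

So a quick thing to check is that the script is actually receiving a well-formed `--dist-url` such as `tcp://localhost:10001` or `env://` rather than an empty string.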

ggxxer commented 6 months ago

> The main_moco.py is using pytorch multi-gpu distributed training. Please verify that your cuda environment is set up correctly for distributed training.

Thank you for your reply. After setting up my CUDA environment successfully for distributed training, I ran main_moco.py for nearly 12 hours. But the model seems to fail to load the TPS model, since line 342, "print(f"TPS layer (freezed): {name}\n")", never prints anything. I have finished the steps according to your README.txt; can you give me any advice on how to solve this error?

ThunderVVV commented 6 months ago

Line 342 means that we load off-the-shelf pretrained TPS weights and freeze the TPS module during pre-training. You should download the TPS model weights (TRBA-Baseline-synth.pth) from Baidu Yun (password: px16) and place them at pretrain/TPS_model/TRBA-Baseline-synth.pth.
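To double-check the placement, the expected layout relative to the repo root would look like this (a sketch; the `touch` below only stands in for the file actually downloaded from Baidu Yun):

```shell
# Expected checkpoint location, relative to the repository root.
mkdir -p pretrain/TPS_model
touch pretrain/TPS_model/TRBA-Baseline-synth.pth   # placeholder for the real download
ls pretrain/TPS_model
```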

ggxxer commented 6 months ago

[screenshot] I've finished this step according to the README.txt, but the problem still exists.

ThunderVVV commented 6 months ago

Maybe you can check the key names in the TPS weights. The checkpoint is actually a dictionary, so you can check whether the names are correct during loading.
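A common cause of silently unmatched keys is a DataParallel-style `module.` prefix in the checkpoint. A minimal sketch of inspecting and normalizing key names (plain dicts stand in for the real state_dict here; `normalize_keys` is a hypothetical helper, not part of the repo):

```python
# Strip a wrapper prefix (e.g. "module." from DataParallel) so checkpoint
# keys line up with the model's parameter names before load_state_dict.
def normalize_keys(state_dict, prefix="module."):
    return {k[len(prefix):] if k.startswith(prefix) else k: v
            for k, v in state_dict.items()}

# Toy checkpoint mimicking keys saved from a DataParallel-wrapped model.
ckpt = {
    "module.Transformation.LocalizationNetwork.conv.0.weight": 1,
    "module.Prediction.weight": 2,
}
fixed = normalize_keys(ckpt)
for name in fixed:
    print(name)  # inspect the names that will be matched against the model
```

With a real checkpoint you would load it via `torch.load(..., map_location="cpu")`, print the first few keys the same way, and compare them against the names the model expects (the TPS keys should start with something like `Transformation.`).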