Open lq-blackcat opened 5 months ago
Very quick. If you get stuck at this step, there is usually a mistake in your script.
#!/bin/bash
#SBATCH --job-name=long-clip
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --gres=gpu:1
#SBATCH --time=96:00:00
#SBATCH --comment pris718bobo

source ~/.bashrc
export CUDA_VISIBLE_DEVICES=0
torchrun --nproc_per_node=1 train.py
What needs to be modified? Could you please provide some help? @beichenzbc
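One common cause of a hang with this kind of launch is a mismatch between the env vars `torchrun` exports and what `train.py` passes to `dist.init_process_group`. Below is a minimal sketch (not Long-CLIP's actual code; the function name `init_distributed`, the gloo fallback for CPU-only machines, and the 5-minute timeout are all assumptions) of an init that works under `torchrun`:

```python
import os
from datetime import timedelta

import torch
import torch.distributed as dist


def init_distributed():
    """Read the env vars torchrun exports and join the process group."""
    # torchrun sets RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT, so the
    # default init_method="env://" picks them up automatically.
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(
        backend=backend,
        world_size=world_size,
        rank=rank,
        timeout=timedelta(minutes=5),  # fail with an error instead of hanging forever
    )
    return rank, world_size


if __name__ == "__main__":
    # Single-process defaults so the script also runs without torchrun.
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    rank, world_size = init_distributed()
    print(f"rank {rank} of {world_size} initialized")
    dist.destroy_process_group()
```

If the script hardcodes `world_size` or `rank` to values that disagree with what `torchrun` launched, `init_process_group` will wait for peers that never arrive, which looks exactly like a hang.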
Did you resolve this problem? I ran into the same issue.
How long does distributed training initialization take?

dist.init_process_group(
    backend=backend,
    world_size=world_size,
    rank=rank,
)
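With a correct setup this call returns almost immediately, even for multi-node jobs. A quick way to see that locally is a single-process timing check (the gloo backend, loopback master address, and port are assumptions for a CPU-only sketch; `torchrun` would normally set these env vars itself):

```python
import os
import time
from datetime import timedelta

import torch.distributed as dist

# Stand-ins for the env vars torchrun would normally export.
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")

start = time.perf_counter()
dist.init_process_group(
    backend="gloo",                 # CPU-only backend, fine for this check
    world_size=1,
    rank=0,
    timeout=timedelta(seconds=60),  # surface hangs as errors, not freezes
)
elapsed = time.perf_counter() - start

print(f"init_process_group returned in {elapsed:.3f}s")
dist.destroy_process_group()
```

If this returns quickly but the real job hangs, the problem is in the rendezvous (wrong `world_size`/`rank`, unreachable master address, or a launch script that never actually runs `torchrun`), not in PyTorch itself.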