facebookresearch / AVID-CMA

Audio Visual Instance Discrimination with Cross-Modal Agreement

What does the running command "node k >>" mean? #11

Closed: fake-warrior8 closed this issue 2 years ago

fake-warrior8 commented 2 years ago

Hi, could you tell me what "node0>>" and "node1>>" mean in the following run commands:

node0>> python main-avid.py configs/main/avid/kinetics/Cross-N1024.yaml --dist-url tcp://{NODE0-IP}:1234 --multiprocessing-distributed --world-size 4 --rank 0
node1>> python main-avid.py configs/main/avid/kinetics/Cross-N1024.yaml --dist-url tcp://{NODE0-IP}:1234 --multiprocessing-distributed --world-size 4 --rank 1
node2>> python main-avid.py configs/main/avid/kinetics/Cross-N1024.yaml --dist-url tcp://{NODE0-IP}:1234 --multiprocessing-distributed --world-size 4 --rank 2
node3>> python main-avid.py configs/main/avid/kinetics/Cross-N1024.yaml --dist-url tcp://{NODE0-IP}:1234 --multiprocessing-distributed --world-size 4 --rank 3

I can't find any explanation of "node0>>" on Google or in the documentation you linked. That documentation only gives an example of:

Node 1: (IP: 192.168.1.1, and has a free port: 1234)

>>> python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE \
           --nnodes=2 --node_rank=0 --master_addr="192.168.1.1" \
           --master_port=1234 YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3
           and all other arguments of your training script)
Node 2:

>>> python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE \
           --nnodes=2 --node_rank=1 --master_addr="192.168.1.1" \
           --master_port=1234 YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3
           and all other arguments of your training script)

So can I just remove the "node k >>" part?

pedro-morgado commented 2 years ago

Nodes are machines. Most machines only have up to 4 or 8 GPUs, so if you want to train on more GPUs than that, you'll have to use multiple machines, in which case you need to specify the IP address of the master node (in this case, node0). The "node0>>", "node1>>", etc. prefixes are just shell prompts indicating which machine each command should be run on; they are not part of the command itself.
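
For intuition, here is a minimal sketch (assuming the NCCL backend and the example master address from the docs quoted above; this is not AVID-CMA's actual code) of how --dist-url, --world-size, and --rank are typically passed down to torch.distributed:

import torch.distributed as dist

# Every node runs this with the SAME init_method (pointing at node0, the
# master) and world_size, but a DIFFERENT rank (0 on node0, 1 on node1, ...).
dist.init_process_group(
    backend="nccl",                        # assumed; the standard backend for multi-GPU training
    init_method="tcp://192.168.1.1:1234",  # master node's IP and a free port ({NODE0-IP}:1234 above)
    world_size=4,                          # total number of participating processes
    rank=0,                                # this process's unique id within the job
)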

If you want to run on a single machine (e.g. 4 or 8 GPUs), then yes, you can simply run:

>> python main-avid.py configs/main/avid/kinetics/Cross-N1024.yaml --dist-url tcp://localhost:1234 --multiprocessing-distributed --world-size 1 --rank 0
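
For reference, a --multiprocessing-distributed flag in scripts of this style typically spawns one worker process per GPU on the machine via torch.multiprocessing; a minimal sketch of that pattern follows (assuming nothing about main-avid.py's internals):

import torch
import torch.multiprocessing as mp

def worker(gpu, ngpus_per_node):
    # In multi-node jobs, each worker's global rank is usually computed as
    # node_rank * ngpus_per_node + gpu before calling init_process_group.
    print(f"worker running on GPU {gpu} of {ngpus_per_node}")

if __name__ == "__main__":
    ngpus = max(torch.cuda.device_count(), 1)  # fall back to 1 process if no GPUs are visible
    mp.spawn(worker, nprocs=ngpus, args=(ngpus,))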
fake-warrior8 commented 2 years ago

Thank you!