[Open] karan842 opened this issue 1 month ago
I am working on Sentence Transformers v3 and using multi-GPU training, launched with this command:

```bash
accelerate launch --multi_gpu --num_processes=4 main.py
```

Here is the code:

The code is working fine and multi-GPU is enabled. However, I have a doubt: when I ran the code, it printed "Using GPU: 4" four times, and the data loading also happened four times.

Does that mean the same model and the whole dataset were loaded on all 4 GPUs simultaneously? What is the point of loading everything on all 4 GPUs if I just want training to run on the GPUs in parallel?

Please explain.
Hello!

I have 2 comments:

1. `Using GPU: 4` is to be expected, as there are still 4 GPUs in total. It's equivalent to `os.environ["WORLD_SIZE"]`, i.e. how many GPUs there are in total. You're likely interested in `LOCAL_RANK` instead, which will be 0, 1, 2, and 3 for each of the 4 processes respectively (see https://pytorch.org/docs/stable/elastic/run.html#definitions). There is a small sketch of this after the list.
2. You don't have to initialize the `Accelerator` yourself; that will be done for you automatically when you train. And to my knowledge it's normal to load all data on all processes; the `Accelerator` in the `SentenceTransformerTrainer` that gets made internally will then tell each of the processes which batches from the full dataset to actually use, as sketched in the second example below.
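To illustrate the first point, here is a minimal sketch of what each of the 4 launched processes sees. Reading `WORLD_SIZE` and `LOCAL_RANK` from the environment is standard `torchrun`/`accelerate launch` behavior; the `Using GPU: ...` print is only a guess at what your `main.py` does (presumably `torch.cuda.device_count()`), so treat that part as an assumption:

```python
import os

import torch

# Each of the 4 processes started by `accelerate launch --num_processes=4`
# runs this same script. WORLD_SIZE is the total number of processes (4 in
# every process), while LOCAL_RANK identifies the GPU of *this* process.
world_size = int(os.environ.get("WORLD_SIZE", 1))
local_rank = int(os.environ.get("LOCAL_RANK", 0))

print(f"WORLD_SIZE: {world_size}")  # prints "4" in all four processes
print(f"LOCAL_RANK: {local_rank}")  # prints 0, 1, 2, 3 (one per process)

# If you only want a line logged once, guard it on the rank:
if local_rank == 0:
    print(f"Using GPU: {torch.cuda.device_count()}")  # printed a single time
```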
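And to illustrate the second point, a rough sketch of the batch sharding that the internally created `Accelerator` performs. This is illustrative only (you don't write any of this yourself when using `SentenceTransformerTrainer`), and the toy dataset is made up:

```python
from accelerate import Accelerator
from torch.utils.data import DataLoader

accelerator = Accelerator()

# Every process loads the SAME full dataset; this toy list stands in for it.
dataset = list(range(16))
dataloader = DataLoader(dataset, batch_size=2)

# prepare() wraps the dataloader so that each process only receives its own
# subset of the batches; together, the 4 processes cover the whole dataset.
dataloader = accelerator.prepare(dataloader)

for batch in dataloader:
    print(f"process {accelerator.process_index} got batch {batch.tolist()}")
```

Run it with `accelerate launch --multi_gpu --num_processes=4 sketch.py` and each process will print different batches, even though all of them built the full `dataset`. So loading the data 4 times is expected; the parallelism comes from each GPU training on a different slice of the batches.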