Closed Vincent-luo closed 11 months ago
Hi, thanks for your interest in our work. You can safely remove the loop. It was added to work around our problematic GPU cluster: it automatically resumes training when an error occurs or the DDP port is already occupied.
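For illustration, the retry-on-failure idea can be sketched as below. This is a minimal sketch and not the actual contents of train_hico.sh: `train_once`, `run_with_retries`, `BASE_PORT`, and the port numbers are hypothetical, and the stand-in command only simulates a failure on the first two ports. It also assumes the loop stops after the first successful run, which the original script may or may not do.

```shell
#!/usr/bin/env bash
# Hypothetical retry wrapper; names and port numbers are illustrative only.

BASE_PORT=29500   # assumed base port, not taken from train_hico.sh

# Stand-in for the real training command: it fails when the port is below
# 29502, simulating a crash or an occupied port on the first attempts.
train_once() {
    local port="$1"
    echo "attempt on port ${port}"
    [ "${port}" -ge 29502 ]
}

# Try each port offset in turn; stop as soon as one run exits successfully.
run_with_retries() {
    local i port
    for i in 0 1 2 3 4; do
        port=$((BASE_PORT + i))
        if train_once "${port}"; then
            echo "finished on port ${port}"
            return 0
        fi
        echo "run failed on port ${port}, retrying" >&2
    done
    return 1
}

run_with_retries
```

On a stable machine where a single run is enough, the wrapper is unnecessary and the training command can be called once directly.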
Thanks for your quick reply!
Hi, I'm reviewing the train_hico.sh script in your repository and have a question about the for loop:
for i in 1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 9 ...
I understand it runs the script multiple times with different master port numbers. Can you clarify the specific purpose of this? Also, if I only need to run the script once, can I safely remove the loop?