Hi, authors! Thanks for your great work! I have a question about the norm weights, i.e., --w1 2e-4 --w2 5e-5 --w3 1e-4: were these values set for 4 GPUs or 8 GPUs? In search.py they are divided by world_size, so the effective values differ between the two setups. I'm also curious about the reason for this division. I'd greatly appreciate any explanation!
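To make the concern concrete, here is a minimal sketch of the arithmetic, assuming each weight is simply divided by world_size. The helper `effective_weights` is a hypothetical name for illustration and is not taken from search.py.

```python
# Hypothetical illustration: assumes each norm weight is divided by world_size.
# This is not the actual code from search.py.
def effective_weights(weights, world_size):
    """Return the per-process weights after dividing by world_size."""
    return {name: value / world_size for name, value in weights.items()}


# Values taken from the command-line flags --w1, --w2, --w3.
weights = {"w1": 2e-4, "w2": 5e-5, "w3": 1e-4}

for world_size in (4, 8):
    print(f"world_size={world_size}: {effective_weights(weights, world_size)}")
```

Under this assumption, the effective weights with 8 GPUs are half of those with 4 GPUs, which is why the GPU count used for the reported values matters.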