Alpha-VLLM / LLaMA2-Accessory

An Open-source Toolkit for LLM Development
https://llama2-accessory.readthedocs.io/

mixtral multi node #193

Open kumagai6 opened 2 months ago

kumagai6 commented 2 months ago

I was able to train the Mixtral code successfully on a single node with 8 GPUs by reducing the size, but when I switched to multiple nodes, the loss per iteration does not decrease the way it did in the single-node setup. Is there something wrong with my multi-node configuration?
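
For context, a common first check when multi-node loss behaves differently than single-node is to confirm that all ranks actually join a single process group and that tensors all-reduce across nodes. The sketch below is a minimal, hypothetical sanity check (not from the LLaMA2-Accessory codebase); it assumes the job is launched with `torchrun` (or an equivalent launcher that sets the standard `env://` variables) and uses the NCCL backend:

```python
import os
import torch
import torch.distributed as dist


def sanity_check():
    # Initialize the default process group from the env:// variables
    # (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE) set by the launcher.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    rank = dist.get_rank()
    world_size = dist.get_world_size()
    # With 2 nodes x 8 GPUs, world_size should be 16 on every rank;
    # if it prints 8, each node is running its own separate job.
    print(f"rank {rank}/{world_size} on host {os.uname().nodename}")

    # All-reduce a tensor of ones: every rank should see the value world_size.
    # If ranks on different nodes disagree, cross-node communication is broken.
    t = torch.ones(1, device="cuda")
    dist.all_reduce(t)
    print(f"rank {rank}: all_reduce result = {t.item()} (expected {world_size})")

    dist.destroy_process_group()


if __name__ == "__main__":
    sanity_check()
```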