aws-samples / aws-deep-learning-ami-ubuntu-dcv-desktop

MIT No Attribution

Trainium clusters need to be in a single subnet for EFA collective communications #25

Open ajayvohra2005 opened 8 months ago

ajayvohra2005 commented 8 months ago

Trainium clusters must be created in a single subnet, which also means a single Availability Zone (AZ), because each subnet belongs to exactly one AZ. Otherwise, the instances cannot perform collective communications over EFA.
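One way to catch this before launching a training job is to confirm that every node in the cluster reports the same `SubnetId`. The following is a minimal sketch, not part of this repository. The function name `check_single_subnet` is hypothetical. It assumes each instance record is shaped like an entry in the `Instances` list of the EC2 `DescribeInstances` response, and that only the `InstanceId` and `SubnetId` keys are needed.

```python
from typing import Any, Dict, Iterable, List, Mapping


def check_single_subnet(instances: Iterable[Mapping[str, Any]]) -> str:
    """Return the shared SubnetId, or raise ValueError if the nodes span subnets.

    Hypothetical helper: each item is assumed to look like an entry in the
    "Instances" list of an EC2 DescribeInstances response.
    """
    # Group instance IDs by the subnet they were launched in.
    by_subnet: Dict[str, List[str]] = {}
    for inst in instances:
        by_subnet.setdefault(inst["SubnetId"], []).append(inst["InstanceId"])

    if not by_subnet:
        raise ValueError("no instances given")

    # More than one subnet means the nodes are spread out, so EFA
    # collective communications between them will not work.
    if len(by_subnet) > 1:
        detail = "; ".join(
            f"{subnet}: {', '.join(ids)}" for subnet, ids in sorted(by_subnet.items())
        )
        raise ValueError(
            f"instances span {len(by_subnet)} subnets, "
            f"EFA collectives need a single subnet ({detail})"
        )

    # Exactly one subnet: return it.
    return next(iter(by_subnet))
```

With boto3, the input could come from `ec2.describe_instances(InstanceIds=[...])`, flattening `resp["Reservations"][*]["Instances"]` into one list before passing it in. The check compares subnets rather than AZs because a shared subnet already implies a shared AZ.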