Hi, I am using the Habana® Deep Learning Base AMI (TensorFlow 2.9.1) on an AWS EC2 instance to train an image segmentation model. With the same dataset, a similar number of HPUs/GPUs, and roughly similar HPU/GPU memory (40GB vs 32GB), training takes longer on the HPUs than on Nvidia A100 GPUs. The key modifications from the Nvidia GPU training script are the use of Horovod, the Habana modules, and MPI.
Below is a link to the script, which also contains the command to run it.
Please let me know if there are any modifications I need to make to the code so that it runs faster and I can compare the results with other vendors.
Thank you