Issue #, if available:
Description of changes:
This PR adds an end-to-end (E2E) BERT training test. The test was validated on a four-node cluster of p3.16xlarge instances.
The results of the training run can be seen below. The logs were taken from the master pod that coordinated the E2E BERT training job.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.