aws / aws-k8s-tester

AWS Kubernetes tester, kubetest2 deployer implementation
Apache License 2.0

Add bert e2e test for neuron device #466

Closed: weicongw closed this 2 months ago

weicongw commented 2 months ago

Issue #, if available:

Description of changes:

Test logs

=== RUN   TestNeuron
=== RUN   TestNeuron/single-node-unit-test
=== RUN   TestNeuron/single-node-unit-test/Single_node_unit_test_succeeds
=== NAME  TestNeuron/single-node-unit-test
    neuron_test.go:88: Test log for neuron-unit-test:
    neuron_test.go:89: [2024-08-06 21:18:58,464] torch.distributed.run: [WARNING] 
        [2024-08-06 21:18:58,464] torch.distributed.run: [WARNING] *****************************************
        [2024-08-06 21:18:58,464] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
        [2024-08-06 21:18:58,464] torch.distributed.run: [WARNING] *****************************************
        Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
        Failed to download (trying next):
        HTTP Error 403: Forbidden

        Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
        Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ./MNIST_DATA_train/0/MNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 9912422/9912422 [00:00<00:00, 148595067.92it/s]
        Extracting ./MNIST_DATA_train/0/MNIST/raw/train-images-idx3-ubyte.gz to ./MNIST_DATA_train/0/MNIST/raw

        Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
        Failed to download (trying next):
        HTTP Error 403: Forbidden

        Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
        Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ./MNIST_DATA_train/0/MNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 28881/28881 [00:00<00:00, 53576158.26it/s]
        Extracting ./MNIST_DATA_train/0/MNIST/raw/train-labels-idx1-ubyte.gz to ./MNIST_DATA_train/0/MNIST/raw

        Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
        Failed to download (trying next):
        HTTP Error 403: Forbidden

        Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
        Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ./MNIST_DATA_train/0/MNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 1648877/1648877 [00:00<00:00, 82001107.40it/s]
        Extracting ./MNIST_DATA_train/0/MNIST/raw/t10k-images-idx3-ubyte.gz to ./MNIST_DATA_train/0/MNIST/raw

        Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
        Failed to download (trying next):
        HTTP Error 403: Forbidden

        Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
        Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ./MNIST_DATA_train/0/MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 4542/4542 [00:00<00:00, 38101057.54it/s]
        Extracting ./MNIST_DATA_train/0/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./MNIST_DATA_train/0/MNIST/raw

        ----------Training ---------------
        2024-08-06 21:19:10.000462:  72  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
        2024-08-06 21:19:10.000466:  72  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/76c1e4df-3ebe-442d-a830-f311bfa526be/model.MODULE_11065306676681237697+d41d8cd9.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/76c1e4df-3ebe-442d-a830-f311bfa526be/model.MODULE_11065306676681237697+d41d8cd9.neff --verbose=35
        .
        Compiler status PASS
        2024-Aug-06 21:19:13.0249 72:89 [0] init.cc:108 CCOM WARN Linux kernel 5.10 requires setting FI_EFA_FORK_SAFE=1 environment variable.  Multi-node support will be disabled.
        Please restart with FI_EFA_FORK_SAFE=1 set.
        Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
        Failed to download (trying next):
        HTTP Error 403: Forbidden

        Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
        Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ./MNIST_DATA_train/1/MNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 9912422/9912422 [00:00<00:00, 182540958.48it/s]
        Extracting ./MNIST_DATA_train/1/MNIST/raw/train-images-idx3-ubyte.gz to ./MNIST_DATA_train/1/MNIST/raw

        Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
        Failed to download (trying next):
        HTTP Error 403: Forbidden

        Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
        Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ./MNIST_DATA_train/1/MNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 28881/28881 [00:00<00:00, 23677813.49it/s]
        Extracting ./MNIST_DATA_train/1/MNIST/raw/train-labels-idx1-ubyte.gz to ./MNIST_DATA_train/1/MNIST/raw

        Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
        Failed to download (trying next):
        HTTP Error 403: Forbidden

        Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
        Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ./MNIST_DATA_train/1/MNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 1648877/1648877 [00:00<00:00, 103698965.34it/s]
        Extracting ./MNIST_DATA_train/1/MNIST/raw/t10k-images-idx3-ubyte.gz to ./MNIST_DATA_train/1/MNIST/raw

        Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
        Failed to download (trying next):
        HTTP Error 403: Forbidden

        Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
        Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ./MNIST_DATA_train/1/MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 4542/4542 [00:00<00:00, 40446982.52it/s]
        Extracting ./MNIST_DATA_train/1/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./MNIST_DATA_train/1/MNIST/raw

        ----------Training ---------------
        2024-08-06 21:19:18.000665:  73  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
        2024-08-06 21:19:18.000667:  73  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_11065306676681237697+d41d8cd9/model.neff. Exiting with a successfully compiled graph.
        2024-Aug-06 21:19:18.0736 73:362 [1] init.cc:108 CCOM WARN Linux kernel 5.10 requires setting FI_EFA_FORK_SAFE=1 environment variable.  Multi-node support will be disabled.
        Please restart with FI_EFA_FORK_SAFE=1 set.
        2024-08-06 21:19:22.000390:  73  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
        2024-08-06 21:19:22.000391:  72  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
        2024-08-06 21:19:22.000391:  72  INFO ||NEURON_CC_WRAPPER||: Another process must be compiling /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_4493708196142688607+d41d8cd9/model.hlo_module.pb, been waiting for: 0.0 minutes
        2024-08-06 21:19:22.000391:  73  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/0513b8c8-a899-4380-a0ae-9eda19fec4af/model.MODULE_4493708196142688607+d41d8cd9.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/0513b8c8-a899-4380-a0ae-9eda19fec4af/model.MODULE_4493708196142688607+d41d8cd9.neff --verbose=35
        .
        Compiler status PASS
        2024-08-06 21:19:27.000395:  72  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_4493708196142688607+d41d8cd9/model.neff. Exiting with a successfully compiled graph.
        Train throughput (iter/sec): 268.11805010983363
        Train throughput (iter/sec): 268.1022315929829
        Final loss is 0.3240
        Final loss is 0.0942
        ----------End Training ---------------
        ----------End Training ---------------
        [2024-08-06 21:19:45,051] torch.distributed.run: [WARNING] 
        [2024-08-06 21:19:45,051] torch.distributed.run: [WARNING] *****************************************
        [2024-08-06 21:19:45,051] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
        [2024-08-06 21:19:45,051] torch.distributed.run: [WARNING] *****************************************
        2024-08-06 21:19:56.000088:  709  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
        2024-08-06 21:19:56.000089:  709  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/bd21c6a6-b910-4078-9506-81d4f007028a/model.MODULE_17891116665549384984+d41d8cd9.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/bd21c6a6-b910-4078-9506-81d4f007028a/model.MODULE_17891116665549384984+d41d8cd9.neff --verbose=35
        .
        Compiler status PASS
        2024-Aug-06 21:19:57.0513 709:986 [1] init.cc:108 CCOM WARN Linux kernel 5.10 requires setting FI_EFA_FORK_SAFE=1 environment variable.  Multi-node support will be disabled.
        Please restart with FI_EFA_FORK_SAFE=1 set.
        testing initialize_model_parallel with size 1
        > initializing tensor model parallel with size 1
        > initializing pipeline model parallel with size 1
        > initializing data parallel with size 2
        2024-08-06 21:20:04.000280:  708  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
        2024-08-06 21:20:04.000281:  708  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_17891116665549384984+d41d8cd9/model.neff. Exiting with a successfully compiled graph.
        2024-Aug-06 21:20:04.0347 708:1115 [0] init.cc:108 CCOM WARN Linux kernel 5.10 requires setting FI_EFA_FORK_SAFE=1 environment variable.  Multi-node support will be disabled.
        Please restart with FI_EFA_FORK_SAFE=1 set.
        test passed
        testing get_tensor_model_parallel_src_rank with size 1
        > initializing tensor model parallel with size 1
        > initializing pipeline model parallel with size 1
        > initializing data parallel with size 2
        test passed
        testing initialize_model_parallel with size 2
        > initializing tensor model parallel with size 2
        > initializing pipeline model parallel with size 1
        > initializing data parallel with size 1
        test passed
        testing get_tensor_model_parallel_src_rank with size 2
        > initializing tensor model parallel with size 2
        > initializing pipeline model parallel with size 1
        > initializing data parallel with size 1
        test passed
        [2024-08-06 21:20:06,553] torch.distributed.run: [WARNING] 
        [2024-08-06 21:20:06,553] torch.distributed.run: [WARNING] *****************************************
        [2024-08-06 21:20:06,553] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
        [2024-08-06 21:20:06,553] torch.distributed.run: [WARNING] *****************************************
        running all reduce
        at iteration 0, with local rank 0
        2024-08-06 21:20:17.000589:  1184  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
        2024-08-06 21:20:17.000590:  1184  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/df111723-1804-439d-8eaa-0a67d9f6a2d1/model.MODULE_107266017504238116+d41d8cd9.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/df111723-1804-439d-8eaa-0a67d9f6a2d1/model.MODULE_107266017504238116+d41d8cd9.neff --verbose=35
        .
        Compiler status PASS
        2024-Aug-06 21:20:19.0017 1184:1461 [0] init.cc:108 CCOM WARN Linux kernel 5.10 requires setting FI_EFA_FORK_SAFE=1 environment variable.  Multi-node support will be disabled.
        Please restart with FI_EFA_FORK_SAFE=1 set.
        running all reduce
        at iteration 0, with local rank 1
        2024-08-06 21:20:25.000781:  1185  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
        2024-08-06 21:20:25.000781:  1185  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_107266017504238116+d41d8cd9/model.neff. Exiting with a successfully compiled graph.
        2024-Aug-06 21:20:25.0845 1185:1590 [1] init.cc:108 CCOM WARN Linux kernel 5.10 requires setting FI_EFA_FORK_SAFE=1 environment variable.  Multi-node support will be disabled.
        Please restart with FI_EFA_FORK_SAFE=1 set.
        tensor([[2., 2., 2.],
                [2., 2., 2.]])
        tensor([[2., 2., 2.],
                [2., 2., 2.]])
        at iteration 1, with local rank 1
        at iteration 1, with local rank 0
        2024-08-06 21:20:25.000886:  1185  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
        2024-08-06 21:20:25.000886:  1184  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
        2024-08-06 21:20:25.000886:  1185  INFO ||NEURON_CC_WRAPPER||: Another process must be compiling /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_6503008142756280942+d41d8cd9/model.hlo_module.pb, been waiting for: 0.0 minutes
        2024-08-06 21:20:25.000887:  1184  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/999cb975-989b-4d26-9687-40a8089fddfb/model.MODULE_6503008142756280942+d41d8cd9.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/999cb975-989b-4d26-9687-40a8089fddfb/model.MODULE_6503008142756280942+d41d8cd9.neff --verbose=35
        .
        Compiler status PASS
        2024-08-06 21:20:30.000892:  1185  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_6503008142756280942+d41d8cd9/model.neff. Exiting with a successfully compiled graph.
        tensor([[2., 2., 2.],
                [2., 2., 2.]])
        tensor([[2., 2., 2.],
                [2., 2., 2.]])
        at iteration 2, with local rank 1
        at iteration 2, with local rank 0
        2024-08-06 21:20:30.000975:  1185  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
        2024-08-06 21:20:30.000975:  1184  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
        2024-08-06 21:20:30.000977:  1184  INFO ||NEURON_CC_WRAPPER||: Another process must be compiling /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_1553125877955292452+d41d8cd9/model.hlo_module.pb, been waiting for: 0.0 minutes
        2024-08-06 21:20:30.000977:  1185  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/0f060dde-5778-4128-aa4e-e87e7ac182e9/model.MODULE_1553125877955292452+d41d8cd9.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/0f060dde-5778-4128-aa4e-e87e7ac182e9/model.MODULE_1553125877955292452+d41d8cd9.neff --verbose=35
        .
        Compiler status PASS
        2024-08-06 21:20:35.000983:  1184  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_1553125877955292452+d41d8cd9/model.neff. Exiting with a successfully compiled graph.
        tensor([[2., 2., 2.],
                [2., 2., 2.]])
        tensor([[2., 2., 2.],
                [2., 2., 2.]])
        at iteration 3, with local rank 0
        at iteration 3, with local rank 1
        2024-08-06 21:20:36.000065:  1184  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
        2024-08-06 21:20:36.000065:  1185  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
        2024-08-06 21:20:36.000067:  1185  INFO ||NEURON_CC_WRAPPER||: Another process must be compiling /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_14849917709285030391+d41d8cd9/model.hlo_module.pb, been waiting for: 0.0 minutes
        2024-08-06 21:20:36.000067:  1184  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/b5071c3a-6480-439f-9f6d-ade3d1afbb52/model.MODULE_14849917709285030391+d41d8cd9.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/b5071c3a-6480-439f-9f6d-ade3d1afbb52/model.MODULE_14849917709285030391+d41d8cd9.neff --verbose=35
        .
        Compiler status PASS
        2024-08-06 21:20:41.000072:  1185  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_14849917709285030391+d41d8cd9/model.neff. Exiting with a successfully compiled graph.
        tensor([[2., 2., 2.],
                [2., 2., 2.]])
        tensor([[2., 2., 2.],
                [2., 2., 2.]])
        at iteration 4, with local rank 0
        at iteration 4, with local rank 1
        2024-08-06 21:20:41.000156:  1184  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
        2024-08-06 21:20:41.000156:  1185  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
        2024-08-06 21:20:41.000158:  1185  INFO ||NEURON_CC_WRAPPER||: Another process must be compiling /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_16080223531902998505+d41d8cd9/model.hlo_module.pb, been waiting for: 0.0 minutes
        2024-08-06 21:20:41.000158:  1184  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/14d4ce06-1655-44a3-9b45-0df4b4aacfc3/model.MODULE_16080223531902998505+d41d8cd9.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/14d4ce06-1655-44a3-9b45-0df4b4aacfc3/model.MODULE_16080223531902998505+d41d8cd9.neff --verbose=35
        .
        Compiler status PASS
        2024-08-06 21:20:46.000164:  1185  INFO ||NEURON_CC_WRAPPER||: Using a cached neff at /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_16080223531902998505+d41d8cd9/model.neff. Exiting with a successfully compiled graph.
        tensor([[2., 2., 2.],
                [2., 2., 2.]])
        tensor([[2., 2., 2.],
                [2., 2., 2.]])
        PASS
        PASS
=== RUN   TestNeuron/single-node-inference
=== RUN   TestNeuron/single-node-inference/Single_node_bert_inference_Job_succeeds
=== NAME  TestNeuron/single-node-inference
        ...
        Compiler status PASS
        Inference Mode: throughput
        Average time per batch: 0.0420 seconds
        Throughput: 190.64 samples/second
...
        Process 28 - Throughput: 28.93 samples/second
        Process 11 - Throughput: 28.93 samples/second
        ----------End Training ---------------
        ----------End Training ---------------
        Process 24 - Training time: 3.46 seconds
        Process 24 - Throughput: 28.92 samples/second
        ----------End Training ---------------
        Process 40 - Training time: 3.46 seconds
        Process 40 - Throughput: 28.92 samples/second
        ----------End Training ---------------
        Process 56 - Training time: 3.46 seconds
        Process 56 - Throughput: 28.92 samples/second
        ----------End Training ---------------
        Process 9 - Training time: 3.46 seconds
        Process 39 - Training time: 3.46 seconds
        Process 39 - Throughput: 28.92 samples/second
        ----------End Training ---------------
        Process 9 - Throughput: 28.91 samples/second
        ----------End Training ---------------
        Process 7 - Training time: 3.46 seconds
        Process 7 - Throughput: 28.91 samples/second
        ----------End Training ---------------
        Process 51 - Training time: 3.46 seconds
        Process 51 - Throughput: 28.92 samples/second
        ----------End Training ---------------
        Process 33 - Training time: 3.46 seconds
        Process 33 - Throughput: 28.91 samples/second
        ----------End Training ---------------

--- PASS: TestNeuron (989.06s)
    --- PASS: TestNeuron/single-node-unit-test (120.94s)
        --- PASS: TestNeuron/single-node-unit-test/Single_node_unit_test_succeeds (120.07s)
    --- PASS: TestNeuron/single-node-inference (80.88s)
        --- PASS: TestNeuron/single-node-inference/Single_node_bert_inference_Job_succeeds (80.08s)
    --- PASS: TestNeuron/multi-node-training (787.24s)
        --- PASS: TestNeuron/multi-node-training/Multi_node_bert_training_MPIJob_succeeds (785.08s)
PASS
ok      github.com/aws/aws-k8s-tester/e2e2/test/cases/neuron    1015.860s

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.