ROCm / tensorflow-upstream

TensorFlow ROCm port
https://tensorflow.org
Apache License 2.0

Skipping mgpu in sgpu #2501

Closed zstreet87 closed 2 months ago

zstreet87 commented 2 months ago

Skip the multi-GPU tests in the rocm.bazelrc file for single-GPU runs. This is needed for Navi.
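For readers unfamiliar with how a bazelrc can exclude a class of tests, here is a minimal sketch of the general mechanism. It is not the actual change in this PR: the config name `sgpu` and the `multi_gpu` tag are assumptions made for illustration, and the real rocm.bazelrc may use different names or a different approach. The only Bazel features relied on are the `test:<config>` bazelrc syntax and the `--test_tag_filters` flag, where a leading `-` excludes targets carrying that tag.

```
# Hypothetical fragment, not copied from rocm.bazelrc.
# Assumes multi-GPU test targets are tagged "multi_gpu" (assumption).
# "sgpu" is an illustrative config name for single-GPU runs (assumption).
test:sgpu --test_tag_filters=-multi_gpu
```

Under those assumptions, a single-GPU run would opt in by passing `--config=sgpu` alongside `--config=rocm` on the `bazel test` command line.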

i-chaochen commented 2 months ago

Retest Ubuntu-CPU please. Retest Ubuntu-GPU-multi please. Retest Ubuntu-GPU-single please.

i-chaochen commented 2 months ago

Retest Ubuntu-GPU-multi please. Retest Ubuntu-GPU-single please.

i-chaochen commented 2 months ago

Retest Ubuntu-GPU-multi please.

i-chaochen commented 2 months ago

Retest Ubuntu-GPU-multi please.

i-chaochen commented 2 months ago

Retest Ubuntu-GPU-multi please.

i-chaochen commented 2 months ago

Retest Ubuntu-GPU-multi please

i-chaochen commented 2 months ago

Retest Ubuntu-GPU-multi please

i-chaochen commented 2 months ago

Retest Ubuntu-GPU-multi please

i-chaochen commented 2 months ago

@zstreet87 this always fails at //tensorflow/python/distribute:collective_all_reduce_strategy_test_xla_2gpu in CI. Could you run a quick check locally as well to see whether it passes?

zstreet87 commented 2 months ago

branch: r2.15-rocm-enhanced

command: tf-docker /tensorflow > bazel --bazelrc=/usertools/rocm.bazelrc test --config=rocm --run_under=//tensorflow/tools/ci_build/gpu_build:parallel_gpu_execute -- //tensorflow/python/distribute:collective_all_reduce_strategy_test_xla_2gpu

result: //tensorflow/python/distribute:collective_all_reduce_strategy_test_xla_2gpu PASSED in 54.0s

Is the CI using only 1 GPU?

i-chaochen commented 2 months ago

Oh, all tests have PASSED now. You can merge.