ROCm / ROCm-docker

Dockerfiles for the various software layers defined in the ROCm software platform
MIT License

[Issue]: PyTorch unit tests never finish #124

Open baryluk opened 8 months ago

baryluk commented 8 months ago

Problem Description

root@002d42c15b02:/var/lib/jenkins# python3 -c 'import torch; print(torch.cuda.is_available())'
True
root@002d42c15b02:/var/lib/jenkins# 
root@002d42c15b02:/var/lib/jenkins/pytorch# PYTORCH_TEST_WITH_ROCM=1 python3 test/run_test.py --verbose \
> --include test_nn test_torch test_cuda test_ops \
> test_unary_ufuncs test_binary_ufuncs test_autograd
/var/lib/jenkins/pytorch/test/run_test.py:18: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
Ignoring disabled issues:  ['']
Downloading https://ossci-metrics.s3.amazonaws.com/slow-tests.json to /var/lib/jenkins/pytorch/test/.pytorch-slow-tests.json
Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/pytorch/test/.pytorch-disabled-tests.json
Received 7 tests to prioritize
  test_nn
  test_torch
  test_cuda
  test_ops
  test_unary_ufuncs
  test_binary_ufuncs
  test_autograd
/var/lib/jenkins/pytorch/tools/testing/target_determination/heuristics/previously_failed_in_pr.py:34: UserWarning: No pytorch cache found at /var/lib/jenkins/pytorch/.pytest_cache/v/cache/lastfailed
  warn(
Heuristic PreviouslyFailedInPR identified 0 tests to prioritize (0.00%%)
Heuristic EditedByPR identified 3 tests to prioritize (42.86%%)
High Relevance tests (3):
  test_nn
  test_ops
  test_torch
Unranked Relevance tests (4):
  test_autograd
  test_binary_ufuncs
  test_cuda
  test_unary_ufuncs
Heuristic CorrelatedWithHistoricalFailures identified 7 tests to prioritize (100.00%%)
Probable Relevance tests (7):
  test_binary_ufuncs
  test_autograd
  test_unary_ufuncs
  test_cuda
  test_nn
  test_torch
  test_ops
High Relevance tests (3):
  test_nn
  test_ops
  test_torch
Probable Relevance tests (4):
  test_binary_ufuncs
  test_autograd
  test_unary_ufuncs
  test_cuda
::warning:: Gathered no stats from artifacts for build env pytorch-linux-focal-rocm6.0-py3.9 build env and None test config. Using default build env and default test config instead.
Name: high_relevance
  Parallel tests:
    test_ops 1/6
    test_ops 2/6
    test_ops 3/6
    test_ops 4/6
    test_ops 5/6
    test_ops 6/6
  Serial tests:
    test_nn 1/1
    test_torch 1/1
Name: probable_relevance
  Parallel tests:
    test_binary_ufuncs 1/1
    test_unary_ufuncs 1/1
  Serial tests:
    test_autograd 1/1
    test_cuda 1/1
Name: unranked_relevance
  Parallel tests:
  Serial tests:
Starting test batch 'high_relevance' 7.152557373046875e-07 seconds after initiating testing
With sharding, this batch will run 8 tests
Ignoring disabled issues:  ['']
Running test_ops 1/6 ... [2024-03-09 07:03:08.588933]
Executing ['/opt/conda/envs/py_3.9/bin/python3', '-bb', 'test_ops.py', '--shard-id=0', '--num-shards=6', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2024-03-09 07:03:08.589313]
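To tell a genuine hang apart from a very slow shard, one option is to re-run the single shard command from the "Executing [...]" line above under a hard wall-clock limit. The following is only a minimal sketch: the helper name `run_with_timeout`, the constant `SHARD_CMD`, and the idea of setting `PYTORCH_TEST_WITH_ROCM=1` through the `env` argument are illustrative assumptions, not part of PyTorch's test tooling. The argument list itself is copied verbatim from the log.

```python
import os
import subprocess

# Shard command copied verbatim from the "Executing [...]" log line.
SHARD_CMD = [
    "/opt/conda/envs/py_3.9/bin/python3", "-bb", "test_ops.py",
    "--shard-id=0", "--num-shards=6", "-v", "-vv", "-rfEX",
    "-p", "no:xdist", "--use-pytest", "-x", "--reruns=2",
    "--import-slow-tests", "--import-disabled-tests",
]


def run_with_timeout(cmd, timeout_s, cwd=None, extra_env=None):
    """Run cmd with a wall-clock limit (hypothetical helper).

    Returns ("finished", returncode) if the process exits in time,
    or ("timed_out", None) if it had to be killed.
    """
    # Inherit the current environment and layer any overrides on top.
    env = dict(os.environ)
    if extra_env:
        env.update(extra_env)
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # once timeout_s seconds have elapsed.
        proc = subprocess.run(cmd, cwd=cwd, env=env, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return ("timed_out", None)
    return ("finished", proc.returncode)
```

A call such as `run_with_timeout(SHARD_CMD, timeout_s=3600, cwd="/var/lib/jenkins/pytorch/test", extra_env={"PYTORCH_TEST_WITH_ROCM": "1"})` would then report `("timed_out", None)` if the shard is still running after an hour. The one-hour limit and the working directory are guesses based on the paths in the log. Only the direct child is killed on timeout, so any worker processes it spawned may need to be cleaned up separately.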

ROCm Version

ROCm 6.0.0