Describe the Bug
Contrib unit test failure in openfold_triton/test_fused_adam_swa.py::FusedAdamSWATestCase::test_fused_update_on_random_data
Minimal Steps/Code to Reproduce the Bug
root@b4db9ba94176:/opt/pytorch/apex/apex/contrib/test# pytest -vvvs -k test_fused_update_on_random_data
============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.10.12, pytest-8.1.1, pluggy-1.5.0 -- /usr/bin/python3
cachedir: .pytest_cache
Test order randomisation NOT enabled. Enable with --random-order or --random-order-bucket=<bucket_type>
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/opt/pytorch/apex/apex/contrib/test/.hypothesis/examples'))
rootdir: /opt/pytorch/apex
configfile: pyproject.toml
plugins: timestamper-0.0.10, xdist-3.6.1, random-order-1.1.1, benchmark-4.0.0, rerunfailures-14.0, anyio-4.3.0, timeout-2.3.1, xdoctest-1.1.0, hypothesis-6.100.0, shard-0.1.2, cov-4.1.0, flakefinder-1.1.0
collected 113 items / 112 deselected / 1 selected
Running 1 items in this shard: apex/contrib/test/openfold_triton/test_fused_adam_swa.py::FusedAdamSWATestCase::test_fused_update_on_random_data
[2024-05-15 17:12:58] openfold_triton/test_fused_adam_swa.py::FusedAdamSWATestCase::test_fused_update_on_random_data FAILED
=================================================================================================== FAILURES ====================================================================================================
_____________________________________________________________________________ FusedAdamSWATestCase.test_fused_update_on_random_data _____________________________________________________________________________
self = <test_fused_adam_swa.FusedAdamSWATestCase testMethod=test_fused_update_on_random_data>
def setUp(self):
super().setUp()
self._seed = 19260817
random.seed(self._seed)
torch.manual_seed(self._seed)
> torch.backends.cudnn.deterministic = True
openfold_triton/test_fused_adam_swa.py:91:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <torch.backends.ContextProp object at 0x7f57fdef1a80>, obj = <module 'torch.backends.cudnn' from '/opt/pytorch/pytorch/torch/backends/cudnn/__init__.py'>, val = True
def __set__(self, obj, val):
if not flags_frozen():
self.setter(val)
else:
> raise RuntimeError(
f"not allowed to set {obj.__name__} flags "
"after disable_global_flags; please use flags() context manager instead"
)
E RuntimeError: not allowed to set torch.backends.cudnn flags after disable_global_flags; please use flags() context manager instead
../../../../pytorch/torch/backends/__init__.py:43: RuntimeError
=============================================================================================== warnings summary ================================================================================================
../../transformer/tensor_parallel/cross_entropy.py:78
/opt/pytorch/apex/apex/transformer/tensor_parallel/cross_entropy.py:78: DeprecationWarning: invalid escape sequence '\s'
"""
../../transformer/pipeline_parallel/schedules/fwd_bwd_pipelining_with_interleaving.py:49
/opt/pytorch/apex/apex/transformer/pipeline_parallel/schedules/fwd_bwd_pipelining_with_interleaving.py:49: DeprecationWarning: invalid escape sequence '\_'
"""Run interleaved 1F1B schedule with communication between pipeline stages as needed.
../../transformer/pipeline_parallel/schedules/fwd_bwd_pipelining_without_interleaving.py:261
/opt/pytorch/apex/apex/transformer/pipeline_parallel/schedules/fwd_bwd_pipelining_without_interleaving.py:261: DeprecationWarning: invalid escape sequence '\_'
"""Run non-interleaved 1F1B schedule, with communication between pipeline stages.
../../../../pytorch/torch/_custom_ops.py:253
/opt/pytorch/pytorch/torch/_custom_ops.py:253: DeprecationWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
return torch.library.impl_abstract(qualname, func, _stacklevel=2)
../../../../vision/torchvision/transforms/_functional_pil.py:242
/opt/pytorch/vision/torchvision/transforms/_functional_pil.py:242: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
interpolation: int = Image.BILINEAR,
../../../../vision/torchvision/transforms/_functional_pil.py:288
/opt/pytorch/vision/torchvision/transforms/_functional_pil.py:288: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
interpolation: int = Image.NEAREST,
../../../../vision/torchvision/transforms/_functional_pil.py:304
/opt/pytorch/vision/torchvision/transforms/_functional_pil.py:304: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
interpolation: int = Image.NEAREST,
../../../../vision/torchvision/transforms/_functional_pil.py:321
/opt/pytorch/vision/torchvision/transforms/_functional_pil.py:321: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
interpolation: int = Image.BICUBIC,
../optimizers/distributed_fused_adam.py:273
/opt/pytorch/apex/apex/contrib/optimizers/distributed_fused_adam.py:273: DeprecationWarning: invalid escape sequence '\:'
"""Adam optimizer with ZeRO algorithm.
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================================ short test summary info ============================================================================================
FAILED openfold_triton/test_fused_adam_swa.py::FusedAdamSWATestCase::test_fused_update_on_random_data - RuntimeError: not allowed to set torch.backends.cudnn flags after disable_global_flags; please use flags() context manager instead
================================================================================= 1 failed, 112 deselected, 9 warnings in 7.40s =================================================================================
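For reference, the core failure can be reproduced outside of pytest. The sketch below assumes that some module imported earlier in the test session (for instance the shared PyTorch test utilities) has already called torch.backends.disable_global_flags(), which is what freezes the cuDNN flags before setUp() runs:

import torch

# Assumption: something imported earlier in the session has already frozen
# the global backend flags, as the traceback above indicates.
torch.backends.disable_global_flags()

# This mirrors what FusedAdamSWATestCase.setUp() does and raises the same
# "RuntimeError: not allowed to set torch.backends.cudnn flags after
# disable_global_flags; please use flags() context manager instead".
torch.backends.cudnn.deterministic = True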
Expected Behavior
The test should pass.
Environment
The test has been failing since 2/13/24, even though https://github.com/NVIDIA/apex/pull/1759 was merged on 12/14/23 and the test itself has not changed since then.
Before 2/13/24 the test was skipped because of a missing dependency in our CI environment, e.g. on 2/12/24:
openfold_triton/test_fused_adam_swa.py::FusedAdamSWATestCase::test_fused_update_on_random_data SKIPPED (Skip testing FusedAdamSWA: No module named 'einops')
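A possible workaround, following the suggestion in the error message, is to guard the global assignment in setUp() with torch.backends.flags_frozen(), or to scope the setting with the torch.backends.cudnn.flags() context manager. This is only a sketch of the idea, not a proposed patch to the test:

import torch

# Minimal sketch of a possible workaround (assumed, not the actual fix):
# only touch the global flag when it is still allowed to change.
if not torch.backends.flags_frozen():
    torch.backends.cudnn.deterministic = True

# Or, as the error message recommends, scope the setting with the context manager:
with torch.backends.cudnn.flags(enabled=True, benchmark=False, deterministic=True):
    ...  # run the fused Adam/SWA update and the reference comparison here

Either variant avoids assigning to the frozen global flags directly, which is what currently raises the RuntimeError in setUp().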
cc @crcrpar @eqy @nWEIdia