tests.test_primitives.TestLMA: torch.cuda.OutOfMemoryError: CUDA out of memory

rafaeltiveron commented 2 months ago

I've just run scripts/install_third_party_dependencies.sh and python3 setup.py install. There were no errors during installation, and the resources folder was created inside the openfold folder, but the unit tests fail:

./scripts/run_unit_tests.sh
[2024-07-09 18:14:45,945] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
s.................EEEEUsing /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu121/evoformer_attn/build.ninja...
Building extension module evoformer_attn...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module evoformer_attn...
Time to load evoformer_attn op: 0.08634114265441895 seconds
........s...s.sss.ss.E...sssssssss.sss....ssssss..s.s.s.ss.s....E.s.s..ss...ss.sEsE...s........
======================================================================
ERROR: test_compare_evoformer_bf16 (tests.test_deepspeed_evo_attention.TestDeepSpeedKernel)
Run evoformer comparison test with BF16 precision.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/openfold/tests/test_deepspeed_evo_attention.py", line 224, in test_compare_evoformer_bf16
    self.compare_evoformer(dtype=torch.bfloat16, eps=4e-2)
  File "/usr/local/openfold/tests/test_deepspeed_evo_attention.py", line 186, in compare_evoformer
    model = compare_utils.get_global_pretrained_openfold()
  File "/usr/local/openfold/tests/compare_utils.py", line 79, in get_global_pretrained_openfold
    raise FileNotFoundError(
FileNotFoundError: Cannot load pretrained parameters. Make sure to run the 
                installation script before running tests.

======================================================================
ERROR: test_compare_evoformer_fp32 (tests.test_deepspeed_evo_attention.TestDeepSpeedKernel)
Run evoformer comparison test with FP32 precision.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/openfold/tests/test_deepspeed_evo_attention.py", line 228, in test_compare_evoformer_fp32
    self.compare_evoformer(dtype=torch.float32, eps=2e-2)
  File "/usr/local/openfold/tests/test_deepspeed_evo_attention.py", line 187, in compare_evoformer
    out_repro_msa, out_repro_pair = model.evoformer.blocks[0](
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/openfold/openfold/model/evoformer.py", line 463, in forward
    self.msa_att_row(
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/openfold/openfold/model/msa.py", line 260, in forward
    m, mask_bias, z = self._prep_inputs(
  File "/usr/local/openfold/openfold/model/msa.py", line 157, in _prep_inputs
    z_chunk = self.layer_norm_z(z_chunk)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/openfold/openfold/model/primitives.py", line 235, in forward
    out = nn.functional.layer_norm(
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/functional.py", line 2543, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)

======================================================================
ERROR: test_compare_model (tests.test_deepspeed_evo_attention.TestDeepSpeedKernel)
Run full model with and without using DeepSpeed Evoformer attention kernel
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/openfold/tests/test_deepspeed_evo_attention.py", line 322, in test_compare_model
    out_repro = model(batch)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/openfold/openfold/model/model.py", line 568, in forward
    outputs, m_1_prev, z_prev, x_prev, early_stop = self.iteration(
  File "/usr/local/openfold/openfold/model/model.py", line 253, in iteration
    m, z = self.input_embedder(
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/openfold/openfold/model/embedders.py", line 131, in forward
    tf_emb_i = self.linear_tf_z_i(tf)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/openfold/openfold/model/primitives.py", line 206, in forward
    return nn.functional.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

======================================================================
ERROR: test_compare_template_stack (tests.test_deepspeed_evo_attention.TestDeepSpeedKernel)
Compare Template Stack output with and without using DeepSpeed Evoformer attention kernel.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/openfold/tests/test_deepspeed_evo_attention.py", line 255, in test_compare_template_stack
    out_repro = model.embed_templates(
  File "/usr/local/openfold/openfold/model/model.py", line 167, in embed_templates
    template_embeds = self.template_embedder(
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/openfold/openfold/model/embedders.py", line 692, in forward
    t = self.template_pair_embedder(t)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/openfold/openfold/model/embedders.py", line 590, in forward
    x = self.linear(x)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/openfold/openfold/model/primitives.py", line 206, in forward
    return nn.functional.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_CUDA_addmm)

======================================================================
ERROR: test_import_jax_weights_ (tests.test_import_weights.TestImportWeights)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/openfold/tests/test_import_weights.py", line 36, in test_import_jax_weights_
    import_jax_weights_(
  File "/usr/local/openfold/openfold/utils/import_weights.py", line 650, in import_jax_weights_
    data = np.load(npz_path)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/numpy/lib/npyio.py", line 405, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/openfold/tests/../openfold/resources/params/params_model_1_ptm.npz'

======================================================================
ERROR: test_lma_vs_attention (tests.test_primitives.TestLMA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/openfold/tests/test_primitives.py", line 45, in test_lma_vs_attention
    l = a(q, kv, biases=biases, use_lma=True).cpu()
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/openfold/openfold/model/primitives.py", line 537, in forward
    o = _lma(q, k, v, biases, lma_q_chunk_size, lma_kv_chunk_size)
  File "/usr/local/openfold/openfold/model/primitives.py", line 733, in _lma
    a = torch.einsum(
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/functional.py", line 377, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.62 GiB. GPU 0 has a total capacty of 7.66 GiB of which 1.08 GiB is free. Including non-PyTorch memory, this process has 5.88 GiB memory in use. Of the allocated memory 4.11 GiB is allocated by PyTorch, and 1.65 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

======================================================================
ERROR: test_tri_mul_in_inference (tests.test_triangular_multiplicative_update.TestTriangularMultiplicativeUpdate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/openfold/tests/test_triangular_multiplicative_update.py", line 148, in test_tri_mul_in_inference
    self._tri_mul_inplace(incoming=True)
  File "/usr/local/openfold/tests/test_triangular_multiplicative_update.py", line 129, in _tri_mul_inplace
    out_stock = module(
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/openfold/openfold/model/triangular_multiplicative_update.py", line 426, in forward
    z = self.layer_norm_in(z)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/openfold/openfold/model/primitives.py", line 235, in forward
    out = nn.functional.layer_norm(
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/functional.py", line 2543, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)

======================================================================
ERROR: test_tri_mul_out_inference (tests.test_triangular_multiplicative_update.TestTriangularMultiplicativeUpdate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/openfold/tests/test_triangular_multiplicative_update.py", line 145, in test_tri_mul_out_inference
    self._tri_mul_inplace()
  File "/usr/local/openfold/tests/test_triangular_multiplicative_update.py", line 129, in _tri_mul_inplace
    out_stock = module(
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/openfold/openfold/model/triangular_multiplicative_update.py", line 426, in forward
    z = self.layer_norm_in(z)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/openfold/openfold/model/primitives.py", line 235, in forward
    out = nn.functional.layer_norm(
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/functional.py", line 2543, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument weight in method wrapper_CUDA__native_layer_norm)

----------------------------------------------------------------------
Ran 117 tests in 24.257s

FAILED (errors=8, skipped=41)

Test(s) failed. Make sure you've installed all Python dependencies.
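
Read by hand, the eight tracebacks above end in three exception types: five RuntimeError (device mismatch between cuda:0 and cpu), two FileNotFoundError (missing parameter files), and one torch.cuda.OutOfMemoryError. A minimal sketch of how such a tally could be automated for a saved report follows; the function name tally_exceptions is hypothetical, and the sketch assumes the standard unittest layout in which the final exception line of each traceback is not indented.

```python
import re
from collections import Counter


def tally_exceptions(report: str) -> Counter:
    """Count the exception type that closes each traceback in a unittest report."""
    counts = Counter()
    in_traceback = False
    for line in report.splitlines():
        if line.startswith("Traceback (most recent call last):"):
            in_traceback = True
            continue
        if not in_traceback:
            continue
        # Frame and source lines are indented; the closing exception line is not.
        match = re.match(r"([A-Za-z_][\w.]*(?:Error|Exception)):", line)
        if match:
            # Keep only the class name, e.g. torch.cuda.OutOfMemoryError -> OutOfMemoryError.
            counts[match.group(1).rsplit(".", 1)[-1]] += 1
            in_traceback = False
    return counts


sample = """\
Traceback (most recent call last):
  File "a.py", line 1, in f
    g()
RuntimeError: Expected all tensors to be on the same device
Traceback (most recent call last):
  File "b.py", line 2, in h
    open(p)
FileNotFoundError: [Errno 2] No such file or directory
Traceback (most recent call last):
  File "c.py", line 3, in k
    einsum()
torch.cuda.OutOfMemoryError: CUDA out of memory.
"""
print(tally_exceptions(sample))
```

Grouping by exception type is only a first pass; errors of the same type can still have different causes.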

rafaeltiveron commented 2 months ago

I've reduced the errors from 8 to 2 by changing line 79 of tests/compare_utils.py to:

_param_path = os.path.join(dir_path, "/<directory until>/openfold/resources/params", f"params_{consts.model}.npz")
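
One detail worth knowing about the line above: when a later argument to os.path.join is an absolute path, every component before it is discarded, so dir_path has no effect there. A small sketch of that behavior, using posixpath so it behaves the same on any platform; the /opt/data/params location is a made-up placeholder, while the tests directory and the model_1_ptm file name are taken from the tracebacks above.

```python
import posixpath

dir_path = "/usr/local/openfold/tests"        # tests directory seen in the tracebacks
absolute = "/opt/data/params"                 # hypothetical absolute location
relative = "../openfold/resources/params"     # relative location seen in the traceback
name = "params_model_1_ptm.npz"

# An absolute component resets the join: dir_path is dropped.
joined_abs = posixpath.join(dir_path, absolute, name)

# A relative component is appended to dir_path; normpath collapses the "..".
joined_rel = posixpath.normpath(posixpath.join(dir_path, relative, name))

print(joined_abs)
print(joined_rel)
```

With an absolute path the join still works, but the result no longer depends on where the tests directory lives.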

rafaeltiveron commented 2 months ago

From 2 errors to 1, by changing line 405 of tests/test_import_weights.py to:

npz_path = Path(__file__).parent.resolve() / f"openfold/resources/params/params_{consts.model}.npz"

This assumes a symlink created with ln -s /<directory until>/openfold /<directory until>/openfold/tests/openfold.
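
An alternative to the symlink, sketched under assumptions: try a short list of candidate locations and use the first one that exists. The helper names first_existing and candidate_param_paths are hypothetical and are not part of OpenFold. The first candidate mirrors the path in the FileNotFoundError above; the second is only a guess at where the parameters might otherwise be.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def candidate_param_paths(tests_dir: Path, model: str) -> List[Path]:
    """Build candidate locations for the parameter file (assumed layout)."""
    name = f"params_{model}.npz"
    return [
        # Location the test tried in the traceback: tests/../openfold/resources/params
        tests_dir.parent / "openfold" / "resources" / "params" / name,
        # Assumed alternative: a resources folder directly under the repository root
        tests_dir.parent / "resources" / "params" / name,
    ]


def first_existing(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that is an existing file, or None."""
    for candidate in candidates:
        if candidate.is_file():
            return candidate.resolve()
    return None
```

In a test module, tests_dir would be Path(__file__).parent.resolve(), and a None result can be turned into a clear error message that lists the locations that were tried.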

rafaeltiveron commented 1 month ago

The following error persists:

ERROR: test_lma_vs_attention (tests.test_primitives.TestLMA)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/openfold/tests/test_primitives.py", line 45, in test_lma_vs_attention
    l = a(q, kv, biases=biases, use_lma=True).cpu()
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/openfold/openfold/model/primitives.py", line 537, in forward
    o = _lma(q, k, v, biases, lma_q_chunk_size, lma_kv_chunk_size)
  File "/usr/local/openfold/openfold/model/primitives.py", line 733, in _lma
    a = torch.einsum(
  File "/usr/local/miniforge/mambaforge/envs/openfold_env/lib/python3.10/site-packages/torch/functional.py", line 377, in einsum
    return _VF.einsum(equation, operands)  # type: ignore[attr-defined]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.62 GiB. GPU 0 has a total capacty of 7.66 GiB of which 1.08 GiB is free. Including non-PyTorch memory, this process has 5.88 GiB memory in use. Of the allocated memory 4.11 GiB is allocated by PyTorch, and 1.65 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The memory figures vary slightly between runs, but the error is the same.
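
The figures in the out-of-memory message can be worked through directly: the failed request was 1.62 GiB against 1.08 GiB free, a shortfall of 0.54 GiB, while PyTorch also held 1.65 GiB reserved but unallocated. A minimal sketch that extracts those numbers; parse_oom is a hypothetical helper, and it assumes every figure is reported in GiB as in the message above.

```python
import re

OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 1.62 GiB. GPU 0 has a total capacty "
    "of 7.66 GiB of which 1.08 GiB is free. Including non-PyTorch memory, this "
    "process has 5.88 GiB memory in use. Of the allocated memory 4.11 GiB is "
    "allocated by PyTorch, and 1.65 GiB is reserved by PyTorch but unallocated."
)

PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) GiB",
    "total": r"total capac\w* of ([\d.]+) GiB",  # matches "capacty" and "capacity"
    "free": r"of which ([\d.]+) GiB is free",
    "allocated": r"([\d.]+) GiB is allocated by PyTorch",
    "reserved_unallocated": r"([\d.]+) GiB is reserved by PyTorch but unallocated",
}


def parse_oom(message: str) -> dict:
    """Extract the GiB figures from a PyTorch CUDA OOM message (GiB units assumed)."""
    figures = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match:
            figures[key] = float(match.group(1))
    return figures


figures = parse_oom(OOM_MESSAGE)
shortfall = round(figures["requested"] - figures["free"], 2)
print(figures)
print("shortfall in GiB:", shortfall)
```

The reserved-but-unallocated figure exceeds the shortfall, which is why the message itself raises fragmentation as a possible cause; whether that cached memory can actually serve a single 1.62 GiB request depends on how it is split into blocks.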

I've added the following at the top of openfold/tests/test_primitives.py and openfold/openfold/model/primitives.py:

import os

# Configuration to avoid GPU memory fragmentation:
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:1024'

This reduced memory usage at the start of prediction from 5 GB to 3 GB, but that is still not enough for this test to pass; it still runs out of memory. I don't know how to use "use_lma": true for tests.test_primitives.TestLMA. Maybe it would reduce memory consumption further. I'm using an NVIDIA GeForce RTX 4060.
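
Two observations, offered tentatively. First, the traceback shows the test already calls a(q, kv, biases=biases, use_lma=True), so LMA appears to be enabled in this test. Second, PYTORCH_CUDA_ALLOC_CONF may be ignored if it is set after the CUDA allocator has been initialized, and when the whole suite runs, other test modules may import torch and touch the GPU before test_primitives.py is loaded. A conservative sketch is to set the variable before torch is imported at all; set_alloc_conf is a hypothetical helper, and treating "before import torch" as the safe point is an assumption, not a documented requirement.

```python
import os
import sys
from typing import Tuple


def set_alloc_conf(max_split_size_mb: int) -> Tuple[str, bool]:
    """Set PYTORCH_CUDA_ALLOC_CONF and report whether torch was already imported.

    If torch is already imported, the allocator may already be configured,
    so the new value may have no effect (assumption; checked conservatively).
    """
    torch_already_imported = "torch" in sys.modules
    value = f"max_split_size_mb:{max_split_size_mb}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value, torch_already_imported


value, too_late = set_alloc_conf(1024)
if too_late:
    print("warning: torch was imported before the variable was set")
```

Exporting the variable in the shell before running ./scripts/run_unit_tests.sh sidesteps the ordering question, since every Python process started by the script then inherits it.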

aqlaboratory / openfold, issue #467