dptech-corp / Uni-Fold

An open-source platform for developing protein models beyond AlphaFold.
https://doi.org/10.1101/2022.08.04.502811
Apache License 2.0
380 stars 74 forks

trying installation on a computer with an NVIDIA RTX A4000 16GB VRAM #56

Closed avilella closed 1 year ago

avilella commented 2 years ago

Hi all,

I am trying an installation of this repo on a computer with an NVIDIA RTX A4000 16GB VRAM.

The base OS is an Ubuntu 22.04, and I've successfully installed and ran the non-docker version of Alphafold2 following the instructions in this repo: https://github.com/kalininalab/alphafold_non_docker

I've attempted the installation of Uni-Fold, but got stuck at the point where the python scripts try to load the unicore libraries. I went to the Uni-Core repo and attempted a non-docker installation, but it complained about differing versions of torch. See below:


~/Uni-Core$ pip install .                                                                                                                                                                                                       
Defaulting to user installation because normal site-packages is not writeable                                                                                                                                                                
Processing /home/user/Uni-Core                                                                                                                                                                                                           
  Preparing metadata (setup.py) ... error                                                                                                                                                                                                    
  error: subprocess-exited-with-error                                                                                 

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [22 lines of output]
      Traceback (most recent call last):                                                                                                                                                                                                     
        File "<string>", line 2, in <module>                                                                                                                                                                                                 
        File "<pip-setuptools-caller>", line 34, in <module>                                                                                                                                                                                 
        File "/home/user/Uni-Core/setup.py", line 105, in <module>                                                                                                                                                                       
          check_cuda_torch_binary_vs_bare_metal(torch.utils.cpp_extension.CUDA_HOME)                                                                                                                                                         
        File "/home/user/Uni-Core/setup.py", line 87, in check_cuda_torch_binary_vs_bare_metal                                                                                                                                           
          torch_binary_major = torch.version.cuda.split(".")[0]                                                                                                                                                                              
      AttributeError: 'NoneType' object has no attribute 'split'                                                      
      No CUDA runtime is found, using CUDA_HOME='/usr'                                                                

      Warning: Torch did not find available GPUs on this system.                                                      
       If your intention is to cross-compile, this is not an error.                                                   
      By default, it will cross-compile for Volta (compute capability 7.0), Turing (compute capability 7.5),          
      and, if the CUDA version is >= 11.0, Ampere (compute capability 8.0).                                                                                                                                                                  
      If you wish to cross-compile for a single specific architecture,                                                
      export TORCH_CUDA_ARCH_LIST="compute capability" before running setup.py.                                       

      torch.__version__  = 1.8.0a0                                                                                    

      [end of output]                                                                                                 

  note: This error originates from a subprocess, and is likely not a problem with pip.                                
error: metadata-generation-failed                                                                                     

× Encountered error while generating package metadata.
╰─> See above for output.
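For reference, the AttributeError above comes from Uni-Core's setup.py assuming `torch.version.cuda` is a string; on a CPU-only PyTorch build it is `None`. A minimal sketch of the failing check (`cuda_major` is a hypothetical name, not Uni-Core's own function):

```python
def cuda_major(cuda_version):
    """Return the CUDA major version, e.g. "11" from "11.3".

    torch.version.cuda is None on CPU-only PyTorch builds, which is
    exactly what makes Uni-Core's setup.py raise
    AttributeError: 'NoneType' object has no attribute 'split'.
    """
    if cuda_version is None:
        raise RuntimeError(
            "PyTorch was built without CUDA support; "
            "install a CUDA-enabled torch before building Uni-Core"
        )
    return cuda_version.split(".")[0]

print(cuda_major("11.3"))  # prints 11
```

So the underlying fix is installing a CUDA-enabled torch build, not patching setup.py.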

The instructions mention the docker route:

Then, you can create and attach into the docker container, and clone & install unifold.

Would it be possible to add more explicit docker instructions on how to achieve the docker installation for both Uni-Fold and Uni-Core?

Given that the GPU I intend to use is an Ampere platform GPU, should I be concerned that the version of torch in the docker containers may not be compatible with my GPU? Thanks in advance.
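(For reference: the RTX A4000 is an Ampere-generation card with compute capability 8.6, so the cross-compile defaults quoted in the log above, 7.0/7.5/8.0, would not target it exactly. When building from source, the architecture can be pinned via the TORCH_CUDA_ARCH_LIST variable the warning mentions; equivalently, `export TORCH_CUDA_ARCH_LIST="8.6"` in the shell before running pip. A sketch, assuming a from-source build:)

```python
import os

# RTX A4000 = Ampere, compute capability 8.6. Pinning the arch list
# before setup.py runs makes the CUDA extension target this GPU even
# when torch cannot see a GPU at build time (the "cross-compile" case
# described in the warning above).
os.environ["TORCH_CUDA_ARCH_LIST"] = "8.6"

print(os.environ["TORCH_CUDA_ARCH_LIST"])
```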

guolinke commented 2 years ago

maybe you can try our pre-compiled wheel (https://github.com/dptech-corp/Uni-Core/releases/tag/0.0.1), we also use the wheel in the colab server.

For the docker version, you can try:

docker pull dptechnology/unifold:latest-pytorch1.11.0-cuda11.3
docker run -d -it --gpus all  --net=host  --name unifold dptechnology/unifold:latest-pytorch1.11.0-cuda11.3
docker attach unifold

to use GPU in docker, you need to install nvidia-docker-2 https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker

avilella commented 2 years ago

The wheel (whl?) could be an attractive option. Would you be willing to add a 'wheel based' section in the Installation Instructions? I have little experience with whl packaging, but I'll be the first beta tester of the procedure.


guolinke commented 2 years ago

it is quite simple: first download the wheel matching your python, pytorch, and cuda versions, then run

pip3 -q install "unicore-0.0.1+cu113torch1.12.1-cp37-cp37m-linux_x86_64.whl"

replacing "unicore-0.0.1+cu113torch1.12.1-cp37-cp37m-linux_x86_64.whl" with the file you downloaded.
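(To make the "matching your python, pytorch, and cuda versions" part concrete: the wheels on the 0.0.1 release page follow the standard wheel naming pattern, so the expected filename can be derived from the local environment. `wheel_filename` below is a hypothetical helper for illustration only; which combinations actually exist must be checked on the release page.)

```python
def wheel_filename(cuda="cu113", torch_ver="1.12.1", py_tag="cp37", abi_tag="cp37m"):
    """Build the expected Uni-Core wheel name for a given environment.

    Illustrative only: mirrors the naming of the files on the
    Uni-Core 0.0.1 release page (unicore-0.0.1+<cuda>torch<ver>-
    <py>-<abi>-linux_x86_64.whl). Availability of any particular
    combination should be verified on the release page itself.
    """
    return f"unicore-0.0.1+{cuda}torch{torch_ver}-{py_tag}-{abi_tag}-linux_x86_64.whl"

print(wheel_filename())  # the exact file cited in the comment above
```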

avilella commented 1 year ago

I managed to install the whl and successfully complete the first part of the multimer prediction, but then I get this error in the second part:

Starting prediction...        
/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.0.6) or chardet (4.0.0) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
start to load params /home/petmedix/Uni-Fold/multimer.unifold.pt
Traceback (most recent call last):
  File "/home/petmedix/Uni-Fold/unifold/inference.py", line 266, in <module>
    main(args)       
  File "/home/petmedix/Uni-Fold/unifold/inference.py", line 91, in main
    model.load_state_dict(state_dict)                                                          
  File "/home/petmedix/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(      
RuntimeError: Error(s) in loading state_dict for AlphaFold:
        Missing key(s) in state_dict: "template_pair_embedder.linear.weight", "template_pair_embedder.linear.bias", "template_pointwise_att.mha.linear_q.weight", "template_pointwise_att.mha.linear_k.weight", "template_pointwise_att.mha.linear_v.weight", "template_pointwise_att.mha.linear_o.weight", "template_pointwise_att.mha.linear_o.bias", "structure_module.ipa.linear_q.bias", "structure_module.ipa.linear_kv.weight", "structure_module.ipa.linear_kv.bias", "structure_module.ipa.linear_kv_points.weight", "structure_module.ipa.linear_kv_points.bias".
        Unexpected key(s) in state_dict: "template_proj.output_linear.weight", "template_proj.output_linear.bias", "template_pair_embedder.z_layer_norm.weight", "template_pair_embedder.z_layer_norm.bias", "template_pair_embedder.z_linear.weight", "template_pair_embedder.z_linear.bias", "template_pair_embedder.linear.0.weight", "template_pair_embedder.linear.0.bias", "template_pair_embedder.linear.1.weight", "template_pair_embedder.linear.1.bias", "template_pair_embedder.linear.2.weight", "template_pair_embedder.linear.2.bias", "template_pair_embedder.linear.3.weight", "template_pair_embedder.linear.3.bias", "template_pair_embedder.linear.4.weight", "template_pair_embedder.linear.4.bias", "template_pair_embedder.linear.5.weight", "template_pair_embedder.linear.5.bias", "template_pair_embedder.linear.6.weight", "template_pair_embedder.linear.6.bias", "template_pair_embedder.linear.7.weight", "template_pair_embedder.linear.7.bias", "structure_module.ipa.linear_k.weight", "structure_module.ipa.linear_v.weight", "structure_module.ipa.linear_k_points.weight", "structure_module.ipa.linear_k_points.bias", "structure_module.ipa.linear_v_points.weight", "structure_module.ipa.linear_v_points.bias", "aux_heads.pae.linear.weight", "aux_heads.pae.linear.bias".
        size mismatch for input_embedder.linear_tf_z_i.weight: copying a param with shape torch.Size([128, 21]) from checkpoint, the shape in current model is torch.Size([128, 22]).
        size mismatch for input_embedder.linear_tf_z_j.weight: copying a param with shape torch.Size([128, 21]) from checkpoint, the shape in current model is torch.Size([128, 22]).
        size mismatch for input_embedder.linear_tf_m.weight: copying a param with shape torch.Size([256, 21]) from checkpoint, the shape in current model is torch.Size([256, 22]).
        size mismatch for input_embedder.linear_relpos.weight: copying a param with shape torch.Size([128, 73]) from checkpoint, the shape in current model is torch.Size([128, 65]).
        size mismatch for template_angle_embedder.linear_1.weight: copying a param with shape torch.Size([256, 34]) from checkpoint, the shape in current model is torch.Size([256, 57]).
        size mismatch for aux_heads.masked_msa.linear.weight: copying a param with shape torch.Size([22, 256]) from checkpoint, the shape in current model is torch.Size([23, 256]).
        size mismatch for aux_heads.masked_msa.linear.bias: copying a param with shape torch.Size([22]) from checkpoint, the shape in current model is torch.Size([23]).

Any ideas? Does this mean I need to somehow convert the alphafold models, or am I already using the correct model in model_2_ft?

avilella commented 1 year ago

Answering my own question: it works if I use model_name multimer_ft rather than model_2_ft.
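(For anyone landing here later: the multimer checkpoint, multimer.unifold.pt, has to be paired with a multimer model config; model_2_ft is a monomer config, which is why the missing/unexpected key and size-mismatch errors above appear. The command sketch below is illustrative only; the flag names are assumptions based on the unifold/inference.py path in the traceback, so verify them with `python unifold/inference.py --help` in your checkout.)

```python
# Illustrative only: pair the multimer checkpoint with the multimer
# config. Flag names are assumptions -- check
# `python unifold/inference.py --help` before running.
cmd = [
    "python", "unifold/inference.py",
    "--model_name", "multimer_ft",       # multimer config; model_2_ft is monomer
    "--param_path", "multimer.unifold.pt",
]
print(" ".join(cmd))
```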