aqlaboratory / openfold

Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2
Apache License 2.0

No error but also no output ("Successfully loaded params ...") #266

Open ecfischer opened 1 year ago

ecfischer commented 1 year ago

I've been trying to set up OpenFold this week to test it out. It's been somewhat of a struggle, but I am so close right now. Everything is installed, with no errors along the way.

But now run_pretrained_openfold.py simply prints a single message:

INFO:/home/ubuntu/openfold/openfold/utils/script_utils.py:Loaded OpenFold parameters at openfold/resources/openfold_params/finetuning_ptm_2.pt...

Then it returns the shell to me, with no error and no output. Any idea what is happening here?

I am trying to run inference on a small test protein from RODA.

My command:

python3 run_pretrained_openfold.py \
    /home/ubuntu/openfold/input \
    data/pdb_mmcif/mmcif_files/ \
    --use_precomputed_alignments msa/1mh1 \
    --output_dir output \
    --model_device "cuda:0" \
    --jackhmmer_binary_path lib/conda/envs/openfold_venv/bin/jackhmmer \
    --hhblits_binary_path lib/conda/envs/openfold_venv/bin/hhblits \
    --hhsearch_binary_path lib/conda/envs/openfold_venv/bin/hhsearch \
    --kalign_binary_path lib/conda/envs/openfold_venv/bin/kalign \
    --config_preset "model_1_ptm" \
    --openfold_checkpoint_path openfold/resources/openfold_params/finetuning_ptm_2.pt

Everything was set up on Ubuntu 20.04 with CUDA 11.6 (installation failed with 11.3). The NVIDIA toolkit is 11.6 (/usr/local/cuda-11.6/bin is in the PATH), and install_third_party_dependencies.sh ran with no errors.

conda env:

# platform: linux-64
_libgcc_mutex=0.1=conda_forge
_openmp_mutex=4.5=2_kmp_llvm
absl-py=1.4.0=pypi_0
aiohttp=3.8.3=pypi_0
aiosignal=1.3.1=pypi_0
async-timeout=4.0.2=pypi_0
asynctest=0.13.0=pypi_0
attrs=22.2.0=pypi_0
biopython=1.79=pypi_0
blas=1.0=mkl
ca-certificates=2022.12.7=ha878542_0
cachetools=5.3.0=pypi_0
certifi=2022.12.7=pypi_0
charset-normalizer=2.0.12=pypi_0
click=8.1.3=pypi_0
contextlib2=21.6.0=pypi_0
cudatoolkit=11.3.1=h9edb442_11
deepspeed=0.5.10=pypi_0
dllogger=1.0.0=pypi_0
dm-tree=0.1.6=pypi_0
docker-pycreds=0.4.0=pypi_0
einops=0.6.0=pypi_0
fftw=3.3.10=nompi_hf0379b8_106
flash-attn=0.1=pypi_0
frozenlist=1.3.3=pypi_0
fsspec=2023.1.0=pypi_0
future=0.18.3=pypi_0
gitdb=4.0.10=pypi_0
gitpython=3.1.30=pypi_0
google-auth=2.16.0=pypi_0
google-auth-oauthlib=0.4.6=pypi_0
grpcio=1.51.1=pypi_0
hhsuite=3.3.0=py37pl5321h675a0cb_5
hjson=3.1.0=pypi_0
hmmer=3.3.2=h87f3376_2
icu=70.1=h27087fc_0
idna=3.4=pypi_0
importlib-metadata=6.0.0=pypi_0
kalign2=2.04=hec16e2b_3
ld_impl_linux-64=2.40=h41732ed_0
libffi=3.4.2=h7f98852_5
libgcc-ng=12.2.0=h65d4601_19
libgfortran-ng=12.2.0=h69a702a_19
libgfortran5=12.2.0=h337968e_19
libhwloc=2.8.0=h32351e8_1
libiconv=1.17=h166bdaf_0
libnsl=2.0.0=h7f98852_0
libsqlite=3.40.0=h753d276_0
libstdcxx-ng=12.2.0=h46fd767_19
libxml2=2.10.3=h7463322_0
libzlib=1.2.13=h166bdaf_4
llvm-openmp=15.0.7=h0cdce71_0
markdown=3.4.1=pypi_0
markupsafe=2.1.2=pypi_0
mkl=2021.4.0=h8d4b97c_729
mkl-service=2.4.0=py37h402132d_0
mkl_fft=1.3.1=py37h3e078e5_1
mkl_random=1.2.2=py37h219a48f_0
ml-collections=0.1.0=pypi_0
multidict=6.0.4=pypi_0
ncurses=6.3=h27087fc_1
ninja=1.11.1=pypi_0
numpy=1.21.2=pypi_0
oauthlib=3.2.2=pypi_0
ocl-icd=2.3.1=h7f98852_0
ocl-icd-system=1.0.0=1
openfold=1.0.0=pypi_0
openmm=7.5.1=py37h96c4ddf_1
openssl=3.0.7=h0b41bf4_2
packaging=23.0=pypi_0
pathtools=0.1.2=pypi_0
pdbfixer=1.7=pyhd3deb0d_0
perl=5.32.1=2_h7f98852_perl5
pip=22.3.1=pyhd8ed1ab_0
promise=2.3=pypi_0
protobuf=3.20.3=pypi_0
psutil=5.9.4=pypi_0
py-cpuinfo=9.0.0=pypi_0
pyasn1=0.4.8=pypi_0
pyasn1-modules=0.2.8=pypi_0
pydeprecate=0.3.1=pypi_0
python=3.7.12=hf930737_100_cpython
python_abi=3.7=3_cp37m
pytorch=1.12.1=py3.7_cuda11.3_cudnn8.3.2_0
pytorch-lightning=1.5.10=pypi_0
pytorch-mutex=1.0=cuda
pyyaml=5.4.1=pypi_0
readline=8.1.2=h0f457ee_0
requests=2.26.0=pypi_0
requests-oauthlib=1.3.1=pypi_0
rsa=4.9=pypi_0
scipy=1.7.1=pypi_0
sentry-sdk=1.14.0=pypi_0
setproctitle=1.3.2=pypi_0
setuptools=59.5.0=py37h89c1867_0
shortuuid=1.0.11=pypi_0
six=1.16.0=pyh6c4a22f_0
smmap=5.0.0=pypi_0
sqlite=3.40.0=h4ff8645_0
tbb=2021.7.0=h924138e_1
tensorboard=2.11.2=pypi_0
tensorboard-data-server=0.6.1=pypi_0
tensorboard-plugin-wit=1.8.1=pypi_0
tk=8.6.12=h27826a3_0
torchmetrics=0.11.0=pypi_0
tqdm=4.62.2=pypi_0
triton=1.0.0=pypi_0
typing-extensions=3.10.0.2=pypi_0
urllib3=1.26.14=pypi_0
wandb=0.12.21=pypi_0
werkzeug=2.2.2=pypi_0
wheel=0.38.4=pyhd8ed1ab_0
xz=5.2.6=h166bdaf_0
yarl=1.8.2=pypi_0
zipp=3.11.0=pypi_0

>>> torch.__version__
'1.12.1'
>>> torch.cuda.is_available()
True

Thanks in advance !

gahdritz commented 1 year ago

Could you send the structure of your input dir and also the contents of the 1mh1 FASTA?

ecfischer commented 1 year ago

Input FASTA file (the original FASTA from rcsb.org caused problems, so I renamed it):

>1mh1
GSPQAIKCVVVGDGAVGKTCLLISYTTNAFPGEYIPTVFDNYSANVMVDGKPVNLGLWDTAGQEDYDRLRPLSYPQTDVSLICFSLVSPASFENVRAKWYPEVRHHCPNTPIILVGTKLDLRDDKDTIEKLKEKKLTPITYPQGLAMAKEIGAVKYLECSALTQRGLKTVFDEAIRAVLCPPPVKK

openfold/input file tree

input
└── 1mh1
    └── 1mh1.fasta

openfold/msa file tree

msa
└── 1mh1
    ├── bfd_uniclust_hits.a3m
    ├── mgnify_hits.a3m
    ├── pdb70_hits.hhr
    └── uniref90_hits.a3m

I am running this on a g4dn.4xlarge with a T4 GPU (16 GB) to begin with, before I spend money on a bigger GPU. Could it be an out-of-memory error that is not being reported explicitly?

Thanks

gahdritz commented 1 year ago

Your input dir should contain all .fasta files directly (i.e. you should collapse input/1mh1 into input/, so that the file sits at input/1mh1.fasta). 16 GB is plenty for a ~200-residue protein.
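
For anyone hitting the same silent exit, here is a minimal sketch of a pre-flight check for the layout described above. It is not part of OpenFold: `split_fasta_locations` and `describe_layout` are hypothetical helper names, and the only rule it encodes is the one stated in this reply, namely that .fasta files should sit directly in the input dir. It lists which files are at the top level and which are nested in subdirectories.

```python
from pathlib import Path


def split_fasta_locations(input_dir):
    """Return (top_level, nested) lists of .fasta paths under input_dir."""
    root = Path(input_dir)
    top_level, nested = [], []
    for path in sorted(root.rglob("*.fasta")):
        if not path.is_file():
            continue
        # Files whose parent is the input dir itself sit "directly" inside it.
        if path.parent == root:
            top_level.append(path)
        else:
            nested.append(path)
    return top_level, nested


def describe_layout(input_dir):
    """Build a short report. Hypothetical helper, not an OpenFold function."""
    top_level, nested = split_fasta_locations(input_dir)
    lines = [f"{len(top_level)} .fasta file(s) directly in {input_dir}"]
    for path in nested:
        # Assumption based on this thread: nested files are not picked up.
        lines.append(f"nested, move up to {input_dir}: {path}")
    return "\n".join(lines)


if __name__ == "__main__":
    import sys

    print(describe_layout(sys.argv[1] if len(sys.argv) > 1 else "input"))
```

Run against the tree posted above, it would report zero top-level files and flag input/1mh1/1mh1.fasta as nested, which matches the symptom of the script loading the parameters and then finding nothing to predict.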

ecfischer commented 1 year ago

Problem solved. Thanks a lot :)

I also had to collapse the msa folder.