google-deepmind / alphafold

Open source code for AlphaFold 2.
Apache License 2.0

Use of use_precomputed_msas #427

Open calmip opened 2 years ago

calmip commented 2 years ago

Hello

Like others, here at the Calmip computing center we have trouble running AlphaFold on our HPC cluster because of the MSA step: the data files are on an NFS-based filesystem and performance is low, host memory is saturated with NFS caching, and sometimes the whole machine hangs while AlphaFold is running. None of the solutions described at https://github.com/soedinglab/hh-suite/wiki#running-hhblits-efficiently-on-a-computer-cluster can be applied.

So the idea would be to run the MSA stage on specialized hardware, then copy the output directory to the cluster for the GPU-based step. It seems possible to run only the GPU-based stage thanks to the --use_precomputed_msas flag. But how can the MSA stage be isolated? Is it sufficient to run jackhmmer and hhblits "by hand", and how do we generate the features.pkl file needed for the second stage?

Thanks! Emmanuel C.

andreadisimone commented 2 years ago

Just wondering if you ever found a solution to this. I am working on a similar application, and I would also like a better (and clearly documented) separation between the ETL and inference steps.

abhinavb22 commented 2 years ago

I think the --use_precomputed_msas flag makes AlphaFold look for MSAs in the output directory. For multimers the path should be something like output_directory_parent/fasta_file_name/msas/A/mgnify_hits.sto, etc. You may have to do something similar: copy your resulting MSAs into the output directory, mimicking the results that AlphaFold would generate. Then --use_precomputed_msas would make AlphaFold read the MSAs in that directory. A more efficient way would be to write some Python code that outputs your chain features and have the inference step read those chain features directly. Regarding features.pkl, AlphaFold generates it automatically once it has read the MSAs from the result directory.
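The copy step described above could be sketched roughly as follows. This is only a minimal illustration, not part of AlphaFold: the helper name `stage_precomputed_msas` and the example paths are hypothetical, and the directory layout (`<output_dir>/<target_name>/msas/<chain_id>/`) is simply the one suggested in this comment, so check it against what your AlphaFold version actually writes.

```python
import shutil
from pathlib import Path

# Suffixes of the MSA files mentioned in this thread; extend if your run uses others.
MSA_SUFFIXES = {".sto", ".a3m"}


def stage_precomputed_msas(src_dir, output_dir, target_name, chain_id=None):
    """Copy MSA files from src_dir into <output_dir>/<target_name>/msas[/<chain_id>].

    Hypothetical helper: the layout follows the path suggested in this thread
    and is an assumption, not something read from the AlphaFold source.
    """
    dest = Path(output_dir) / target_name / "msas"
    if chain_id is not None:
        # Multimer runs reportedly keep one subdirectory per chain (A, B, ...).
        dest = dest / chain_id
    dest.mkdir(parents=True, exist_ok=True)

    copied = []
    for path in sorted(Path(src_dir).iterdir()):
        if path.is_file() and path.suffix in MSA_SUFFIXES:
            target = dest / path.name
            shutil.copy2(path, target)  # keeps timestamps along with the contents
            copied.append(target)
    return copied
```

For example, `stage_precomputed_msas("/scratch/msa_node/my_target", "/cluster/af_output", "my_target", chain_id="A")` would copy the `.sto` and `.a3m` files computed elsewhere into the place where the GPU run is expected to look (both paths are made up for illustration).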

Fede112 commented 2 years ago

I tried to follow this approach, that is, putting my custom MSA in the output/msas folder, but it is still not working as expected. As suggested in #300, I replaced bfd_uniclust_hits.a3m with my custom .a3m MSA and added empty .sto files. The program crashes, I believe because it cannot handle empty .sto files. If I just remove them, it ignores the --use_precomputed_msas flag and runs the default jackhmmer and hhblits searches.
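Since both failure modes reported here (a crash on empty files, a silent fallback to the full search on missing ones) depend on the state of the msas folder, a quick pre-flight check before launching the GPU job may save a wasted run. The sketch below is a hypothetical helper, not an AlphaFold function, and the list of expected file names is an assumption that you need to supply for your own preset and version.

```python
from pathlib import Path


def find_unusable_msas(msa_dir, expected_names):
    """Return {file name: problem} for expected MSA files that are missing or empty.

    Hypothetical pre-flight check; `expected_names` must be supplied by the
    caller because the exact set of files depends on the AlphaFold setup.
    """
    problems = {}
    for name in expected_names:
        path = Path(msa_dir) / name
        if not path.is_file():
            problems[name] = "missing"
        elif path.stat().st_size == 0:
            problems[name] = "empty"
    return problems
```

Calling it with the names mentioned in this thread, for instance `find_unusable_msas("output/msas", ["mgnify_hits.sto", "bfd_uniclust_hits.a3m"])`, returns an empty dict when every listed file exists and has content. It only checks presence and size, not whether the alignment itself is valid.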

mariagviegas commented 1 year ago

I am currently facing the same problem as Fede112. Has anyone found a solution? Or is there another way to run a locally installed AlphaFold with custom MSAs? Thanks in advance for your help!