Open calmip opened 2 years ago
Just wondering if you ever found a solution to this. I am working on a similar application, and I would also like a better (and clearly documented) separation between the ETL and inference step.
I think the --use_precomputed_msas flag makes AlphaFold look for MSAs in the output directory. For multimers it should be something like output_directory_parent/fasta_file_name/msas/A/mgnify_hits.sto, etc. You may have to do something similar, which is to copy your resulting MSAs into the output directory, mimicking the results that AlphaFold itself would generate. Then, with --use_precomputed_msas set, AlphaFold reads the MSAs from that directory. A more efficient way would be to write some Python code that outputs your chain features, and then read the chain features directly in the inference step. Regarding features.pkl, AlphaFold generates it automatically once it has read the MSAs from the result directory. A rough sketch of the copy approach is below.
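For illustration only (untested; the target name, chain letter and file names are my assumptions and differ between AlphaFold versions, so check the tree of one real run and mirror it exactly):

```python
# Sketch: mimic the msas/ tree a normal AlphaFold run writes, so that
# --use_precomputed_msas picks up an externally computed alignment.
# 'my_target', chain 'A' and the file names are placeholders.
import shutil
from pathlib import Path

output_dir = Path('output/my_target')        # <output_dir>/<fasta_file_name>
chain_msa_dir = output_dir / 'msas' / 'A'    # one subdirectory per chain (multimer)
chain_msa_dir.mkdir(parents=True, exist_ok=True)

# Drop your own alignment in under the name AlphaFold itself would have used
# (here the bfd/uniclust a3m slot; the expected name varies by version).
shutil.copy('my_custom_alignment.a3m', chain_msa_dir / 'bfd_uniclust_hits.a3m')
```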
I tried to follow this approach, that is, putting my custom MSA in the output/msas folder, but it is still not working as expected. As suggested in #300, I replaced bfd_uniclust_hits.a3m with my custom .a3m MSA and added empty .sto files. The program crashes, I believe because it cannot handle empty .sto files. If I just remove them, it ignores --use_precomputed_msas and runs the default jackhmmer and hhblits searches. One idea I have not verified yet is sketched below.
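That is, making the placeholder .sto files minimally valid instead of empty, with only the query sequence in them (an untested assumption on my part that this is enough for the parser; the path, name and sequence below are placeholders for your own target):

```python
# Untested sketch: write a one-sequence Stockholm file as a stand-in MSA,
# since completely empty .sto files appear to break the parser.
# Replace the path, the name and the sequence with your actual query.
from pathlib import Path

def write_placeholder_sto(path: Path, name: str, sequence: str) -> None:
    """Write a minimal Stockholm alignment containing only one sequence."""
    path.write_text(f'# STOCKHOLM 1.0\n{name} {sequence}\n//\n')

write_placeholder_sto(Path('output/my_target/msas/mgnify_hits.sto'),
                      'query', 'MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ')
```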
I am currently facing the same problem as Fede112. Has anyone found a solution? Or is there another way to run the locally installed AlphaFold with custom MSAs? Thanks in advance for your help!
Hello
Like others, here at the Calmip computation center we have trouble running AlphaFold on our HPC cluster because of the MSA step: the data files are on an NFS-based filesystem, performance is low, the host memory gets saturated by NFS caching, and sometimes the whole machine is stuck because of a running AlphaFold job. None of the solutions explained at https://github.com/soedinglab/hh-suite/wiki#running-hhblits-efficiently-on-a-computer-cluster can be applied in our case.
So the idea would be to run the MSA stage on some specialized hardware, then copy the output directory to the cluster for the GPU-based step. Running only the GPU-based stage seems possible thanks to the --use_precomputed_msas flag. But how do we isolate the MSA stage? Is it sufficient to run the jackhmmer and hhblits codes "by hand", and how do we generate the features.pkl file needed for the second stage? A rough sketch of what I have in mind is below.
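If I understand run_alphafold.py correctly, features.pkl is just the pickled dictionary returned by the data pipeline, so maybe the first stage could be something like this (the DataPipeline construction is omitted, since its arguments, tool binaries and database paths, depend on the AlphaFold version; see run_alphafold.py for the exact list). Is this the right direction?

```python
# Rough sketch of an "MSA/features only" stage, assuming data_pipeline is an
# alphafold.data.pipeline.DataPipeline built the same way run_alphafold.py
# builds it (binary and database paths omitted here).
import os
import pickle

def build_features(data_pipeline, fasta_path: str, output_dir: str) -> str:
    """Run only the MSA/feature stage and write features.pkl."""
    msa_output_dir = os.path.join(output_dir, 'msas')
    os.makedirs(msa_output_dir, exist_ok=True)

    # Runs the jackhmmer/hhblits searches (or reuses existing files when
    # use_precomputed_msas is set) and returns a dict of numpy feature arrays.
    feature_dict = data_pipeline.process(
        input_fasta_path=fasta_path,
        msa_output_dir=msa_output_dir)

    # run_alphafold.py pickles this dict as features.pkl next to the msas/ dir.
    features_path = os.path.join(output_dir, 'features.pkl')
    with open(features_path, 'wb') as f:
        pickle.dump(feature_dict, f, protocol=4)
    return features_path
```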
Thanks!
Emmanuel C.