YoshitakaMo / localcolabfold

ColabFold on your local PC
MIT License
610 stars 135 forks

Functionality to use both custom MSA and fasta with multiple sequences #243

Closed: danielguion closed this issue 4 months ago

danielguion commented 4 months ago

I have a fasta file with multiple sequences and a precomputed MSA that I want to use for each sequence. However, colabfold_batch does not seem to apply the precomputed MSA to the fasta, and it only selects the first sequence from my fasta for structure prediction. Maybe this is a bug, or maybe I am doing something wrong. I thought I'd report it in case it's the former:

ls
alphafold.3879608.log  colabfold_job.e3879608  previously_computed_multiple_sequence_alignment.a3m  ssm_library.fasta  test_alphafold_predict_and_dock.sh
(/localcolabfold/colabfold-conda) [xxxx]$ colabfold_batch --stop-at-score 95 --zip --max-msa 512:1023 /localcolabfold /localcolabfold/colabfold_data/

2024-07-03 15:54:29,355 More than one sequence in /localcolabfold/ssm_library.fasta, ignoring all but the first sequence

2024-07-03 15:54:31,704 Query 1/2: previously_computed_multiple_sequence_alignment (length 195)
2024-07-03 15:55:30,370 alphafold2_ptm_model_1_seed_000 recycle=0 pLDDT=94.4 pTM=0.893
2024-07-03 15:55:48,088 alphafold2_ptm_model_1_seed_000 recycle=1 pLDDT=95.1 pTM=0.903 tol=1.07
2024-07-03 15:55:48,089 alphafold2_ptm_model_1_seed_000 took 63.0s (1 recycles)
2024-07-03 15:55:48,204 reranking models by 'plddt' metric
2024-07-03 15:55:48,205 rank_001_alphafold2_ptm_model_1_seed_000 pLDDT=95.1 pTM=0.903
2024-07-03 15:55:49,494 Query 2/2: ssm_library (length 195)
COMPLETE: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 150/150 [elapsed: 00:03 remaining: 00:00]
2024-07-03 15:56:13,762 alphafold2_ptm_model_1_seed_000 recycle=0 pLDDT=93.9 pTM=0.892
...
2024-07-03 15:56:31,568 rank_001_alphafold2_ptm_model_1_seed_000 pLDDT=95.6 pTM=0.906
2024-07-03 15:56:32,374 Done

danielguion commented 4 months ago

I see that I can simply replace the header sequence (the first entry) in the precomputed MSA with each of my sequences, generating one a3m file per sequence, and this solves my issue.
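For anyone landing here later, the workaround above could be scripted roughly as follows. This is only a sketch, not part of ColabFold: the function names (`read_fasta`, `swap_query`, `write_variants`) are made up for illustration, and it assumes every sequence in the fasta has the same length as the original first entry of the a3m (e.g. point mutants), so the existing alignment columns still line up. Any leading `#` line in the a3m is copied through unchanged.

```python
import re
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def swap_query(a3m_text, name, seq):
    """Return a3m text whose first entry is replaced by (name, seq).

    Assumption: seq has the same length as the original first entry,
    otherwise the remaining aligned rows would no longer match it.
    """
    lines = a3m_text.splitlines()
    # Skip any preamble (e.g. a leading "#..." line) before the first record.
    start = 0
    while start < len(lines) and not lines[start].startswith(">"):
        start += 1
    if start == len(lines):
        raise ValueError("no '>' record found in a3m text")
    # Find where the first record ends (next header or end of file).
    end = start + 1
    while end < len(lines) and not lines[end].startswith(">"):
        end += 1
    old_query = "".join(part.strip() for part in lines[start + 1:end])
    if len(old_query) != len(seq):
        raise ValueError(
            f"{name}: length {len(seq)} differs from original query ({len(old_query)})"
        )
    return "\n".join(lines[:start] + [f">{name}", seq] + lines[end:]) + "\n"


def write_variants(fasta_path, a3m_path, out_dir):
    """Write one a3m per FASTA record into out_dir and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    a3m_text = Path(a3m_path).read_text()
    written = []
    for index, (header, seq) in enumerate(read_fasta(Path(fasta_path).read_text())):
        first_word = header.split()[0] if header.split() else f"seq_{index}"
        # Keep file names simple; records with identical names overwrite each other.
        name = re.sub(r"[^A-Za-z0-9_.-]", "_", first_word)
        target = out_dir / f"{name}.a3m"
        target.write_text(swap_query(a3m_text, name, seq))
        written.append(target)
    return written
```

Calling something like `write_variants("ssm_library.fasta", "previously_computed_multiple_sequence_alignment.a3m", "a3m_inputs")` (the output directory name is arbitrary) should leave one a3m per sequence, and that directory can then be passed to colabfold_batch as the input directory in the same way as in the command above.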