YoshitakaMo / localcolabfold

ColabFold on your local PC
MIT License
608 stars 134 forks

Question: Custom msa #255

Closed jimfeng9705 closed 2 months ago

jimfeng9705 commented 2 months ago

"ColabFold can accept various input files now. See the help message. You can set your own A3M file, a fasta file that contains multiple sequences (in FASTA format), or a directory that contains multiple fasta files."

What is the argument/flag to define the directory that contains multiple a3m files? Are these files used sequentially to generate the corresponding pdb files?

YoshitakaMo commented 2 months ago

You can use your own MSA file (.a3m format) as input:

colabfold_batch --templates --amber YOUR.a3m outputdir/

If you have multiple MSA files, you may specify the directory containing them:

colabfold_batch --templates --amber DIRECTORY_containing_msa_files outputdir/

If a directory is specified as input, colabfold_batch will predict the structures based on each MSA file sequentially.

jimfeng9705 commented 2 months ago

Where can we define the csv file? How do we specify the directory for the custom MSAs? Do we need a flag for the MSA files?

I used the following command, but encountered "unrecognized argument:outputdir"

colabfold_batch --templates --amber xxx.csv /home/subsampled_MSAs outputdir/

YoshitakaMo commented 2 months ago

> colabfold_batch --templates --amber xxx.csv /home/subsampled_MSAs outputdir/

You specified three positional arguments, but colabfold_batch takes only two: one input and one output directory. xxx.csv and /home/subsampled_MSAs cannot be combined. An MSA input in .a3m format may only be used for a single prediction.
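
To make the two-argument rule concrete, here is a minimal sketch of a wrapper that rejects anything other than one input and one output directory before calling colabfold_batch. The function name run_colabfold and the COLABFOLD_BIN variable are assumptions made for illustration (they are not part of ColabFold); COLABFOLD_BIN only exists so the command can be previewed with echo.

```shell
#!/bin/bash
# Hypothetical helper, not part of ColabFold: checks that exactly one input
# and one output directory are given before calling colabfold_batch.
# COLABFOLD_BIN is an assumed override variable, useful for dry runs.
run_colabfold() {
  if [ "$#" -ne 2 ]; then
    echo "usage: run_colabfold INPUT OUTPUT_DIR (got $# arguments)" >&2
    return 2
  fi
  local input="$1" outdir="$2"
  # The input may be a file (.a3m, .fasta, .csv) or a directory.
  if [ ! -e "$input" ]; then
    echo "input not found: $input" >&2
    return 1
  fi
  "${COLABFOLD_BIN:-colabfold_batch}" --templates --amber "$input" "$outdir"
}
```

With this in place, run_colabfold xxx.csv /home/subsampled_MSAs outputdir/ stops with a usage message instead of reaching colabfold_batch, while run_colabfold /home/subsampled_MSAs outputdir/ passes the directory through as the single input.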

If you want to run predictions for multiple MSA files in a directory, you can do so with a bash for loop:

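#!/bin/bash
# Run one prediction per .a3m file in MSA_DIR.
MSA_DIR="/home/subsampled_MSAs"
# Quote the variables so paths containing spaces are handled correctly.
for INPUT in "$MSA_DIR"/*.a3m; do
  colabfold_batch --templates --amber "${INPUT}" outputdir/
done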
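
A possible variation on this loop, sketched here under a few assumptions, writes each prediction to its own subdirectory and skips the loop body when the directory contains no .a3m files. The function name predict_each_msa, the per-MSA output layout, and the COLABFOLD_BIN override are all hypothetical choices for illustration, not ColabFold features.

```shell
#!/bin/bash
# Hypothetical helper: runs one prediction per .a3m file in a directory,
# writing each result to its own subdirectory of the output root.
# COLABFOLD_BIN is an assumed override variable, useful for dry runs.
predict_each_msa() {
  local msa_dir="$1" out_root="$2" input name
  for input in "$msa_dir"/*.a3m; do
    # If nothing matches, the glob pattern stays literal; skip it.
    [ -e "$input" ] || continue
    name="$(basename "$input" .a3m)"
    mkdir -p "$out_root/$name"
    # Report a failed run and continue with the next MSA file.
    "${COLABFOLD_BIN:-colabfold_batch}" --templates --amber "$input" "$out_root/$name/" \
      || echo "prediction failed for $input" >&2
  done
}
```

Running COLABFOLD_BIN=echo predict_each_msa /home/subsampled_MSAs outputdir prints the commands that would be run without starting any prediction, which is a cheap way to check the file list first.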