Open smg3d opened 4 hours ago
Here is my suggestion:
Prepare Your MSA: Format your MSA in A3M, which is similar to FASTA but can include lowercase letters for insertions.
Embed MSA Content in JSON: Place your MSA content in the "unpairedMsa" field of the input JSON file. Ensure newline characters are correctly handled with \n.
Example:
{
"protein": {
"id": "A",
"sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF",
"unpairedMsa": ">seq1\\nMVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF\\n>seq2\\nMVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTFFPHF",
"pairedMsa": "",
"templates": []
}
}
Considerations:
Handling Newlines: In JSON strings, newlines should be represented by \\n (in the actual JSON file, it’s \n, but needs escaping in strings).
Direct Embedding: The "unpairedMsa" field should contain the actual MSA content string, not a filename or path.
Validate JSON Format: Make sure your JSON file is correctly formatted. You might want to use an online JSON validator for checking.
Thanks @Hanziwww .
Does that input.json work for you? For me, it does not recognize the first sequence of the MSA (looks like it reads an empty sequence):
raise ValueError(
ValueError: First MSA sequence is not the query_sequence='MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF'
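For anyone hitting the same error, a quick pre-flight check can show what the parser probably sees. This is a minimal sketch, assuming only what the error message suggests (the first MSA record must equal the query sequence); `parse_a3m` and `first_msa_sequence_matches` are hypothetical helper names, not AlphaFold 3 functions:

```python
def parse_a3m(a3m: str) -> list[tuple[str, str]]:
    """Split an A3M/FASTA-style string into (description, sequence) pairs."""
    records = []
    description, chunks = None, []
    for line in a3m.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:], []
        elif description is not None:
            chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


def first_msa_sequence_matches(query: str, a3m: str) -> bool:
    """Hypothetical check: is the first MSA sequence identical to the query?"""
    records = parse_a3m(a3m)
    return bool(records) and records[0][1] == query


QUERY = "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF"
HOMOLOG = "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTFFPHF"

# String with real line breaks between headers and sequences.
good_msa = f">seq1\n{QUERY}\n>seq2\n{HOMOLOG}"
# String with a literal backslash + "n" and no line breaks at all.
bad_msa = f">seq1\\n{QUERY}\\n>seq2\\n{HOMOLOG}"

print(first_msa_sequence_matches(QUERY, good_msa))
print(first_msa_sequence_matches(QUERY, bad_msa))
```

In the second case the whole string is a single header line, so the first record has an empty sequence, which is consistent with the "reads an empty sequence" symptom.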
Hi @smg3d,
You're absolutely right. I made a mistake in my previous response. The newline character in JSON strings should be represented as \n, not \\n. With \\n, the JSON parser produces a literal backslash followed by the letter n instead of a line break, so the MSA is never split into separate lines, leading to errors like the one you encountered.
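The difference is easy to see with Python's standard `json` module. A small sketch with shortened, made-up sequences (`MVLS`, `MVLT`) just to keep the output readable:

```python
import json

# Raw strings hold the text exactly as it would be typed in the JSON file.
one_backslash = r'">seq1\nMVLS\n>seq2\nMVLT"'
two_backslashes = r'">seq1\\nMVLS\\n>seq2\\nMVLT"'

decoded_ok = json.loads(one_backslash)
decoded_bad = json.loads(two_backslashes)

# \n decodes to real line breaks: four separate lines.
print(decoded_ok.splitlines())
# \\n decodes to backslash + "n": still one single line.
print(decoded_bad.splitlines())

# Going the other way, json.dumps writes a line break as \n on its own.
print(json.dumps(">seq1\nMVLS"))
```

So if the JSON is written by a serializer rather than by hand, the escaping takes care of itself.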
Here's the corrected JSON input:
{
"name": "My AlphaFold Job",
"modelSeeds": [1],
"sequences": [
{
"protein": {
"id": "A",
"sequence": "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF",
"unpairedMsa": ">seq1\nMVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF\n>seq2\nMVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTFFPHF",
"pairedMsa": "",
"templates": []
}
}
],
"dialect": "alphafold3",
"version": 1
}
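If you would rather not escape the MSA by hand, the same file can be produced from Python. This is only a sketch: the field names are copied from the JSON above, while `build_fold_input` and its parameters are hypothetical names chosen for this example, not part of AlphaFold 3:

```python
import json


def build_fold_input(name: str, chain_id: str, sequence: str, a3m_text: str) -> dict:
    """Assemble a single-protein input dict shaped like the JSON above."""
    return {
        "name": name,
        "modelSeeds": [1],
        "sequences": [
            {
                "protein": {
                    "id": chain_id,
                    "sequence": sequence,
                    # The A3M content itself, with real line breaks.
                    "unpairedMsa": a3m_text,
                    "pairedMsa": "",
                    "templates": [],
                }
            }
        ],
        "dialect": "alphafold3",
        "version": 1,
    }


query = "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF"
homolog = "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTFFPHF"
a3m = f">seq1\n{query}\n>seq2\n{homolog}"

fold_input = build_fold_input("My AlphaFold Job", "A", query, a3m)

# json.dumps turns each line break inside the MSA into \n in the output text.
json_text = json.dumps(fold_input, indent=2)
print(json_text)
```

The printed text can then be saved as the input file; in practice `a3m` would be read from your own A3M file instead of being built inline.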
Here's how you can run AlphaFold using Docker with the corrected JSON:
docker run -it \
--volume /home/mars/disk3/af3input:/root/af_input \
--volume /home/mars/disk3/af3output:/root/af_output \
--volume /home/mars/disk3/af3md:/root/models \
--volume /home/mars/disk3/af3db:/root/public_databases \
--gpus all alphafold3 \
python run_alphafold.py \
--json_path=/root/af_input/fold_input.json \
--model_dir=/root/models \
--output_dir=/root/af_output
output cif: my_alphafold_job_model.zip
Sorry for the misleading answer.
Thanks @Hanziwww .
It works now.
I think it might be a good idea to show such an example in the input doc:
{
"protein": {
"id": "A",
"sequence": "PVLSCGEWQL",
"modifications": [
{"ptmType": "HY3", "ptmPosition": 1},
{"ptmType": "P1L", "ptmPosition": 5}
],
"unpairedMsa": ">seq1\nPVLSCGEWQL\n>seq2\nPILSCADWQ-",
"pairedMsa": ...,
"templates": [...]
}
}
I'm glad to hear that the input is working now.
By the way, I'd like to introduce a user-friendly graphical interface that I developed to simplify generating the JSON input and running AlphaFold 3 predictions. Feel free to check out the GUI repository.
Thanks for providing the AF3 source. It is really appreciated.
I could not find the format to use in order to provide our own MSA in the input json file.
The input documentation mentions "If the unpairedMsa field is set to a custom A3M string, AlphaFold 3 will use the provided MSA instead of building one as part of the data pipeline. This is considered an expert option.". But what is the format of the "custom A3M string"?
The doc provides the two following examples, but does not show the string or list format for unpairedMsa and pairedMsa.
For "unpairedMsa", I tried a filename and various list formats, but none are working.