Closed tony9664 closed 1 day ago
> However, the code will still try to search for MSA and templates.

This is weird, the input is a correct MSA-free and templates-free input. Doesn't it say in the logs that it is skipping MSA/template search?

> what should I do with the pairedMSA?

Set it to empty.

> My third question is, for a hetero-oligomer prediction, if I want to manually set the MSA, should I put the same MSA under each protein entity?

If the MSA is the same for all 4 chains, you can use the multi-chain ID trick, `"id": ["A", "B", "C", "D"]`, and then set the MSA just once for this multi-chain entity.
If the MSA is different for each chain in the oligomer, then you will have to set it separately for each chain.
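
A minimal sketch of the multi-chain ID trick, written in Python so the entity can be generated rather than typed by hand. The helper name `shared_msa_entity` and the toy MSA are hypothetical, and the empty `templates` list is an assumption (use whatever template setting you actually want); only the field names `id`, `sequence`, `unpairedMsa`, `pairedMsa`, and `templates` come from this thread.

```python
import json


def shared_msa_entity(chain_ids, sequence, unpaired_msa):
    """Build one protein entity whose MSA is given once for all listed chains."""
    return {
        "protein": {
            "id": list(chain_ids),        # multi-chain ID trick: one entity, many chains
            "sequence": sequence,
            "unpairedMsa": unpaired_msa,  # custom MSA, written only once
            "pairedMsa": "",              # set to empty, as advised in the reply
            "templates": [],              # assumption: no templates wanted
        }
    }


# Hypothetical toy MSA in A3M format; a real MSA would contain many more hits.
toy_msa = ">query\nGMRESYAN\n>hit_1\nGMKESYAN\n"

entity = shared_msa_entity(["A", "B", "C", "D"], "GMRESYAN", toy_msa)
print(json.dumps(entity, indent=2))
```

The resulting object would go into the `sequences` list of the input JSON. If the chains have different sequences or different MSAs, this shortcut does not apply and each chain needs its own entity.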
> This is weird, the input is a correct MSA-free and templates-free input. Doesn't it say in the logs that it is skipping MSA/template search?

I tried again and the code still searches for MSA. However, when I set a custom non-empty MSA, the log does say that it is skipping the search.
This seems to work for me if I set `run_data_pipeline` to `False`.
I checked the code and it is a bug, I will send a fix soon (likely tomorrow). Thank you very much for reporting!
There are two possible workarounds for the time being:
1. Skipping the data pipeline completely by setting `--run_data_pipeline=false`, as suggested by @nzrandol. However, this could be undesirable if, for instance, you have a dimer and want to run the data pipeline for one chain but not for the other. In that case you should use option 2.
2. Providing MSAs with just the query sequence and empty templates, e.g. for a query `GMRESYAN`, you would set:

   ```json
   "id": ["A"],
   "sequence": "GMRESYAN",
   "unpairedMsa": ">query\nGMRESYAN",
   "pairedMsa": ">query\nGMRESYAN",
   "templates": []
   ```
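
The second workaround can be sketched in Python for the dimer case mentioned in option 1. The helper names `query_only_entity` and `searched_entity` and the second sequence are hypothetical. The sketch also assumes that leaving the MSA and template fields unset is what makes the data pipeline run for that chain, so check this against the input documentation before relying on it.

```python
import json


def query_only_entity(chain_id, sequence):
    """Protein entity with a query-only MSA and no templates (workaround 2)."""
    msa = f">query\n{sequence}"
    return {
        "protein": {
            "id": [chain_id],
            "sequence": sequence,
            "unpairedMsa": msa,
            "pairedMsa": msa,
            "templates": [],
        }
    }


def searched_entity(chain_id, sequence):
    """Protein entity with MSA/template fields left unset.

    Assumption: unset fields are filled in by the data pipeline.
    """
    return {"protein": {"id": [chain_id], "sequence": sequence}}


# Chain A skips the search; chain B (hypothetical sequence) is searched normally.
sequences = [
    query_only_entity("A", "GMRESYAN"),
    searched_entity("B", "MKTAYIAKQR"),
]
print(json.dumps(sequences, indent=2))
```

Unlike `--run_data_pipeline=false`, this approach keeps the pipeline enabled overall, so it only affects the chains that carry the query-only MSA.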
Thank you!
I'd like to run it with MSA but without templates. How do I know whether it's still using templates when I set `templates = []`?
The log says `Filtering protein templates took 0.00 seconds for sequence`. Does this indicate that no template is used, or is there a better way to double-check? Thanks a lot!
I think it's still using templates even though I have `"templates": []`. The log contains `Filtering protein templates for sequence` and `Filtering protein templates took 0.01 seconds for sequence`, and the predicted structure is exactly the same as the one predicted without template.
Fixed in https://github.com/google-deepmind/alphafold3/commit/f2579c94952ea38e7e5b47156e105fe9e3ed99bb.
I am also planning to add a separate option to skip just templates but search for MSA, tracking that in https://github.com/google-deepmind/alphafold3/issues/88.
I wanted to run AF3 without MSA. From the documentation https://github.com/google-deepmind/alphafold3/blob/main/docs/input.md I learned that you can set unpaired MSA to an empty string. I used the input file below:
However, the code will still try to search for MSA and templates.
I am also confused by the documentation. In one place it says:
Later it says:
What should I do with the pairedMSA?
My third question is, for a hetero-oligomer prediction, if I want to manually set the MSA, should I put the same MSA under each protein entity?