Phreely closed this issue 1 year ago.
Could you post your PDB file so we can have a look (you can also email me if it is confidential)? PyMOL apparently doesn't write valid mmCIF files; the error message essentially means that your file contains no sequence, which is a mandatory item.
MAXIT (https://sw-tools.rcsb.org/apps/MAXIT/index.html) can be used to convert PDB to CIF supporting AlphaFill.
MODEL        1
ATOM      1  N   MET A   1       5.035 -11.508  21.797  1.00 68.75           N
ATOM      2  CA  MET A   1       3.619 -11.172  21.688  1.00 68.75           C
ATOM      3  C   MET A   1       2.748 -12.305  22.234  1.00 68.75           C
ATOM      4  CB  MET A   1       3.250 -10.875  20.234  1.00 68.75           C
...
ATOM   2421  CD2 HIS A 317      77.875  36.875 -55.031  1.00 40.16           C
ATOM   2422  ND1 HIS A 317      79.312  36.125 -56.500  1.00 40.16           N
ATOM   2423  CE1 HIS A 317      78.188  36.250 -57.125  1.00 40.16           C
ATOM   2424  NE2 HIS A 317      77.312  36.688 -56.250  1.00 40.16           N
TER    2425      HIS A 317
ENDMDL
END
Here's one file. Indeed, the "official" precomputed EBI AlphaFold files contain a sequence section that is missing from my file.
You are also missing a HEADER and a CRYST1 record. Adding those should be enough to run AlphaFill with your PDB file. Also, please send a message to the ColabFold developers asking them to write valid PDB files or, even better, ModelCIF/mmCIF.
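For anyone who wants to script this, here is a minimal Python sketch of adding the two records. It is an illustration only: the function name add_header_and_cryst1 and the placeholder classification, date, and ID code are hypothetical, and whether AlphaFill expects particular values in the HEADER record is not confirmed in this thread. The CRYST1 line uses the dummy unit cell (1.000 Å edges, 90° angles, space group P 1) that the PDB format prescribes for non-crystallographic structures.

```python
def add_header_and_cryst1(pdb_text, classification="PREDICTED MODEL",
                          date="01-JAN-00", id_code="XXXX"):
    """Return pdb_text with HEADER and CRYST1 records added if missing.

    classification, date and id_code are placeholder values (assumptions).
    """
    lines = pdb_text.splitlines()

    if not any(line.startswith("CRYST1") for line in lines):
        # Dummy unit cell for models that do not come from a crystal.
        cryst1 = "CRYST1%9.3f%9.3f%9.3f%7.2f%7.2f%7.2f %-11s%4d" % (
            1.0, 1.0, 1.0, 90.0, 90.0, 90.0, "P 1", 1)
        # CRYST1 belongs before the coordinate section.
        first_coord = next(
            (i for i, line in enumerate(lines)
             if line.startswith(("MODEL", "ATOM", "HETATM"))),
            len(lines))
        lines.insert(first_coord, cryst1)

    if not any(line.startswith("HEADER") for line in lines):
        # Columns: 11-50 classification, 51-59 date, 63-66 ID code.
        header = "HEADER    %-40s%-9s   %-4s" % (classification, date, id_code)
        lines.insert(0, header)

    return "\n".join(lines) + "\n"

# Usage (file names are examples):
# with open("colabfold_model.pdb") as fh:
#     fixed = add_header_and_cryst1(fh.read())
# with open("colabfold_model_fixed.pdb", "w") as fh:
#     fh.write(fixed)
```

The function leaves a file untouched if both records are already present, so it should be safe to run more than once.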
Thanks for the link. I'll have to see if I can get this up and running. I'm on a Windows machine, and the commands look Linux-specific to me. Maybe I can run it in WSL.
Thanks for the info. I've opened an issue in the ColabFold repository.
In the meantime, I'll read up on what makes up a HEADER record and what a valid CRYST1 record looks like. So far, just adding the sequence as shown below didn't work.
#
_entity_poly.entity_id 1
_entity_poly.nstd_linkage no
_entity_poly.nstd_monomer no
_entity_poly.pdbx_seq_one_letter_code
;MKRVVVDPISRIEGHLRVEIKVDEASGKVEDALSSGTAWRGIELVAKDRDPRDLWAFVQRICGVCTTTHALASLRAVEDA
LGITIPKNANYIRNIMHSSLDVHDHIVHFYHLHALDWVSPVAALSADPAKTAQLQNDVLATYNVSGLAPAETASKDSAYP
KEFPKATTAYFTAVQQKVKKIVESGQLGIFSAQWWDHPDYNLLPPEVHLMAVSHYLNILDRQRDIVIPHVVFGGKNPHPH
YIVGGMPCSISMNDMNAPINTQRLAAVEQSIALTKDLVDKFYVPDLLAIGKIYVEKGMIDGGGLAKKRVMSYGDYPDDTY
TGISNGDYHKKCIVRSNGVVENFALGVDKATFIPLEGKDFMDPQYLSEEVDHSWFTYPDGTKTLHPIEGVTDPKFTGPKS
GTKEKWEFLDEDKKYSWIKSPTFKGKTAEVGPLAKYIVVYTKVKQGIIKDPTWAESMIVRQIDTVSQVLGVPAHVWMTTM
VGRTACRGLDAQVAANISQYFFNKLVSNIKNGDTTVADMTKFEPNTWDKDAKGVGLVDAPRGGLGHWIHIKDGRSANYQC
IVPSTWNACPKTAANEHGAYEDSMIDTHVKIADKPLEILKVIHSFDPCLACATHLYNKKGEKIVSVNTDALCK
;
_entity_poly.pdbx_strand_id A
_entity_poly.type polypeptide(L)
#
Hacking an mmCIF file never works because of all the internal relations that are in the file. Fixing the PDB file would probably do the trick.
Hello @Phreely! A Docker image containing the MAXIT tool is available. I hope it helps you convert your PDB file to mmCIF format:
docker run -i tzok/maxit < input.pdb > output.cif
More details: https://registry.hub.docker.com/r/tzok/maxit
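If shell redirection with < is awkward in your environment (PowerShell, for example, does not support it), the same command can be driven from Python. This is only a sketch: the function name convert_pdb_to_cif is hypothetical, and it assumes Docker is installed and that the tzok/maxit image reads a PDB file on standard input and writes mmCIF to standard output, as the command above implies.

```python
import subprocess

# The command from the post above, without the shell redirections.
MAXIT_COMMAND = ("docker", "run", "-i", "tzok/maxit")

def convert_pdb_to_cif(pdb_path, cif_path, command=MAXIT_COMMAND):
    """Pipe pdb_path into the converter and write its output to cif_path."""
    with open(pdb_path, "rb") as src, open(cif_path, "wb") as dst:
        # check=True raises CalledProcessError if the converter fails.
        subprocess.run(list(command), stdin=src, stdout=dst, check=True)

# Usage (file names are examples):
# convert_pdb_to_cif("input.pdb", "output.cif")
```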
You can convert ColabFold models to CIF using this online tool (which uses MAXIT to perform the conversion):
https://mmcif.pdbj.org/converter/index.php?l=en
You can then upload the converted file to the alphafill.eu server to find transplant candidates.
However, the AlphaFill server doesn't seem to find all good transplant candidates. Perhaps this is because it searches by sequence identity rather than structural similarity (Foldseek may be a better option)? But even by sequence identity, it doesn't seem to detect transplants from very similar sequences.
For example, I submitted a ColabFold model and AlphaFill correctly found a structure with more than 60% sequence identity that provides GDP and MN transplants (1s4o.A). However, the same protein also has an alternative structure (1s4p.A) in which GDP, MN and MMA (alpha-mannose) are bound. AlphaFill does not suggest this structure as a transplant candidate, so I cannot transplant MMA. I checked the PDB-REDO database and saw that 1s4o is in it but 1s4p is not, which would explain why it was not detected. But why isn't 1s4p present in the PDB-REDO database? Perhaps it didn't meet the filtering criteria?
If I install alphafill locally, can I "force" it to transplant MMA from the 1s4p structure?
1s4p is in PDB-REDO (https://pdb-redo.eu/db/1s4p). The issue is that MMA is not on the list of potential transplants. You can add the compound in a local install.
Great! It's very nice to have the option to add compounds. But can I also force transplants from structures I know are similar but don't display 25% identity?
I am trying to install alphafill locally, but I'm currently stuck. I have posted the matter in a closed issue, as the initial error was identical: https://github.com/PDB-REDO/alphafill/issues/35#issuecomment-1733730810
Have a look at the command-line options: the minimum identity is a setting, but caveat emptor.
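To get a feel for what an identity cutoff such as 25% means for a given pair, you can compute the figure yourself from a pairwise alignment. The sketch below is one common definition (identical residues divided by aligned, non-gap columns); the function name percent_identity is hypothetical, and AlphaFill's own calculation may use a different alignment method or denominator.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

# Usage: both inputs must already be aligned (same length, '-' for gaps).
# percent_identity("MKRV-V", "MKAVDV")
```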
I look forward to playing around with the configuration options once I get AlphaFill running. On that note, what software is used to perform the structure superposition? The reference cited in the paper is quite theoretical: it discusses quaternion-based solutions to the superposition problem but doesn't provide a practical method for performing the superposition. Does AlphaFill perform the superposition step on its own? Can this be tweaked?
AlphaFill does it by itself. You cannot tweak it.
The code to do the alignment is here:
So you need to edit libcifpp if you want to change that algorithm. Or you can edit the source code for alphafill itself if you want to modify the input of this algorithm.
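For readers looking for a practical view of the quaternion-based superposition discussed above, here is a minimal pure-Python sketch of the classic closed-form approach (Horn's method). It is not the libcifpp code and makes no claim to match it: the function names are hypothetical, and a simple shifted power iteration stands in for a proper eigenvalue solver, so it can converge slowly or fail for degenerate inputs.

```python
import math

def centroid(points):
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(3)]

def superpose(mobile, target, max_iter=10000, tol=1e-12):
    """Return (R, t) so that R*p + t moves `mobile` onto `target` (least squares)."""
    cm, ct = centroid(mobile), centroid(target)
    P = [[p[k] - cm[k] for k in range(3)] for p in mobile]
    Q = [[q[k] - ct[k] for k in range(3)] for q in target]
    # Cross-covariance: S[a][b] = sum_i P_i[a] * Q_i[b]
    S = [[sum(p[a] * q[b] for p, q in zip(P, Q)) for b in range(3)]
         for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    # Symmetric 4x4 matrix whose top eigenvector is the optimal quaternion.
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shift by a Gershgorin bound so every eigenvalue is non-negative and
    # power iteration converges to the largest eigenvalue of N itself.
    shift = max(sum(abs(v) for v in row) for row in N)
    quat = [1.0, 0.5, 0.25, 0.125]  # arbitrary asymmetric starting vector
    norm = math.sqrt(sum(v * v for v in quat))
    quat = [v / norm for v in quat]
    for _ in range(max_iter):
        new = [sum(N[i][j] * quat[j] for j in range(4)) + shift * quat[i]
               for i in range(4)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0.0:  # degenerate input, keep the current quaternion
            break
        new = [v / norm for v in new]
        delta = max(abs(a - b) for a, b in zip(new, quat))
        quat = new
        if delta < tol:
            break
    w, x, y, z = quat
    # Rotation matrix of the unit quaternion (w, x, y, z).
    R = [
        [w*w + x*x - y*y - z*z, 2*(x*y - w*z), 2*(x*z + w*y)],
        [2*(x*y + w*z), w*w - x*x + y*y - z*z, 2*(y*z - w*x)],
        [2*(x*z - w*y), 2*(y*z + w*x), w*w - x*x - y*y + z*z],
    ]
    t = [ct[i] - sum(R[i][k] * cm[k] for k in range(3)) for i in range(3)]
    return R, t

def transform(points, R, t):
    return [[sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3)]
            for p in points]

def rmsd(a, b):
    total = sum(sum((u - v) ** 2 for u, v in zip(p, q)) for p, q in zip(a, b))
    return math.sqrt(total / len(a))
```

Both coordinate lists must contain the same atoms in the same order (for example, matched C-alpha atoms); choosing that correspondence is a separate step not covered here.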
My PDB files created using ColabFold (https://github.com/sokrypton/ColabFold) are not accepted by AlphaFill. If I convert the file to CIF using PyMOL, I get the following message: "An error occurred processing your entry. The error message is: Structure file does not seem to contain polymers, perhaps pdbx_poly_seq_scheme is missing?" In PyMOL, however, the structure looks fine. What do I have to change to make AlphaFill accept my PDB files?