Alphafill doesn't accept PDB files created by Colabfold

PDB-REDO / alphafill

AlphaFill is an algorithm based on sequence and structure similarity that “transplants” missing compounds to the AlphaFold models. By adding the molecular context to the protein structures, the models can be more easily appreciated in terms of function and structure integrity.

https://alphafill.eu

BSD 2-Clause "Simplified" License


Alphafill doesn't accept PDB files created by Colabfold #25

Closed Phreely closed 1 year ago

Phreely commented 1 year ago

My PDB files created using ColabFold (https://github.com/sokrypton/ColabFold) are not accepted by AlphaFill. If I convert the file to CIF using PyMol, I get the following message: "An error occurred processing your entry. The error message is: Structure file does not seem to contain polymers, perhaps pdbx_poly_seq_scheme is missing?" In PyMol, however, the structure looks fine. What do I have to change to make AlphaFill accept my PDBs?

drlemmus commented 1 year ago

Could you post your PDB file so we can have a look (you can also email me if it is confidential)? Apparently PyMol doesn't write valid mmCIF files; the error message essentially means that your file contains no sequence, which is a mandatory item.
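Before uploading a converted file, a quick check for the sequence-related categories can save a round trip. The following is a minimal sketch, not part of AlphaFill: the function name `missing_categories` is hypothetical, and the list of required categories is an assumption based only on the error message above (`pdbx_poly_seq_scheme`) and the usual companions `entity_poly` and `entity_poly_seq`. It only scans for category names and does not validate the internal relations of the file.

```python
def missing_categories(cif_text,
                       required=("entity_poly",
                                 "entity_poly_seq",
                                 "pdbx_poly_seq_scheme")):
    """Return the categories from `required` that never appear in cif_text.

    The default list is an assumption, not AlphaFill's documented requirement.
    """
    present = set()
    for line in cif_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        token = tokens[0]
        # mmCIF data names look like "_category.item"
        if token.startswith("_") and "." in token:
            present.add(token[1:].split(".", 1)[0].lower())
    return [name for name in required if name not in present]


if __name__ == "__main__":
    example = "data_model\n_entity_poly.entity_id 1\nloop_\n_atom_site.id\n"
    print(missing_categories(example))
```

An empty list means all the assumed categories are present; it does not guarantee that the file is otherwise valid.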

huiwenke commented 1 year ago

MAXIT (https://sw-tools.rcsb.org/apps/MAXIT/index.html) can be used to convert PDB to CIF supporting AlphaFill.

Phreely commented 1 year ago

MODEL 1
ATOM 1 N MET A 1 5.035 -11.508 21.797 1.00 68.75 N
ATOM 2 CA MET A 1 3.619 -11.172 21.688 1.00 68.75 C
ATOM 3 C MET A 1 2.748 -12.305 22.234 1.00 68.75 C
ATOM 4 CB MET A 1 3.250 -10.875 20.234 1.00 68.75 C
...
ATOM 2421 CD2 HIS A 317 77.875 36.875 -55.031 1.00 40.16 C
ATOM 2422 ND1 HIS A 317 79.312 36.125 -56.500 1.00 40.16 N
ATOM 2423 CE1 HIS A 317 78.188 36.250 -57.125 1.00 40.16 C
ATOM 2424 NE2 HIS A 317 77.312 36.688 -56.250 1.00 40.16 N
TER 2425 HIS A 317
ENDMDL
END

Phreely commented 1 year ago

Here's one file. Indeed, the "official" precomputed EBI AlphaFold files contain a sequence section that is missing from my file.

drlemmus commented 1 year ago

You are also missing a HEADER and a CRYST1 record. Adding those should be enough to run AlphaFill with your PDB file. Also please send a message to the ColabFold developers to ask them to write valid PDB files or, even better, Model-Cif/mmCIF.
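As a rough illustration of that fix, the sketch below adds a placeholder HEADER and CRYST1 record to a PDB file that lacks them. The function name `add_header_and_cryst1` and all default values (classification, date, ID code) are hypothetical. The CRYST1 line uses the unit cell that the PDB format prescribes for structures without crystallographic data (1.000 Å edges, 90° angles, space group P 1). Whether AlphaFill accepts these placeholder values is an assumption that has not been verified here.

```python
# Dummy unit cell for non-crystallographic models, built with fixed-width
# fields so the columns match the PDB format (70 characters in total).
CRYST1 = ("CRYST1"
          + "%9.3f%9.3f%9.3f" % (1.0, 1.0, 1.0)
          + "%7.2f%7.2f%7.2f" % (90.0, 90.0, 90.0)
          + " " + "P 1".ljust(11) + "%4d" % 1)


def add_header_and_cryst1(pdb_text,
                          classification="PREDICTED MODEL",  # placeholder
                          date="01-JAN-00",                  # placeholder
                          id_code="XXXX"):                   # placeholder
    """Return pdb_text with HEADER and CRYST1 records added if absent."""
    lines = pdb_text.splitlines()
    if not any(line.startswith("HEADER") for line in lines):
        # Columns: 1-6 record name, 11-50 classification, 51-59 date, 63-66 ID
        header = "HEADER    %-40s%-9s   %-4s" % (classification[:40],
                                                 date[:9], id_code[:4])
        lines.insert(0, header)
    if not any(line.startswith("CRYST1") for line in lines):
        # Place CRYST1 just before the first coordinate-section record.
        first_coord = next(
            (i for i, line in enumerate(lines)
             if line.startswith(("MODEL", "ATOM", "HETATM"))),
            len(lines))
        lines.insert(first_coord, CRYST1)
    return "\n".join(lines) + "\n"
```

Running the function on an already fixed file leaves it unchanged, so it is safe to apply to a whole directory of models.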

Phreely commented 1 year ago

> MAXIT (https://sw-tools.rcsb.org/apps/MAXIT/index.html) can be used to convert PDB to CIF supporting AlphaFill.

Thanks for the link - I'll have to see if I can get this up and running. I'm on a Windows machine and the commands look Linux-specific to me. Maybe I can run it in WSL.

Phreely commented 1 year ago

> You are also missing a HEADER and a CRYST1 record. Adding those should be enough to run AlphaFill with your PDB file. Also please send a message to the ColabFold developers to ask them to write valid PDB files or, even better, Model-Cif/mmCIF.

Thanks for the info. I've opened an issue in the Colabfold git. In the meantime I'll read a bit about which part makes a header and which part would be a valid CRYST1 record. So far, just adding the sequence like below didn't work.

#
_entity_poly.entity_id 1
_entity_poly.nstd_linkage no
_entity_poly.nstd_monomer no
_entity_poly.pdbx_seq_one_letter_code
;MKRVVVDPISRIEGHLRVEIKVDEASGKVEDALSSGTAWRGIELVAKDRDPRDLWAFVQRICGVCTTTHALASLRAVEDA
LGITIPKNANYIRNIMHSSLDVHDHIVHFYHLHALDWVSPVAALSADPAKTAQLQNDVLATYNVSGLAPAETASKDSAYP
KEFPKATTAYFTAVQQKVKKIVESGQLGIFSAQWWDHPDYNLLPPEVHLMAVSHYLNILDRQRDIVIPHVVFGGKNPHPH
YIVGGMPCSISMNDMNAPINTQRLAAVEQSIALTKDLVDKFYVPDLLAIGKIYVEKGMIDGGGLAKKRVMSYGDYPDDTY
TGISNGDYHKKCIVRSNGVVENFALGVDKATFIPLEGKDFMDPQYLSEEVDHSWFTYPDGTKTLHPIEGVTDPKFTGPKS
GTKEKWEFLDEDKKYSWIKSPTFKGKTAEVGPLAKYIVVYTKVKQGIIKDPTWAESMIVRQIDTVSQVLGVPAHVWMTTM
VGRTACRGLDAQVAANISQYFFNKLVSNIKNGDTTVADMTKFEPNTWDKDAKGVGLVDAPRGGLGHWIHIKDGRSANYQC
IVPSTWNACPKTAANEHGAYEDSMIDTHVKIADKPLEILKVIHSFDPCLACATHLYNKKGEKIVSVNTDALCK
;
_entity_poly.pdbx_strand_id A
_entity_poly.type polypeptide(L)
#

drlemmus commented 1 year ago

Hacking an mmCIF file never works because of all the internal relations that are in the file. Fixing the PDB file would probably do the trick.

agdiaz commented 1 year ago

> MAXIT (https://sw-tools.rcsb.org/apps/MAXIT/index.html) can be used to convert PDB to CIF supporting AlphaFill.
>
> Thanks for the link - I'll have to see if I can get this up and running. I'm on a windows machine and the commands look to me more like Linux specific. Maybe I can check if I can run it in WSL.

Hello @Phreely! A Docker image containing the MAXIT tool is available. I hope this helps you convert your PDB file to mmCIF format:

docker run -i tzok/maxit < input.pdb > output.cif

More details: https://registry.hub.docker.com/r/tzok/maxit

sadiogo commented 1 year ago

You can convert colabfold models to cif using this online tool (which uses MAXIT to perform the conversion):

https://mmcif.pdbj.org/converter/index.php?l=en

You can then upload the converted file in the alphafill.eu server to find transplant candidates.

However, the alphafill server doesn't seem to find all good transplant candidates. Perhaps because it is searching using sequence identity rather than structure similarity (foldseek may be a better option)? But even if we consider sequence identity, it doesn't seem to detect transplants from very similar sequences.

For example, I submitted a colabfold model and alphafill correctly found a structure with more than 60% sequence identity that provides GDP and MN transplants (1s4o.A). However, the same protein also has an alternative structure (1s4p.A) in which GDP, MN and MMA (alpha-mannose) are bound. Alphafill does not suggest this structure as a transplant candidate, so I cannot transplant MMA. I checked the PDB-REDO database and saw that 1s4p is not in it, while 1s4o is. This would explain why it was not detected. But why isn't 1s4p present in the PDB-REDO database? Perhaps it didn't meet the filtering criteria?

If I install alphafill locally, can I "force" it to transplant MMA from the 1s4p structure?

drlemmus commented 1 year ago

1s4p is in PDB-REDO: https://pdb-redo.eu/db/1s4p

The issue is that MMA is not on the list of potential transplants. You can add the compound in a local install.

sadiogo commented 1 year ago

Great! It's very nice to have the option to add compounds. But can I also force transplants from structures that I know are similar but that fall below 25% identity?

I am trying to install alphafill locally, but I'm currently stuck. I have posted the matter in a closed issue, as the initial error was identical: https://github.com/PDB-REDO/alphafill/issues/35#issuecomment-1733730810

drlemmus commented 1 year ago

Have a look at the command-line options; the minimum identity is a setting, but caveat emptor.

sadiogo commented 1 year ago

I look forward to playing around with the configuration options when I get alphafill running. On that matter, what software is used to perform the structure superposition? The reference cited in the paper is quite theoretical and discusses quaternion-based solutions to the superposition problem, but doesn't provide a practical method of performing the superposition. Does alphafill perform the superposition step on its own? Can this be tweaked?

drlemmus commented 1 year ago

AlphaFill does it by itself. You cannot tweak it.

mhekkel commented 1 year ago

> I look forward to playing around with the configuration options when I get alphafill running. On that matter, what software is used to perform the structure superposition? The reference cited in the paper is quite theoretical and discusses quaternion-based solutions to the superposition problem, but doesn't provide a practical method of performing the superposition. Does alphafill perform the superposition step on its own? Can this be tweaked?

The code to do the alignment is here:

https://github.com/PDB-REDO/libcifpp/blob/641f06a7e7c0dc54af242b373820f2398f59e7ac/src/point.cpp#L206

So you need to edit libcifpp if you want to change that algorithm. Or you can edit the source code for alphafill itself if you want to modify the input of this algorithm.
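For readers who want a feel for what a quaternion-based superposition involves, here is a minimal pure-Python sketch of Horn's closed-form method. It is an illustration only and not the libcifpp implementation linked above: the function names `centroid`, `superpose` and `apply` are hypothetical, the dominant eigenvector is found with a simple shifted power iteration rather than a proper eigensolver, and degenerate input (for example, all atoms at one point) is not handled.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(3)]


def superpose(moving, target, iterations=500):
    """Least-squares rotation R and translation t with R*moving + t ~ target.

    Both inputs are equally long lists of paired (x, y, z) coordinates.
    """
    cm, ct = centroid(moving), centroid(target)
    a = [[p[i] - cm[i] for i in range(3)] for p in moving]
    b = [[q[i] - ct[i] for i in range(3)] for q in target]

    # Cross-covariance sums: s[i][j] = sum over pairs of a_i * b_j.
    s = [[sum(u[i] * v[j] for u, v in zip(a, b)) for j in range(3)]
         for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s

    # Horn's symmetric 4x4 matrix; the eigenvector of its largest
    # eigenvalue is the unit quaternion (w, x, y, z) of the rotation.
    m = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]

    # Shift the spectrum so the largest eigenvalue also has the largest
    # magnitude, then run plain power iteration.
    shift = math.sqrt(sum(v * v for row in m for v in row))
    for i in range(4):
        m[i][i] += shift
    q = [1.0, 0.7, 0.4, 0.1]  # arbitrary start vector
    for _ in range(iterations):
        nq = [sum(m[i][j] * q[j] for j in range(4)) for i in range(4)]
        norm = math.sqrt(sum(v * v for v in nq))
        q = [v / norm for v in nq]
    w, x, y, z = q

    # Convert the unit quaternion to a 3x3 rotation matrix.
    rotation = [
        [w * w + x * x - y * y - z * z, 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), w * w - x * x + y * y - z * z, 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), w * w - x * x - y * y + z * z],
    ]
    translation = [ct[i] - sum(rotation[i][j] * cm[j] for j in range(3))
                   for i in range(3)]
    return rotation, translation


def apply(rotation, translation, point):
    """Transform a single point with the result of superpose()."""
    return [sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
            for i in range(3)]
```

In practice the paired coordinates would be the C-alpha atoms of aligned residues, and the resulting transformation would then be applied to the ligand atoms to be transplanted; how AlphaFill selects those atom pairs is not covered by this sketch.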