PDB-REDO / alphafill

AlphaFill is an algorithm based on sequence and structure similarity that “transplants” missing compounds to the AlphaFold models. By adding the molecular context to the protein structures, the models can be more easily appreciated in terms of function and structure integrity.
https://alphafill.eu
BSD 2-Clause "Simplified" License

UNL ligand crash #19

Closed by mf-rug 1 year ago

mf-rug commented 1 year ago

Hi,

Might this be a bug? For one of the structures I'm trying to fill, alphafill just aborts after this:

Error when processing 3f0h for nohd
 >> Trying to insert unknown compound UNL (not found in CCD)

In structure 3f0h there is indeed an unknown ligand (UNL) annotated (it looks a bit like a mistake in that case). As UNL is of course not an actual ligand, a more appropriate behaviour would seem to be to ignore UNLs. At the very least, it shouldn't crash.

mhekkel commented 1 year ago

UNL is a valid entry in components.cif. You reported before that your components.cif is not installed properly, hence this error stems from that issue. Please try to install components.cif and you'll notice this error is gone.

mf-rug commented 1 year ago

How does it make sense that UNL is a valid entry? It's not a defined molecule; it's a different one depending on the structure. Which of the 461 different UNLs in the PDB is in components.cif? Also, why does alphafill crash for me only on this ligand and work fine for the rest? (I installed libcifpp according to the instructions, and I have copied components.cif into the dirs you indicated in the other thread.)

mhekkel commented 1 year ago

components.cif contains:

data_UNL
#

_chem_comp.id                                   UNL
_chem_comp.name                                 "Unknown ligand"
_chem_comp.type                                 NON-POLYMER
_chem_comp.pdbx_type                            HETAIN
_chem_comp.formula                              ?
_chem_comp.mon_nstd_parent_comp_id              ?
_chem_comp.pdbx_synonyms                        ?
_chem_comp.pdbx_formal_charge                   0
_chem_comp.pdbx_initial_date                    2008-04-10
_chem_comp.pdbx_modified_date                   2008-12-05
_chem_comp.pdbx_ambiguous_flag                  Y
_chem_comp.pdbx_release_status                  REL
_chem_comp.pdbx_replaced_by                     ?
_chem_comp.pdbx_replaces                        ?
_chem_comp.formula_weight                       ?
_chem_comp.one_letter_code                      ?
_chem_comp.three_letter_code                    UNL
_chem_comp.pdbx_model_coordinates_details       ?
_chem_comp.pdbx_model_coordinates_missing_flag  N
_chem_comp.pdbx_ideal_coordinates_details       ?
_chem_comp.pdbx_ideal_coordinates_missing_flag  N
_chem_comp.pdbx_model_coordinates_db_code       ?
_chem_comp.pdbx_subcomponent_list               ?
_chem_comp.pdbx_processing_site                 RCSB
##

And "crashing" is a bit harsh. It stops and tells you that UNL is not found in the CCP4 monomers library, which is correct. If you had components.cif, the program would have continued.

Question: did you specify a CMAKE_INSTALL_PREFIX when building libcifpp? If so, the library uses that prefix as the location of the data files it looks for.
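
To make the install-prefix question concrete, here is a minimal Python sketch that checks whether components.cif exists under a given prefix. It assumes the layout `<prefix>/share/libcifpp/components.cif`, inferred from the `~/.local/share/libcifpp/components.cif` path mentioned in this thread. The helper names and the `/usr/local` prefix are hypothetical, and this is not how libcifpp itself resolves its data files.

```python
from pathlib import Path


def components_cif_under(prefix):
    # Assumed layout (inferred from this thread, not taken from libcifpp):
    # <prefix>/share/libcifpp/components.cif
    return Path(prefix).expanduser() / "share" / "libcifpp" / "components.cif"


def first_existing(prefixes):
    """Return the first components.cif found under the given prefixes, or None."""
    for prefix in prefixes:
        candidate = components_cif_under(prefix)
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # ~/.local is the default prefix quoted in this thread;
    # /usr/local is only an illustrative guess.
    print(first_existing(["~/.local", "/usr/local"]))
```

If this prints `None` for the prefix libcifpp was built with, the file is probably not where the library expects it.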

mf-rug commented 1 year ago

I didn't specify CMAKE_INSTALL_PREFIX. But the libcifpp site says "The default is to install everything in $HOME/.local on Linux", which is also where components.cif is in my case: ~/.local/share/libcifpp/components.cif

$ grep -A 27 data_UNL ~/.local/share/libcifpp/components.cif
data_UNL
#

_chem_comp.id                                   UNL
_chem_comp.name                                 "Unknown ligand"
_chem_comp.type                                 NON-POLYMER
_chem_comp.pdbx_type                            HETAIN
_chem_comp.formula                              ?
_chem_comp.mon_nstd_parent_comp_id              ?
_chem_comp.pdbx_synonyms                        ?
_chem_comp.pdbx_formal_charge                   0
_chem_comp.pdbx_initial_date                    2008-04-10
_chem_comp.pdbx_modified_date                   2008-12-05
_chem_comp.pdbx_ambiguous_flag                  Y
_chem_comp.pdbx_release_status                  REL
_chem_comp.pdbx_replaced_by                     ?
_chem_comp.pdbx_replaces                        ?
_chem_comp.formula_weight                       ?
_chem_comp.one_letter_code                      ?
_chem_comp.three_letter_code                    UNL
_chem_comp.pdbx_model_coordinates_details       ?
_chem_comp.pdbx_model_coordinates_missing_flag  N
_chem_comp.pdbx_ideal_coordinates_details       ?
_chem_comp.pdbx_ideal_coordinates_missing_flag  N
_chem_comp.pdbx_model_coordinates_db_code       ?
_chem_comp.pdbx_subcomponent_list               ?
_chem_comp.pdbx_processing_site                 RCSB
##
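
The grep check above can also be written as a small script, which is handy for testing several compound IDs at once. This is a rough sketch that only looks for a `data_<ID>` header line in the file; it does not parse mmCIF properly, and the function name `has_data_block` is made up for illustration.

```python
from pathlib import Path


def has_data_block(cif_path, comp_id):
    """Return True if the file has a line that is exactly `data_<comp_id>`.

    The comparison ignores case. This is a plain text scan, not an mmCIF parser.
    """
    header = f"data_{comp_id}".lower()
    with open(cif_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip().lower() == header:
                return True
    return False


if __name__ == "__main__":
    # Path taken from this thread; adjust it to your own installation.
    path = Path("~/.local/share/libcifpp/components.cif").expanduser()
    if path.is_file():
        print("UNL present:", has_data_block(path, "UNL"))
    else:
        print("components.cif not found at", path)
```

A `True` result only shows that the entry is in that particular file. It says nothing about whether alphafill reads the same file at runtime.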