PDB-REDO / alphafill

AlphaFill is an algorithm based on sequence and structure similarity that “transplants” missing compounds to the AlphaFold models. By adding the molecular context to the protein structures, the models can be more easily appreciated in terms of function and structure integrity.
https://alphafill.eu
BSD 2-Clause "Simplified" License

mmCIF dictionary #10

Closed: multimeric closed this issue 2 years ago

multimeric commented 2 years ago

When I run cif-validate (or indeed any other cif-tools binary) on AlphaFill files, I get a number of validation errors. I wonder if you're using a custom dictionary? If yes, can you provide it so that these tools work properly?

As an example:

$ curl -L https://alphafill.eu/v1/aff/P04406-F1 > P04406-F1.cif
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  354k  100  354k    0     0   151k      0  0:00:02  0:00:02 --:--:--  151k
$ cif-validate --verbose P04406-F1.cif
Error validating database_id: When validating _database_2.database_id: Value 'AF' is not in the list of allowed values
undefined Category ma_data
undefined Category ma_model_list
undefined Category ma_qa_metric
undefined Category ma_qa_metric_global
undefined Category ma_qa_metric_local
undefined Category ma_software_group
undefined Category ma_target_entity
undefined Category ma_target_entity_instance
undefined Category ma_target_ref_db_details
missing mandatory field dict_version for Category audit_conform
Error validating database_id: When validating _database_2.database_id: Value 'AF' is not in the list of allowed values
undefined Category ma_data
undefined Category ma_model_list
undefined Category ma_qa_metric
undefined Category ma_qa_metric_global
undefined Category ma_qa_metric_local
undefined Category ma_software_group
undefined Category ma_target_entity
undefined Category ma_target_entity_instance
undefined Category ma_target_ref_db_details
missing mandatory field dict_version for Category audit_conform
CPU usage: 0.8s user, 0.0s system, 0.0s wall

mhekkel commented 2 years ago

Thanks for that question! I never took the time to investigate this. Anyway, the dictionary can be found at:

https://github.com/ihmwg/ModelCIF

And when validating against it, I see that the AlphaFold files are not valid at all: the key for ma_qa_metric_local is ordinal_id, yet this field contains the value 1 in every record. That is obviously wrong, so I guess I will have to report it as a bug to AlphaFold.
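
To illustrate the kind of check involved, here is a minimal Python sketch (not part of cif-tools or AlphaFill) that looks for repeated ordinal_id values in the ma_qa_metric_local loop. It assumes the category is written as a simple loop_ with whitespace-separated, unquoted values, one row per line. The helper names read_loop and duplicate_keys are made up for this example, and the BROKEN sample is a hand-made imitation of the problem described above, not data from a real file.

```python
from collections import Counter


def read_loop(lines, category):
    """Return (column_names, rows) for the first loop of `category`.

    Assumption: simple loop_ layout, one row per line, no quoted values
    containing spaces. This is not a general mmCIF parser.
    """
    prefix = f"_{category}."
    columns, rows = [], []
    for raw in lines:
        line = raw.strip()
        if line.startswith(prefix) and not rows:
            # Header line such as "_ma_qa_metric_local.ordinal_id"
            columns.append(line.split()[0][len(prefix):])
        elif columns:
            # A blank line, comment, new tag, or new block ends the loop body.
            if not line or line.startswith(("#", "_", "loop_", "data_")):
                break
            rows.append(line.split())
    return columns, rows


def duplicate_keys(columns, rows, key="ordinal_id"):
    """Map each key value that occurs more than once to its count."""
    idx = columns.index(key)
    counts = Counter(row[idx] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


HEADER = """\
loop_
_ma_qa_metric_local.label_asym_id
_ma_qa_metric_local.label_comp_id
_ma_qa_metric_local.label_seq_id
_ma_qa_metric_local.metric_id
_ma_qa_metric_local.metric_value
_ma_qa_metric_local.model_id
_ma_qa_metric_local.ordinal_id
"""

# Hypothetical rows imitating the reported problem: ordinal_id is always 1.
BROKEN = HEADER + "A MET 1 2 31.65 1 1\nA LEU 2 2 35.05 1 1\nA GLU 3 2 36.20 1 1\n"
# Rows with a unique ordinal_id per record.
GOOD = HEADER + "A MET 1 2 31.65 1 1\nA LEU 2 2 35.05 1 2\nA GLU 3 2 36.20 1 3\n"

if __name__ == "__main__":
    for name, text in (("BROKEN", BROKEN), ("GOOD", GOOD)):
        cols, rows = read_loop(text.splitlines(), "ma_qa_metric_local")
        print(name, duplicate_keys(cols, rows))
```

An empty result means every ordinal_id is unique. For real files a proper mmCIF parser would be the safer choice, since this sketch ignores quoting and multi-line values.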

multimeric commented 2 years ago

Thanks! That's still helpful; I should be able to just adjust the AlphaFold CIFs to make them conform to the ModelCIF dictionary.
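
A rough sketch of what such an adjustment could look like in Python, assuming the only problem is the repeated ordinal_id and that the ma_qa_metric_local category is a simple loop_ with whitespace-separated, unquoted values on one line per row. The function name renumber_ordinal_ids and the EXAMPLE data are made up for illustration. Rewritten rows are joined with single spaces, so the original column alignment is not preserved.

```python
def renumber_ordinal_ids(lines, category="ma_qa_metric_local", key="ordinal_id"):
    """Return a copy of `lines` with the `key` column of `category` set to 1..N.

    Assumption: simple loop_ layout, one row per line, no quoted values
    containing spaces. Lines outside the loop body are passed through as-is.
    """
    prefix = f"_{category}."
    out, columns = [], []
    next_id = 1
    state = "before"  # before -> header -> body -> after
    for raw in lines:
        line = raw.strip()
        if state in ("before", "header") and line.startswith(prefix):
            # Collect the column names in the order they are declared.
            columns.append(line.split()[0][len(prefix):])
            state = "header"
        elif state in ("header", "body"):
            if not line or line.startswith(("#", "_", "loop_", "data_")):
                state = "after"  # end of the loop body
            else:
                fields = line.split()
                fields[columns.index(key)] = str(next_id)
                next_id += 1
                out.append(" ".join(fields))
                state = "body"
                continue
        out.append(raw)
    return out


# Hypothetical input imitating the problem: every record has ordinal_id 1.
EXAMPLE = """\
loop_
_ma_qa_metric_local.label_asym_id
_ma_qa_metric_local.label_comp_id
_ma_qa_metric_local.label_seq_id
_ma_qa_metric_local.metric_id
_ma_qa_metric_local.metric_value
_ma_qa_metric_local.model_id
_ma_qa_metric_local.ordinal_id
A MET 1 2 31.65 1 1
A LEU 2 2 35.05 1 1
A GLU 3 2 36.20 1 1
#
"""

if __name__ == "__main__":
    print("\n".join(renumber_ordinal_ids(EXAMPLE.splitlines())))
```

The result would still need to be run through cif-validate with the ModelCIF dictionary, since this only addresses the duplicate key and none of the other reported errors.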

multimeric commented 2 years ago

Okay, so actually the newer AlphaFold structures don't have this issue. The v3 ones in AlphaFold DB have the correct ordinal_id:

$ curl 'https://alphafold.ebi.ac.uk/files/AF-P00519-F1-model_v3.cif' | grep -A 10 'ma_qa_metric_local'
_ma_qa_metric_local.label_asym_id
_ma_qa_metric_local.label_comp_id
_ma_qa_metric_local.label_seq_id
_ma_qa_metric_local.metric_id
_ma_qa_metric_local.metric_value
_ma_qa_metric_local.model_id
_ma_qa_metric_local.ordinal_id
A MET 1    2 31.65 1 1    
A LEU 2    2 35.05 1 2    
A GLU 3    2 36.20 1 3    
A ILE 4    2 52.07 1 4    
A CYS 5    2 39.89 1 5    
A LEU 6    2 47.68 1 6    
A LYS 7    2 45.71 1 7    
A LEU 8    2 38.50 1 8    
A VAL 9    2 42.44 1 9    
A GLY 10   2 31.30 1 10   

So if you guys rerun the alphafill analysis at some point, this should be fixed.