draeger-lab / refinegems

refineGEMs is a Python package intended to help with the curation of genome-scale metabolic models (GEMs).
https://refinegems.readthedocs.io/en/latest/
MIT License

InChI string annotation format broken after refineGEMs.polish #89

Closed GwennyGit closed 1 year ago

GwennyGit commented 1 year ago

The bug: After applying refineGEMs.polish to several models whose annotations contain InChI strings, I noticed that COBRApy warned that some of the URIs containing InChI strings were wrongly formatted. Inspecting the files revealed that some InChI URIs contained only part of the original InChI string, while others contained inchi:InChI=1S: instead of inchi:InChI=1S/, which is incorrect.
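To find the affected entries in a model file, a check along the following lines may help. It is only a sketch: the helper names are hypothetical (they are not part of refineGEMs or COBRApy), and it assumes that a well-formed InChI starts with `InChI=1S/` (standard) or `InChI=1/` (non-standard). It flags the `inchi:InChI=1S:` case but cannot tell whether an InChI string has been truncated further along.

```python
import re

# Assumption: a well-formed InChI starts with "InChI=1S/" or "InChI=1/".
INCHI_START = re.compile(r"InChI=1S?/")


def inchi_part(uri):
    """Return the text after the 'inchi' prefix in a URI or CURIE, or None."""
    match = re.search(r"inchi[:/](.*)$", uri)
    return match.group(1) if match else None


def looks_like_valid_inchi_uri(uri):
    """Hypothetical helper: True if the InChI part starts in the expected way."""
    local_id = inchi_part(uri)
    return local_id is not None and INCHI_START.match(local_id) is not None


good = "https://identifiers.org/inchi:InChI=1S/H2O/h1H2"
bad = "https://identifiers.org/inchi:InChI=1S:H2O:h1H2"
print(looks_like_valid_inchi_uri(good), looks_like_valid_inchi_uri(bad))
```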

GwennyGit commented 1 year ago

I investigated this problem further and found the following two causes:

  1. With the current implementation, the function cv_notes_metab does not seem to recognise the InChI strings within the notes section of the model.
  2. The function get_set_of_curies changes inchi:InChI=1S/ to inchi:InChI=1S:. This happens because the CURIEs are checked and split to obtain a mapping from the database prefix to the corresponding local unique identifier, which in this case is the InChI string.
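The second cause can be illustrated with a small sketch. This is not the code of get_set_of_curies; both functions below are hypothetical. The first shows one way the symptom could arise (an assumption: splitting on every `/` and re-joining with `:`). The second separates the prefix from the local identifier at the first colon only, so the slashes inside the InChI string are left untouched.

```python
from typing import Tuple


def naive_rejoin(curie: str) -> str:
    """Hypothetical faulty variant: treats every '/' as a separator."""
    return ":".join(curie.split("/"))


def split_curie(curie: str) -> Tuple[str, str]:
    """Split a CURIE into (prefix, local identifier) at the first colon only."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"Not a CURIE: {curie!r}")
    return prefix, local_id


curie = "inchi:InChI=1S/H2O/h1H2"

# The naive variant damages the InChI string.
print(naive_rejoin(curie))

# Splitting once keeps the local identifier intact and allows a lossless round trip.
prefix, local_id = split_curie(curie)
print(prefix, local_id)
print(f"{prefix}:{local_id}" == curie)
```

Splitting only once also works for other identifiers whose local part contains a colon, since everything after the first colon is kept as the local identifier.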