Open GwennyGit opened 1 year ago
I just detected that currently the function add_reac
in the polish
module does not handle the Growth or Biomass reaction properly. This is due to the fact that current_id is shortened - which is necessary for all other reactions - like that: current_id = current_id[2:]
. However, if the ID of the reaction is for example 'Growth' the result for current_id
will be 'owth'. Thus, the function cannot handle these IDs with the regex 'growth|_*biomass\d*_*'
properly. The code needs to be adjusted to handle these exceptions correctly.
At least in my models the growth function has the id id="R_Growth"
which would mean that removing the trailing R is still valid and would not break the regex function. We might need to insert a check whether the trailing R is really there before we try to remove it. Or am I missing something?
# Get ID and remove 'R_'
current_id = entity.getId()
if current_id[:2] == 'R_':
current_id = current_id[2:]
Regarding the ID for growth, this seems to vary between models a lot. We could also just change the regex to include R_
→ R_?growth|R_?_*biomass\d*_*
. However, the proposed change is also good and might be better as other reactions could also exist without R_
.
Looking at your changes I realised one problem with the function change_all_qualifiers
. If one has a lab strain model the GeneProtein qualifier should be set to isHomologTo
as we discussed before.
As mentioned in these comments https://github.com/draeger-lab/refinegems/issues/52#issuecomment-1426221386 and https://github.com/draeger-lab/refinegems/issues/52#issuecomment-1426225397. It would indeed make sense to add compartment specifications for all reactions which happen in the same compartment within polish
.
COBRApy manipulates IDs back and forth. This prefix is only there in the SBML export. Upon import R_
is stripped away.
In libsbml
the trailing R is not stripped though. That is sometimes confusing. (The same is true for genes and metabolites).
Of course not. This entire ID manipulation is nonsense and causes chaos. There should be separate fields for every piece of information that the BiGG ID ships.
I think the R_
for reactions is used to clarify that this is a reaction ID within the model. R_
is not part of the BiGG IDs same as M_
for metabolites/species and G_
for groups and GeneProducts.
But for the polishing module we mostly use libsbml
, I think we only even use cobrapy
for the growth simulations (and a few more small things). So we always need to take care of those trailing IDs (but through string slicing that is not an issue I think). One could argue that they are obsolete since the entity itself holds the information already - but that is something we cannot change here (or shouldn't on our own).
While investigating models from Heinken et al.[1] I realised that improving the function get_curie_set
within the polish
module could be interesting. As these models contain NaN identifiers and the function get_curie_set
assembles a dictionary mapping the database of each CURIE to the correspondingly found identifier it would be helpful to also check for NaNs and remove the according entries.
[1] Heinken, A., Hertel, J., Acharya, G. et al. Genome-scale metabolic re- construction of 7,302 human microorganisms for personalized medicine. Nature Biotechnology. issn: 1546-1696. doi:10.1038/s41587-022-01628-0 (Jan. 2023).
This issue was opened to collect all enhancements for the
polish
module.biomass.test_biomass_presence
add_reac
to handlegrowth|_*biomass\d*_*
IDs properly