draeger-lab / refinegems

refineGEMs is a python package inteded to help with the curation of genome-scale metabolic models (GEMS).
https://refinegems.readthedocs.io/en/latest/
MIT License
10 stars 1 forks source link

`polish` needs to define default units and compartment structures #38

Closed famosab closed 1 year ago

famosab commented 1 year ago

polish_carveme sets units for all parameters to mmol_per_gDW_per_hr which is in accordance with the standard understanding of the results of Flux Balance Analysis. However the listOfUnitDefinitions might be missing in a model. The function in polish_carveme thus needs to be extended to include the insertion of the fragment shown below.

`

`

draeger commented 1 year ago

Actually, we need more than that.

The model element should define default units:

extentUnits="mmol_per_gDW" substanceUnits="mmol_per_gDW" timeUnits="h" volumeUnits="fL"

And the following units should be declared for constraint-based models:

<listOfUnitDefinitions>
      <unitDefinition id="h" metaid="meta_h" name="hour">
        <annotation>
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
            <rdf:Description rdf:about="#meta_h">
              <bqbiol:is>
                <rdf:Bag>
                  <rdf:li rdf:resource="https://identifiers.org/UO:0000032" />
                </rdf:Bag>
              </bqbiol:is>
            </rdf:Description>
          </rdf:RDF>
        </annotation>
        <listOfUnits>
          <unit exponent="1" kind="second" multiplier="3600" scale="0" />
        </listOfUnits>
      </unitDefinition>
      <unitDefinition id="fL" metaid="meta_fL" name="femto litres">
        <annotation>
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
            <rdf:Description rdf:about="#meta_fL">
              <bqbiol:is>
                <rdf:Bag>
                  <rdf:li rdf:resource="https://identifiers.org/UO:0000104" />
                </rdf:Bag>
              </bqbiol:is>
            </rdf:Description>
          </rdf:RDF>
        </annotation>
        <listOfUnits>
          <unit exponent="1" kind="litre" multiplier="1" scale="-3" />
        </listOfUnits>
      </unitDefinition>
      <unitDefinition id="mmol_per_gDW" name="millimoles per gram dry weight">
        <listOfUnits>
          <unit exponent="1" kind="mole" multiplier="1" scale="-3" />
          <unit exponent="-1" kind="gram" multiplier="1" scale="0" />
        </listOfUnits>
      </unitDefinition>
      <unitDefinition id="mmol_per_gDW_per_hr" name="millimoles per gram dry weight per hour">
        <listOfUnits>
          <unit exponent="1" kind="mole" multiplier="1" scale="-3" />
          <unit exponent="-1" kind="gram" multiplier="1" scale="0" />
          <unit exponent="-1" kind="second" multiplier="3600" scale="0" />
        </listOfUnits>
      </unitDefinition>
    </listOfUnitDefinitions>

In addition, every parameter that is used as a flux bound needs to declare its unit units="mmol_per_gDW_per_hr".

draeger commented 1 year ago

Compartments can only load the default unit automatically if they specify their structure using those attributes: size="NaN" spatialDimensions="3". All species should define an initialAmount="NaN" unless some initial concentration is already defined.

famosab commented 1 year ago

Ok so polish_carveme needs to be extended even further to include all of these specifications it seems. However only for validity the part with the uni_definitions seems to be the most urgent to correct.

GwennyGit commented 1 year ago

The requested and required aforementioned functionalities were added to the polish module and most parts of the module were adjusted to be more general.


Currently, the code is working for CarveMe models and models polished with ModelPolisher. Additionally, the following functions were tested and functioned properly with a model from the VMH website. A recommendation would be to use the following functions of the polish module after using ModelPolisher.

    add_fba_units(model)
    set_default_units(model)
    set_units(model)
    add_compartment_structure_specs(model)
    set_initial_amount(model)

Possible improvement:

Generalise the check for existing unit definitions of a model

GwennyGit commented 1 year ago

The functionality to replace existing identifiers with BiGG IDs was not working properly. Additionally, this replacement step was deemed unnecessary. Thus, the already existing code parts were removed.


The previous pushed code was transferred from the development branch to the feature/polish-unit-update branch and has been finished. As the implementation so far works as described in the last comment, the branch feature/polish-unit-update can now be merged.

GwennyGit commented 1 year ago

@draeger commented:

Do we have any consens now regarding the units of compartments, i.e., should we use fL or dimensionless units?

I would like to continue the discussion regarding the compartment's unitstag here.


I think the consensus was that a model is a knowledge-base and, hence, it should contain as much information as possible. However, for the units of compartments. I think the definition unitless would make more sense and I agree with @famosab that if it is not used either way I am unsure why it should even be there. I mean dimensionless is obviously not true as a compartment has a dimension. Furthermore, I think that is the reason why in the model where I used the new version of the polish module the units tag was not added by ModelPolisher. I could imagine that ModelPolisher checks whether the tag spatialDimension is set before adding units = dimensionless.
One could argue that using gram (dry weight) (also mentioned at: https://github.com/draeger-lab/ModelPolisher/issues/8#issue-136922129) or femto litres as units for the compartments would make sense as most compartments are filled with water and can thus also be dried. However, for the extracellular compartment it is more difficult as this could be for example the desert which contains barely any fluids, and I am unsure of using the unit gram (dry weight) here. Maybe one could state as consensus that for all compartments except the extracellular one either femto litres or gram (dry weight) would be the correct units. Though, the question of what to state as units for the extracellular compartment would still remain as it is difficult to state one unit.
After further discussion with @draeger we decided to use femto litres as unit for all compartments. This is based on the fact that compartments have a volume and volumes can be expressed in litres.

GwennyGit commented 1 year ago

So far the code was readjusted to handle changes in the list of unit definitions of the input model without breaking the model. Additionally, the unit definitions as defined within the code were declared as ground truth. Thus, unit definitions contained in the input model that are not identical with any of the unit definitions defined within the code are removed from the model. Additionally, these unit definitions are printed to the console as XMLNode objects and the unit definitions defined within the code are added to the model.