brsynth / rptools

Suite of tools that work on rpSBML format
MIT License

Compounds not included in the Sink #26

Closed niraito closed 2 years ago

niraito commented 2 years ago

Hello,

I was trying to extract the sink from different SBML models. When I compared the sink extracted by rptools with the compounds with MNXM IDs that I extracted from the model manually, I noticed that some cytosolic compounds are not included in the sink by rptools. This might be because there is no flux for these compounds and the --remove_dead_end option is True by default. However, I could not make it work by setting it to any form of false.

python -m rptools.rpextractsink iJO1366.xml iJO1366_rmdeadends_false.csv --remove_dead_end False
python -m rptools.rpextractsink iJO1366.xml iJO1366_rmdeadends_false.csv --remove_dead_end FALSE
python -m rptools.rpextractsink iJO1366.xml iJO1366_rmdeadends_false.csv --remove_dead_end F
python -m rptools.rpextractsink iJO1366.xml iJO1366_rmdeadends_false.csv --remove_dead_end false
python -m rptools.rpextractsink iJO1366.xml iJO1366_rmdeadends_false.csv --remove_dead_end f

The response was always the following, with the last word matching the "false" argument I passed:

usage: rpextractsink [-h] [--log ARG] [--log_file LOG_FILE] [--silent] [--version] [--compartment_id COMPARTMENT_ID]
                     [--remove_dead_end]
                     input_sbml output_sink
rpextractsink: error: unrecognized arguments: f

How can I try it without removing dead ends?

I have another related question: I thought the compounds that are not included in the sink might be deprecated. RetroRules uses MNXref v3.0. The current release of MetaNetX is MNXref 4.4. Does rptools also use MNXref v3.0? Many IDs are deprecated with each new release. When I go to the MetaNetX page of a compound in a pathway produced by RetroPath2.0, it is highly probable that the compound is deprecated. I thought that I could update the compounds to their newest MNXM IDs, but the deprecations in each new release create chains of IDs for the same compound.

[Image: unnamed]

The mappings are "one-to-one, many-to-one (merge) and one-to-many (split)" as described in the Search/Download MNXref namespace page, and also can be seen below.

[Image: unnamed_zoomin]

Sometimes I observe that a compound is deprecated into something very different (in my opinion). I e-mailed MetaNetX about this issue. But what do you think about the deprecations, and how do they affect the pathway predictions? Are they related to the reason that some compounds are not included in the sink produced by rptools.rpextractsink?

tduigou commented 2 years ago

Hello @niraito,

To include dead-end metabolites, simply omit the --remove_dead_end option from the command line call. Taking the example you provide, it would be:

python -m rptools.rpextractsink iJO1366.xml iJO1366_rmdeadends_false.csv
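
The usage message above lists `[--remove_dead_end]` without a value placeholder, which is how argparse prints a flag that takes no argument. The following is a minimal sketch of that behaviour, assuming a `store_true`-style flag; `build_parser` and the program name are hypothetical stand-ins, not the actual rptools parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the rpextractsink parser; only the flag style matters.
    parser = argparse.ArgumentParser(prog="rpextractsink-sketch")
    parser.add_argument("input_sbml")
    parser.add_argument("output_sink")
    # store_true: the flag takes no value. Present -> True, absent -> False.
    parser.add_argument("--remove_dead_end", action="store_true")
    return parser


parser = build_parser()

# Flag omitted: dead ends are kept.
args = parser.parse_args(["iJO1366.xml", "sink.csv"])
print(args.remove_dead_end)

# Flag present: dead ends are removed.
args = parser.parse_args(["iJO1366.xml", "sink.csv", "--remove_dead_end"])
print(args.remove_dead_end)

# Adding a token such as "False" or "f" after the flag is not consumed by it,
# so argparse exits with "unrecognized arguments", as in the error reported above.
```

With a flag of this kind, the only way to get the "false" behaviour is to leave the flag out.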

Regarding the second part of your question, the sink extraction (as well as the dead-end detection) is independent of MetaNetX. Basically, the sink extraction relies only on the SBML model, by listing the metabolites belonging to a given compartment (by default, the 'c' compartment, which corresponds to the cytosol in E. coli). For the dead-end detection, Flux Balance Analysis is used.

The MetaNetX deprecation relationships do not affect the pathway predictions, in the sense that, for a given set of reaction rules and sink, the predicted reactions and pathways remain the same as 3 years ago (this is because we work with chemical structures, not IDs).

However, it's true that a lot of deprecation warnings now pop up when browsing the MNX IDs, which makes the results more difficult to explore... (Releasing a new dataset is planned for the future.)

Regards, Thomas

niraito commented 2 years ago

Hello again, dear @tduigou,

Thank you for helping me with the --remove_dead_end argument :)

However, I still have another question:

python -m rptools.rpextractsink iJO1366.xml iJO1366_sink_20221018.csv --compartment_id c
# number of cytosolic compounds: 808 (without header)
wc -l iJO1366_sink_20221018.csv

809 iJO1366_sink_20221018.csv

python -m rptools.rpextractsink iJO1366.xml iJO1366_sink_20221018_rmde.csv --compartment_id c --remove_dead_end
# number of cytosolic compounds, dead ends removed: 705 (without header)
wc -l iJO1366_sink_20221018_rmde.csv

706 iJO1366_sink_20221018_rmde.csv

From the sbml file, I can extract cytosolic compounds (by MNXM ids, not according to their chemical structures) manually:

# to extract list of species section: 
cat iJO1366.xml | sed '/listOfSpecies/,$!d' > iJO1366_listOfSpecies_sed_intro.tmp
tac iJO1366_listOfSpecies_sed_intro.tmp | sed '/\/listOfSpecies/,$!d' | tac > iJO1366_listOfSpecies_sed_outro.tmp
cat iJO1366_listOfSpecies_sed_outro.tmp | grep 'MNXM' | sort -u | wc -l

1136

There are 1136 unique compounds or MNXM ids.

cat iJO1366_listOfSpecies_sed_outro.tmp | grep 'compartment="c"' | wc -l

1039

1039 compounds (or MNXM ids) out of 1136 are cytosolic.
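
The manual count above can be cross-checked by parsing the XML instead of grepping lines. This is a minimal sketch using only the Python standard library; `species_by_compartment` and `local_name` are illustrative names, and the sketch assumes each species is a `species` element carrying a `compartment` attribute, as the grep pattern above implies. It is not how rptools itself reads the model.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    # Drop the "{namespace}" prefix that ElementTree adds to tag names.
    return tag.rsplit("}", 1)[-1]


def species_by_compartment(sbml_text: str) -> Counter:
    """Count <species> elements per value of their 'compartment' attribute."""
    root = ET.fromstring(sbml_text)
    counts = Counter()
    for element in root.iter():
        if local_name(element.tag) == "species":
            counts[element.get("compartment")] += 1
    return counts


# Tiny inline model for illustration only; the IDs are made up.
example = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core">
  <model>
    <listOfSpecies>
      <species id="M_a_c" compartment="c"/>
      <species id="M_b_c" compartment="c"/>
      <species id="M_a_e" compartment="e"/>
    </listOfSpecies>
  </model>
</sbml>"""

print(species_by_compartment(example))
```

For a real file, the text can be read with `open("iJO1366.xml", encoding="utf-8").read()` and the cytosolic count taken from the `"c"` key.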

But in the sink, there were 808 cytosolic compounds (or MNXM IDs), or 705 when dead ends are removed. Why do you think there is this difference? Could you help me find the reason the numbers differ?

Thank you for your time and patience!

Best regards, Nilay

tduigou commented 2 years ago

Hi @niraito,

The reason why there are fewer compounds is that only compounds having an InChI structure are written to the sink file.

The InChI is assigned by looking into the MNX database, using the MNX ID as the query. (Here, I realized I was mistaken when I said that the sink extraction was independent of the MNX database.) The MNX database is actually used to look up this MNX ID to InChI relationship.
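
The filtering described above can be pictured as a lookup followed by a drop of the entries without a structure. This is only a sketch of the idea, not the rptools code: `keep_with_inchi` is a hypothetical helper, and the IDs and the lookup table are made-up placeholders rather than real MNX entries.

```python
from typing import Dict, Iterable, Optional


def keep_with_inchi(
    species_ids: Iterable[str],
    inchi_by_id: Dict[str, Optional[str]],
) -> Dict[str, str]:
    """Return only the species whose ID maps to a non-empty InChI."""
    sink = {}
    for species_id in species_ids:
        inchi = inchi_by_id.get(species_id)
        if inchi:  # skips missing IDs, None and empty strings
            sink[species_id] = inchi
    return sink


# Placeholder IDs; the InChI shown is the one for water.
lookup = {
    "ID_WITH_STRUCTURE": "InChI=1S/H2O/h1H2",
    "ID_WITHOUT_STRUCTURE": None,
}
cytosolic = ["ID_WITH_STRUCTURE", "ID_WITHOUT_STRUCTURE", "ID_NOT_IN_DATABASE"]

sink = keep_with_inchi(cytosolic, lookup)
print(len(cytosolic), len(sink))
```

Under this picture, the sink is smaller than the list of cytosolic species whenever some IDs have no InChI or are absent from the lookup.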

Best wishes, Thomas

niraito commented 2 years ago

Hello again, dear @tduigou,

I checked the InChI columns of both the sink extracted by rptools and the list that I retrieved from the SBML file (all MNXM IDs). I saw that some of the MNXM compounds that I retrieved from the SBML file do not have InChI structures, while all compounds in the sink extracted by rptools do. Thank you for the explanation.

Also, I would like to use the output scopes of RetroPath2.0 in further steps of my research. What would you suggest doing about the deprecated compounds for now?

Thank you for your time and patience!

Best regards, Nilay

tduigou commented 2 years ago

Hi @niraito,

I would say it all depends on what will be done after RP2. Of course, if only the chemical structure provided by the InChI matters, not the MNX IDs, then the deprecated status is not important. If the MNX IDs are needed to establish crosslinks with other databases, then working with the chem_xref files provided by MNX might be useful to find the links between deprecated and current IDs.
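
Once deprecated-to-replacement pairs have been pulled out of the MNX files, following the chains (including merges and splits) is a small graph walk. The sketch below assumes the pairs are already loaded into a dictionary; how they are parsed from the MNX files is not shown and depends on the release. `resolve_current_ids` and all IDs are hypothetical.

```python
from typing import Dict, Set


def resolve_current_ids(mnx_id: str, successors: Dict[str, Set[str]]) -> Set[str]:
    """Follow deprecated -> replacement links until IDs with no successor are reached.

    A split (one-to-many) yields several current IDs; a chain across
    releases is followed to its end.
    """
    current: Set[str] = set()
    seen: Set[str] = set()
    stack = [mnx_id]
    while stack:
        node = stack.pop()
        if node in seen:  # guards against cycles in the mapping
            continue
        seen.add(node)
        replacements = successors.get(node)
        if replacements:
            stack.extend(replacements)
        else:
            current.add(node)  # nothing replaces it, so treat it as current
    return current


# Made-up mapping: OLD_1 -> MID -> {NEW_A, NEW_B} (chain then split),
# OLD_2 -> NEW_A (merge with one branch of the split).
successors = {
    "OLD_1": {"MID"},
    "MID": {"NEW_A", "NEW_B"},
    "OLD_2": {"NEW_A"},
}

print(sorted(resolve_current_ids("OLD_1", successors)))
print(sorted(resolve_current_ids("OLD_2", successors)))
```

When an ID resolves to more than one current ID, the choice between them probably needs a structure comparison (for example on the InChI) rather than an automatic pick.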

Best wishes, Thomas

niraito commented 2 years ago

Dear @tduigou,

Thank you for your time, patience, and guidance! I will need the crosslink information. Therefore, I will work on the chem_xref and other files from MNX.

Best regards, Nilay