C-CoMP-STC / GEM-mit1002

Creative Commons Attribution 4.0 International
0 stars 0 forks source link

Why does MEMOTE give different numbers of energy generating cycle reactions? #41

Closed hgscott closed 7 months ago

hgscott commented 8 months ago

The first time I generated a MEMOTE report with the erroneous energy generating cycle tests not all being skipped (but my catch for if a metabolite was present or not wasn't working, so nothing skipped, just errored), I got that there were >150 reactions involved in ATP generating cycles. Image

Then, when I fixed that if statement bug (so metabolites skipped rather than erroring if they aren't present), I got that there were 84 reactions involved in ATP generating cycles: Image And I continued to get that when I ran my GitHub action several times.

Then I merged by ms2-model branch into main, and got >170 reactions: Image

And when I try to run MEMOTE locally (using the pip installable MEMOTE- not my branch, so I'm just using running the function once for ATP, I get a lot fewer reactions): Image

hgscott commented 8 months ago

Just as a sanity check, I made sure the model files were the same on the main branch and on the cycles branch, which was giving two different numbers in the MEMOTE report.

When I checked the diff, there was no difference in the model file: image

hgscott commented 8 months ago

To be able to test this more easily (and not have to run GitHub actions for everything) I made a new conda environment called my-memote that runs MEMOTE (specifically and editable install from my local copy).

hgscott commented 8 months ago

I tried just using the consistency.detect_energy_generating_cycles function directly with this code:

import cobra
import memote.support.consistency as consistency

# Load the model
model = cobra.io.read_sbml_model("model.xml")

met = "MNXM3"  # ATP

print("Finding ATP generating cycles")
print("----------------------------")
# For a specfied number of times
for i in range(0, 10):
    # Make a copy of the model
    model_copy = model.copy()
    # Run just the function for finding ATP generating cycles
    rxns = consistency.detect_energy_generating_cycles(model_copy, met)
    # Print the results
    print(rxns)

But each run gave the same results: image

hgscott commented 8 months ago

Then I ran the whole snapshot generation command 10 times with this code:

import subprocess

# Loop the number of times you want to run memote
for i in range(0, 10):
    # Define the output file name
    output_file = "report_" + str(i) + ".html"

    # Define the command as a list of strings
    command = [
        "memote",
        "report",
        "snapshot",
        "--filename",
        output_file,
        "model.xml",
    ]

    # Run the command using subprocess
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    # Wait for the process to finish
    stdout, stderr = process.communicate()

And got a whole range of values (all much higher than 3):

hgscott commented 8 months ago

Daniel say to:

hgscott commented 8 months ago

For easier debugging I had to add a few things to various__init__.py files to be able to actually run the snapshot function from python:

Import the cli module as a part of MEMOTE: Image

And import reporting as a a part of the cli module (has to go after the CONTEXT_SETTINGS) definition to avoid circular imports): Image

Then I could generate the snapshot report HTML with:

import memote

memote.suite.cli.reports.snapshot(["../GEM-repos/mit1002-model/model.xml"])

When I put debug breakpoints I verified that the snapshot function was doing what I thought it was, but that the majority of reactions found in the detect_energy_generating_cycles function had a tiny flux (10^-48 tiny).

When I increased the flux > 0 to flux > 0.001, I got the same three reactions I got when I ran just the detect_energy_generating_cycles function locally.

Not sure how that gave different results when running the snapshot function versus running just that one function, but I think that may just be down in the actual CS weeds of how computers handle large numbers.

hgscott commented 8 months ago

I changed the code to use the TOLERANCE_THRESHOLD variable which was already set and being used in other consistency checks: Image