cdanielmachado / carveme

CarveMe: genome-scale metabolic model reconstruction
Other
152 stars 52 forks source link

Detailed info regarding gapfilling #109

Closed franciscozorrilla closed 3 years ago

franciscozorrilla commented 3 years ago

Hi Daniel,

I was wondering if it would be possible for CarveMe to output more detailed info regarding the metabolties/reactions that were introduced by gapfilling. For example, shown below is the output log for a model. It says that the gapfilling added 40 reactions and 21 metabolites, but I cannot tell what those mets/rxns were, short of re-carving without gapfilling and then diff'ing the models. Is this how you identified gapfilled reactions in the original carveme paper?

(carve) [fz274@login-e-10 test_carve]$ carve -g M8 -v --mediadb media_db.tsv --fbc2 -o 56.xml 56.fna 
Running diamond...
diamond blastp -d /home/fz274/.local/lib/python3.7/site-packages/carveme/data/generated/bigg_proteins.dmnd -q 56.fna -o 56.tsv --more-sensitive --top 10
diamond v2.0.7.145 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org

#CPU threads: 32
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: 
Opening the database...  [0.21s]
Percentage range of top alignment score to report hits: 10
Reference = /home/fz274/.local/lib/python3.7/site-packages/carveme/data/generated/bigg_proteins.dmnd
Sequences = 84674
Letters = 32226457
Block size = 2000000000
Opening the input file...  [0.004s]
Opening the output file...  [0.004s]
Loading query sequences...  [0.001s]
Masking queries...  [0.033s]
Building query seed set...  [0s]
The host system is detected to have 201 GB of RAM. It is recommended to use this parameter for better performance: -c1
Algorithm: Double-indexed
Building query histograms...  [0.008s]
Allocating buffers...  [0.001s]
Loading reference sequences...  [0.169s]
Masking reference...  [0.135s]
Initializing temporary storage...  [0.01s]
Building reference histograms...  [0.353s]

........

Processing query block 1, reference block 1/1, shape 16/16, index chunk 4/4.
Building reference seed array...  [0.028s]
Building query seed array...  [0.004s]
Computing hash join...  [0.005s]
Building seed filter...  [0.003s]
Searching alignments...  [0.001s]
Deallocating buffers...  [0.007s]
Clearing query masking...  [0s]
Computing alignments...  [0.086s]
Deallocating reference...  [0.003s]
Loading reference sequences...  [0s]
Deallocating buffers...  [0s]
Deallocating queries...  [0s]
Loading query sequences...  [0s]
Closing the input file...  [0s]
Closing the output file...  [0.002s]
Closing the database file...  [0.004s]
Deallocating taxonomy...  [0s]
Total time = 4.676s
Reported 6062 pairwise alignments, 6062 HSPs.
323 queries aligned.
The host system is detected to have 201 GB of RAM. It is recommended to use this parameter for better performance: -c1
Loading universe model...
Loading media library...
Scoring reactions...
Reconstructing a single model
Gap filling for M8...
Added 40 reactions and 21 metabolites
Done.

Best wishes, Francisco

cdanielmachado commented 3 years ago

Hi Francisco,

The gap-filled reactions have a tag in the notes field in the SBML file. In your case, this tag will be "GAP_FILL: M8". Also, they will be the last reactions in the file, right after the biomass reaction.

If you load the model in reframed, you can get this tag by inspecting reaction.metadata for each reaction. Also in this case, they will be the last reactions in the model (reframed preserves the reaction order in the file).

By the way, I just realized I am not tagging the metabolites, but they will also be the last in the metabolite list.

Best regards, Daniel

cdanielmachado commented 3 years ago

PS: Are you showing off your RAM? 😄

franciscozorrilla commented 3 years ago

Hi Daniel,

Ahh I was grepping the model files for gap and fill but forgot to use the case sensitive flag -i, so I wasn't finding the tags before. Thanks for the explanation!

Oops, didn't mean to RAM brag but Cambridge cluster does have more powerful login nodes than at EMBL 💪

Best wishes, Francisco

cdanielmachado commented 3 years ago

:)

PolTorrentTornos commented 3 years ago

Hi All, I didn't get GAP_FILL tags when I didn't specify any medium (-g M9,LB...), only when I did specify them. I'm assuming that even if I don't specify any growth medium CarveMe does gap-filling. How can I then know which reaction have been added as for gap-filling?

Best, Pol.

franciscozorrilla commented 3 years ago

Hi Pol, Daniel likely has a better response, but you could probably identify which reactions were gap-filled by searching for metabolic reactions without gene annotations.

PolTorrentTornos commented 3 years ago

Hi Francisco, thanks for your idea! I'll follow your approach for now :)