gcremers / metascan

Metabolic scanning and annotation of Metagenomes
GNU General Public License v3.0

Explanation of Outputs #5

Open amcomeau opened 1 week ago

amcomeau commented 1 week ago

Hello again, I closed the previous issue I was having, as I feel we resolved it: the program now runs after I added an initial assembly step to reduce the complexity of a full MGS file of raw data.

Now I'd like to discuss the output (now that I have some!). Could you give a brief overview of what all the files are? The upper set of them appear to be for Krona graphs, but what are all the sub-extensions/versions? The metagenome.tsv appears to summarize all the hits:

[Screenshot: output]

And when I look into the total.tsv, could you describe exactly what the numbers are? I'm assuming the first is the total number of hits, which could differ from the number of contigs (if there are multiple copies), and the next is the number of organisms the gene was found in, which here is 1 since I gave it only one FASTA file (even though that file contains multiple contigs from one sample)? The %gene seems a bit high if it is simply 232 divided by the total number of genes found in all the contigs: that would imply only about 10,000 genes to reach about 2%:

[Screenshot: output2]

Thanks!

gcremers commented 4 days ago

I have created the following overview (see below), which I will also include on the front page.

First off, the Krona files should have been deleted at the end of the analysis (except for the krona.html file). The others are intermediate files used to create the Krona plot. I'm not sure what happened there; either the --debug flag was on, or they are leftovers from the run that crashed.

The 10,000 genes you're referring to would be the total number of key genes that Metascan found, not the total number of all genes present. So the numbers in the total.ovw file give an estimate of the potential of the genes or processes within the larger metabolic pathways in a sample.

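I tried to explain the numbers in the total.ovw file with the example below. I hope it makes sense; if not, let me know! Consider a sample with three organisms: I (depth 6) carrying genes a, b and b; II (depth 3) carrying genes a and a; and III (depth 2) carrying gene b.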

Total gene count = 6 (3(abb) + 2(aa) +1(b))

Total organism count = 3 (1(I)+1(II)+1(III))

Total depth = 11 (6+3+2)

Total gene depth = 26 ((3x6) + (2x3) + (1x2))

This yields the following outcome:

| gene | N#gene | %gene | N#org | %Org | O-Depth | %O-Depth | G-Depth | %G-Depth |
|---|---|---|---|---|---|---|---|---|
| a | 3 (1+2+0) | 50% (3/6) | 2 (1+1+0) | 66.7% (2/3) | 9 (6+3+0) | 81.8% (9/11) | 12 (6+(3+3)+0) | 46.2% (12/26) |
| b | 3 (2+0+1) | 50% (3/6) | 2 (1+0+1) | 66.7% (2/3) | 8 (6+0+2) | 72.7% (8/11) | 14 ((6+6)+0+2) | 53.8% (14/26) |
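To make the arithmetic easy to check, here is a minimal Python sketch that recomputes the per-gene columns of the worked example from the three-organism sample. The `organisms` dictionary and the `overview` function are hypothetical illustrations, not Metascan's code or its internal data format; percentages are rounded to one decimal.

```python
# Hypothetical representation of the worked example: each organism has a
# depth and the list of key genes found in it (not Metascan's own format).
organisms = {
    "I": {"depth": 6, "genes": ["a", "b", "b"]},
    "II": {"depth": 3, "genes": ["a", "a"]},
    "III": {"depth": 2, "genes": ["b"]},
}


def overview(orgs):
    """Compute total.ovw-style columns per gene for the example sample."""
    total_genes = sum(len(o["genes"]) for o in orgs.values())             # 6
    total_orgs = len(orgs)                                                # 3
    total_depth = sum(o["depth"] for o in orgs.values())                  # 11
    # Every gene copy contributes the depth of the organism carrying it.
    total_gene_depth = sum(len(o["genes"]) * o["depth"] for o in orgs.values())  # 26

    rows = {}
    for gene in sorted({g for o in orgs.values() for g in o["genes"]}):
        n_gene = n_org = o_depth = g_depth = 0
        for o in orgs.values():
            copies = o["genes"].count(gene)
            if copies:
                n_gene += copies                  # every copy counts
                n_org += 1                        # organism counted once
                o_depth += o["depth"]             # organism depth counted once
                g_depth += copies * o["depth"]    # depth weighted by copy number
        rows[gene] = {
            "N#gene": n_gene,
            "%gene": round(100 * n_gene / total_genes, 1),
            "N#org": n_org,
            "%Org": round(100 * n_org / total_orgs, 1),
            "O-Depth": o_depth,
            "%O-Depth": round(100 * o_depth / total_depth, 1),
            "G-Depth": g_depth,
            "%G-Depth": round(100 * g_depth / total_gene_depth, 1),
        }
    return rows


for gene, cols in overview(organisms).items():
    print(gene, cols)
```

The difference between O-Depth and G-Depth is visible in organism II: it carries two copies of gene a, so its depth is counted once for O-Depth (3) but twice for G-Depth (3+3).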

Besides the generic overview files, Metascan creates a number of files for each bin/(meta)genome/fasta file.

amcomeau commented 2 days ago

OK, thanks for these explanations. I'll be going through them this week (one minor thing to fix: the extension above should be X.table). In the meantime, I assume I still have the extra Krona files because of an error at the end of the run that prevented the final Krona chart from being produced (I don't have the final krona.html file):

```
[13:10:35] Annotation finished successfully.
[13:10:35] Walltime used: 2185.72 minutes
[13:10:35] If you use this result please cite the Metascan paper:
doi:https://doi.org/10.3389/fbinf.2022.861505
[13:10:35] This script is based on: Seemann T (2014) Prokka: rapid prokaryotic genome annotation. Bioinformatics. 30(14):2068-9.
[13:10:35] Type 'prokka --citation' for more details.
[13:10:35] ************************************
[13:12:33] Deleting unwanted file: input_fasta_assembled//analyzedfastas.txt
[13:12:33] Deleting unwanted file: input_fasta_assembled//gensum.txt
[13:12:33] Deleting unwanted file: input_fasta_assembled//gendepthsum.txt
[13:12:33] Deleting unwanted file: input_fasta_assembled//orgdepthsum.txt
[13:12:33] Deleting unwanted file: input_fasta_assembled//keggsum.txt
[13:12:33] Deleting unwanted file: input_fasta_assembled//file_hash.txt
[13:12:33] Deleting unwanted file: input_fasta_assembled//file_locus01.txt
[13:12:33] Deleting unwanted file: input_fasta_assembled//file_locusVQ.txt
[13:12:33] Deleting unwanted file: input_fasta_assembled//file_contid.txt
[13:12:33] Deleting unwanted file: input_fasta_assembled//file_idcont.txt
[13:12:33] Running: ktImportText input_fasta_assembled\/\/krona\.g\.tsv\,Genes input_fasta_assembled\/\/krona\.gd\.tsv\,Gene\ Depth input_fasta_assembled\/\/krona\.o\.tsv\,Organisms input_fasta_assembled\/\/krona\.od\.tsv\,Organism\ Depth input_fasta_assembled\/\/krona\.mod\.g\.tsv\,Modules\ Genes input_fasta_assembled\/\/krona\.mod\.gd\.tsv\,Modules\ Gene\ Depth input_fasta_assembled\/\/krona\.mod\.o\.tsv\,Modules\ Organisms input_fasta_assembled\/\/krona\.mod\.od\.tsv\,Modules\ Organism\ Depth input_fasta_assembled\/\/krona\.proc\.g\.tsv\,Process\ Genes input_fasta_assembled\/\/krona\.proc\.gd\.tsv\,Process\ Gene\ Depth input_fasta_assembled\/\/krona\.proc\.o\.tsv\,Process\ Organisms input_fasta_assembled\/\/krona\.proc\.od\.tsv\,Process\ Organism\ Depth -o input_fasta_assembled//krona.html
sh: 1: ktImportText: not found
[13:12:34] Could not run command: ktImportText input_fasta_assembled\/\/krona\.g\.tsv\,Genes input_fasta_assembled\/\/krona\.gd\.tsv\,Gene\ Depth input_fasta_assembled\/\/krona\.o\.tsv\,Organisms input_fasta_assembled\/\/krona\.od\.tsv\,Organism\ Depth input_fasta_assembled\/\/krona\.mod\.g\.tsv\,Modules\ Genes input_fasta_assembled\/\/krona\.mod\.gd\.tsv\,Modules\ Gene\ Depth input_fasta_assembled\/\/krona\.mod\.o\.tsv\,Modules\ Organisms input_fasta_assembled\/\/krona\.mod\.od\.tsv\,Modules\ Organism\ Depth input_fasta_assembled\/\/krona\.proc\.g\.tsv\,Process\ Genes input_fasta_assembled\/\/krona\.proc\.gd\.tsv\,Process\ Gene\ Depth input_fasta_assembled\/\/krona\.proc\.o\.tsv\,Process\ Organisms input_fasta_assembled\/\/krona\.proc\.od\.tsv\,Process\ Organism\ Depth -o input_fasta_assembled//krona.html
```

amcomeau commented 2 days ago

PS: As a follow-up to the above, you don't list Krona as a dependency, but isn't it installed in the conda environment that sets up Metascan? I am running inside your latest conda environment.
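Since the intermediate Krona files survived the failed run, one possible workaround is to make Krona available (for example from the bioconda `krona` package; this is an assumption about your setup, not something the thread confirms Metascan's environment provides) and rerun only the final step by hand. The minimal Python sketch below rebuilds the `ktImportText` command shown in the log, with the same intermediate files and chart labels, and passes it as an argument list so none of the shell escaping from the log is needed. `OUT_DIR`, `KRONA_INPUTS`, `build_krona_command` and `rebuild_krona` are hypothetical names; adjust `OUT_DIR` to your own output directory.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical: output directory taken from the log above; change to match your run.
OUT_DIR = Path("input_fasta_assembled")

# (intermediate file, chart label) pairs, in the order used by the failed command in the log.
KRONA_INPUTS = [
    ("krona.g.tsv", "Genes"),
    ("krona.gd.tsv", "Gene Depth"),
    ("krona.o.tsv", "Organisms"),
    ("krona.od.tsv", "Organism Depth"),
    ("krona.mod.g.tsv", "Modules Genes"),
    ("krona.mod.gd.tsv", "Modules Gene Depth"),
    ("krona.mod.o.tsv", "Modules Organisms"),
    ("krona.mod.od.tsv", "Modules Organism Depth"),
    ("krona.proc.g.tsv", "Process Genes"),
    ("krona.proc.gd.tsv", "Process Gene Depth"),
    ("krona.proc.o.tsv", "Process Organisms"),
    ("krona.proc.od.tsv", "Process Organism Depth"),
]


def build_krona_command(out_dir):
    """Return the ktImportText argument list that the log shows Metascan tried to run."""
    args = ["ktImportText"]
    for filename, label in KRONA_INPUTS:
        # ktImportText takes 'file,label' to name each chart in the HTML output.
        args.append(f"{out_dir / filename},{label}")
    args += ["-o", str(out_dir / "krona.html")]
    return args


def rebuild_krona(out_dir=OUT_DIR):
    """Rerun the final Krona step if ktImportText and all inputs are available."""
    if shutil.which("ktImportText") is None:
        print("ktImportText not found on PATH; install Krona in this environment first.")
        return False
    missing = [f for f, _ in KRONA_INPUTS if not (out_dir / f).exists()]
    if missing:
        print("Missing intermediate Krona files:", ", ".join(missing))
        return False
    # An argument list (no shell) avoids the backslash escaping seen in the log.
    subprocess.run(build_krona_command(out_dir), check=True)
    return True


if __name__ == "__main__":
    rebuild_krona()
```

Run it from the directory Metascan was launched in, with the environment that provides `ktImportText` activated; if the check fails, it reports what is missing instead of calling Krona.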