BioLockJ-Dev-Team / sheepdog_testing_suite

Test suite for BioLockJ development team.

metaphlan2 parser #322

Open ssun6 opened 3 years ago

ssun6 commented 3 years ago

The metaphlan2 parser seems to double count. For example, below are the abundances of Bacteroidetes clades in the raw outputs. The row k__Bacteria|p__Bacteroidetes should be the abundance of the phylum Bacteroidetes, but the parser seems to use the sum over all taxonomic levels from phylum to species. Because unclassified taxa are not in the outputs, the resulting abundance is not exactly 7 times the phylum abundance. The composition of the results is probably similar enough that the downstream analysis did not change noticeably, which is why the problem went unnoticed. I think it would be better to switch to relative abundance for metaphlan2 outputs, or to offer relative abundance and estimated reads per clade as two options.

k__Bacteria|p__Bacteroidetes | 79267 | 1942578
k__Bacteria|p__Bacteroidetes|c__Bacteroidia | 71024 | 1740574
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales | 71024 | 1740574
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidaceae | 23664 | 2318722
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidaceae|g__Bacteroides | 23664 | 2318722
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidales_noname | 0 | 0
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Bacteroidales_noname|g__Bacteroidales_noname | 0 | 0
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Porphyromonadaceae | 8786 | 48644
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Porphyromonadaceae|g__Barnesiella | 0 | 2424
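
To make the suspected double counting concrete, here is a minimal Python sketch using the first numeric column of the rows above, written with MetaPhlAn2's `k__`/`p__` rank prefixes. It is not the BioLockJ parser code: `clade_count` and `summed_over_descendants` are hypothetical helpers, and the second one only models the behavior I suspect, namely adding every descendant row to the phylum row. The table is an excerpt, so the inflated total covers only the rows shown.

```python
# Rows copied from the first numeric column of the table above.
# Keys are MetaPhlAn2-style clade paths; values are the reported abundances.
PREFIX = "k__Bacteria|p__Bacteroidetes"
ORDER = PREFIX + "|c__Bacteroidia|o__Bacteroidales"

rows = {
    PREFIX: 79267,
    PREFIX + "|c__Bacteroidia": 71024,
    ORDER: 71024,
    ORDER + "|f__Bacteroidaceae": 23664,
    ORDER + "|f__Bacteroidaceae|g__Bacteroides": 23664,
    ORDER + "|f__Bacteroidales_noname": 0,
    ORDER + "|f__Bacteroidales_noname|g__Bacteroidales_noname": 0,
    ORDER + "|f__Porphyromonadaceae": 8786,
    ORDER + "|f__Porphyromonadaceae|g__Barnesiella": 0,
}


def clade_count(table, clade):
    """Expected behavior: read the clade's own row and nothing else."""
    return table[clade]


def summed_over_descendants(table, clade):
    """Hypothetical model of the bug: add the clade's row to all descendant rows."""
    return sum(
        value
        for path, value in table.items()
        if path == clade or path.startswith(clade + "|")
    )


expected = clade_count(rows, PREFIX)
inflated = summed_over_descendants(rows, PREFIX)

print("phylum row only:       ", expected)   # 79267
print("summed over all levels:", inflated)   # 277429
print("inflation factor:      ", round(inflated / expected, 2))
```

If the parser behaves like `summed_over_descendants`, every lower-level row is counted once per ancestor, so the phylum total is inflated by a factor that depends on how much of the clade is classified at each level.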