NBISweden / aMeta

Ancient microbiome snakemake workflow
MIT License

The authentication plots are generated from different malt outputs, so the number of reads doesn't correspond between them. #126

Open ZoePochon opened 1 year ago

ZoePochon commented 1 year ago

In the authentication plot PDF file, some plots are generated from the rma6 files and some from the sam file. This is not ideal, because the sam file contains more reads when it comes to the Histogram of PMD scores plot and the Read length distribution plot. I understand that it could make sense for the Breadth of coverage plot though, because you don't want only the unique regions specific to the species to be covered, but also the conserved regions shared across species, I suppose.

Maybe it doesn't need to be changed, but there should be some way to see on the plot which panels are made from which input, like a color scheme or blocks or something. An explanation of the difference between the inputs, and of why it makes sense that they have more or fewer reads, would also help. Just an idea on how to make people understand it.
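To make the idea concrete, here is a rough sketch of how each panel could carry its input and read count in the title, plus a color per input. Everything in it is hypothetical: the names `SOURCE_STYLE` and `annotate_panel`, the colors, and the label wording are placeholders, and it says nothing about how the workflow's own plotting script is written or which panel really uses which input.

```python
# Hypothetical styling per input type; colors and labels are placeholders.
SOURCE_STYLE = {
    "rma6": {"color": "tab:blue", "label": "from rma6"},
    "sam": {"color": "tab:orange", "label": "from sam"},
}


def annotate_panel(title, source, n_reads):
    """Return a title and color that make the panel's input explicit.

    `source` must be a key of SOURCE_STYLE; `n_reads` is the number of
    reads that went into the panel, as counted by the caller.
    """
    style = SOURCE_STYLE[source]
    return {
        "title": f"{title} [{style['label']}, n={n_reads} reads]",
        "color": style["color"],
    }


# The source and read count passed here are made-up illustration values.
panel = annotate_panel("Read length distribution", "sam", 1200)
print(panel["title"])
```

The returned title and color could then be handed to whatever plotting code draws the panel, so two panels with different read counts are visibly tied to different inputs.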

ZoePochon commented 1 year ago

Also, the read length used for the authentication scores is based on the sam file, whereas the plot is based on the rma6 file. So even if most of the reads (90%) in the sam file are shorter than 100 bp, it doesn't appear so on the plot. This is not easy to explain when presenting the results and the calculation of the authentication scores visually.
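For anyone who wants to check the sam side of this independently, here is a minimal sketch that reads alignment records from SAM text and reports the fraction of reads shorter than a threshold. It relies only on the standard SAM layout (header lines start with `@`, the flag is column 2, the sequence is column 10). The function names and the `mapped_only` option are my own, and this is not necessarily how the workflow computes the read length for its scores.

```python
def sam_read_lengths(lines, mapped_only=True):
    """Collect sequence lengths from an iterable of SAM text lines."""
    lengths = []
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        seq = fields[9]
        if mapped_only and flag & 0x4:
            continue  # flag bit 0x4 marks an unmapped read
        if seq == "*":
            continue  # sequence not stored in this record
        lengths.append(len(seq))
    return lengths


def fraction_shorter_than(lengths, threshold=100):
    """Fraction of lengths strictly below `threshold` (0.0 if empty)."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n < threshold) / len(lengths)


# Tiny made-up example: two mapped reads and one unmapped read.
example = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tref\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "r3\t0\tref\t5\t60\t6M\t*\t0\t0\tACGTAC\tIIIIII",
]
lengths = sam_read_lengths(example)
print(lengths, fraction_shorter_than(lengths, threshold=5))
```

Note that a read appearing in several alignment records is counted once per record here, so the numbers describe records in the file rather than unique reads.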