smacarthur opened 10 years ago
Hi Stewart,
Strand-specific counts are already in the output. They are documented as:
num_plus_strand → number of reads on the plus/forward strand
num_minus_strand → number of reads on the minus/reverse strand
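As a rough illustration of pulling those two counts out of one per-base entry, here is a minimal Python sketch. It assumes each base is reported as a colon-separated `base:count:...` entry and that the leading fields follow the hypothetical `FIELDS` order below; the field order, the helper names and the sample entry are all assumptions for illustration, so check them against the documentation for your version.

```python
# Hypothetical field order for the start of one colon-separated per-base
# entry; check it against the documentation of the version you are running.
FIELDS = [
    "base",
    "count",
    "avg_mapping_quality",
    "avg_basequality",
    "avg_se_mapping_quality",
    "num_plus_strand",
    "num_minus_strand",
]


def parse_base_entry(entry: str) -> dict:
    """Map the leading fields of a 'base:count:...' entry to names.

    zip() stops at the shorter sequence, so any trailing fields beyond
    FIELDS are ignored.
    """
    parts = entry.split(":")
    record = {"base": parts[0]}
    for name, value in zip(FIELDS[1:], parts[1:]):
        record[name] = float(value)  # all numeric fields parsed as float
    return record


def plus_strand_fraction(record: dict) -> float:
    """Fraction of reads for this base on the plus strand (NaN if none)."""
    total = record["num_plus_strand"] + record["num_minus_strand"]
    if total == 0:
        return float("nan")
    return record["num_plus_strand"] / total


# Made-up entry: 70 reads supporting A, 31 on plus and 39 on minus.
rec = parse_base_entry("A:70:60.00:31.51:0.00:31:39:0.48")
print(rec["num_plus_strand"], rec["num_minus_strand"])  # 31.0 39.0
```

Keeping the field names in one list means a different field order only needs a change in one place.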
Breaking out the individual summary metrics by strand as well is not something I had considered before, but I can see the utility of providing that information. I will need to think about how best to report it within the existing file format, or whether it would require more extensive changes to the output format.
I saw the counts per strand, which are useful. I think having the other metrics, such as base quality, per strand would also be really useful, though I understand the problems with the output. The easiest (and ugliest) way would be to have comma-delimited values within the colon-separated values, e.g. 31,39:31.9,32.4: etc. Alternatively, you could reuse the curly-brace notation you already use for the libraries, though I think that is actually far more difficult to parse. On a related note, do you have tools to parse the output? I have been parsing it into an R data structure based on GenomicRanges, which works well for me.
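To show why the comma-within-colon proposal is easy to consume, here is a small Python sketch that parses a string in that proposed shape into (plus, minus) pairs. The format is only the suggestion above, not something the tool emits, and `parse_stranded_fields` is a hypothetical helper name.

```python
def parse_stranded_fields(text: str) -> list[tuple[float, float]]:
    """Parse a proposed 'plus,minus:plus,minus:...' string into pairs.

    A field without exactly one comma raises ValueError when unpacked.
    """
    pairs = []
    for field in text.split(":"):
        plus, minus = field.split(",")  # one value per strand
        pairs.append((float(plus), float(minus)))
    return pairs


pairs = parse_stranded_fields("31,39:31.9,32.4")
print(pairs)  # [(31.0, 39.0), (31.9, 32.4)]
```

Two nested `split` calls are all it takes, which is the main argument for this layout over a nested brace notation.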
+1, any more thoughts on this? A summary option that collapses the =ACGTN stats into a single average value would also be useful. Thank you.
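One plausible reading of "collapse the =ACGTN stats into an average value" is a count-weighted mean across the six base entries. The Python sketch below assumes that interpretation, assumes you already have each base's read count and per-base average (e.g. average base quality), and uses made-up numbers and a hypothetical helper name.

```python
def collapse_to_mean(per_base: dict[str, tuple[int, float]]) -> float:
    """Count-weighted mean of a per-base average over the =ACGTN entries.

    Each value is (read_count, average). Weighting by count recovers the
    mean over all reads at the position. Returns NaN when there are no reads.
    """
    total = sum(count for count, _ in per_base.values())
    if total == 0:
        return float("nan")
    return sum(count * mean for count, mean in per_base.values()) / total


# Made-up position: 70 reads with A (mean quality 31.5), 2 with G (20.0).
per_base = {
    "=": (0, 0.0),
    "A": (70, 31.5),
    "C": (0, 0.0),
    "G": (2, 20.0),
    "T": (0, 0.0),
    "N": (0, 0.0),
}
summary = collapse_to_mean(per_base)
```

A plain unweighted mean of the six averages would let bases with zero or very few reads pull the value around, which is why the sketch weights by count.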
It would be really useful if the statistics from read-counts were split by strand, for example the count of As on the forward and reverse strands, and the mean base quality on each strand. This would help with enrichment data, where there may be a strand bias. Let me know if you want more use cases. Thanks,
Stewart
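For illustration, the requested per-strand breakdown amounts to grouping reads at a position by (base, strand) and averaging base quality within each group. The Python sketch below assumes you already have per-read observations as (base, is_reverse, base_quality) tuples. Obtaining those from a BAM pileup (e.g. with a library such as pysam) is not shown, and `per_strand_stats` is a hypothetical name.

```python
from collections import defaultdict
from collections.abc import Iterable


def per_strand_stats(
    observations: Iterable[tuple[str, bool, float]],
) -> dict[tuple[str, str], tuple[int, float]]:
    """Group (base, is_reverse, base_quality) observations by base and strand.

    Returns {(base, "+" or "-"): (read_count, mean_base_quality)}.
    """
    counts: dict[tuple[str, str], int] = defaultdict(int)
    qual_sum: dict[tuple[str, str], float] = defaultdict(float)
    for base, is_reverse, qual in observations:
        key = (base, "-" if is_reverse else "+")
        counts[key] += 1
        qual_sum[key] += qual
    return {key: (counts[key], qual_sum[key] / counts[key]) for key in counts}


# Made-up reads covering one position.
stats = per_strand_stats([("A", False, 30), ("A", False, 34), ("A", True, 20)])
print(stats)  # {('A', '+'): (2, 32.0), ('A', '-'): (1, 20.0)}
```

A large gap between the "+" and "-" counts or qualities for the same base is the kind of strand bias the request is after.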