standage closed 6 months ago
I've drafted some classes for aggregating the data for the first QC section of the report in qcstats.py and qcsummaries.py. The following demonstrates how to access the variables needed to populate the report.
>>> from microhapulator.qcsummary import PairedReadQCSummary, SingleEndReadQCSummary
>>>
>>> qc = SingleEndReadQCSummary.collect(["SRM8398-1", "SRM8398-2", "SRM8398-3"], workdir="scratch/WD_clean2/")
>>> for sample, stats in qc.items():
...     print(sample, stats.total_reads, stats.filtered_ambig, stats.filtered_length, stats.retention, sep="\t")
...
SRM8398-1 211,819 106,476 (50.3%) 6,092 (2.9%) 99,251 (46.9%)
SRM8398-2 328,618 180,940 (55.1%) 7,471 (2.3%) 140,207 (42.7%)
SRM8398-3 245,967 107,726 (43.8%) 8,225 (3.3%) 130,016 (52.9%)
>>>
>>>
>>> qc = PairedReadQCSummary.collect(["SRM8398-1", "SRM8398-2", "SRM8398-3"], workdir="scratch/WD_nimagen_testC")
>>> for sample, stats in qc.items():
...     print(sample, stats.ambig.total_reads, stats.ambig.excluded_r1, stats.ambig.excluded_r2, stats.ambig.excluded_both, stats.ambig.excluded, stats.ambig.retained, stats.ambig.retention_rate, sep="\t")
...
SRM8398-1 134,086 827 477 42,698 44,002 90084 67.2%
SRM8398-2 208,872 1,152 1,767 77,869 80,788 128084 61.3%
SRM8398-3 167,704 883 462 45,617 46,962 120742 72.0%
>>>
>>> for sample, stats in qc.items():
...     print(sample, stats.merge.total_reads, stats.merge.merged_reads, stats.merge.merge_rate, sep="\t")
...
SRM8398-1 90,084 89,099 98.9%
SRM8398-2 128,084 126,633 98.9%
SRM8398-3 120,742 119,772 99.2%
>>>
>>> for sample, stats in qc.items():
...     print(sample, stats.length.total_reads, stats.length.excluded, stats.length.kept, stats.length.retention_rate, sep="\t")
...
SRM8398-1 89,099 2,452 86,647 97.2%
SRM8398-2 126,633 3,298 123,335 97.4%
SRM8398-3 119,772 3,546 116,226 97.0%
>>>
The code definitely looks a lot cleaner, and the functional review looks good too. One note: in the future, if we implement other filters for paired reads, we should generalize the PairedAmbiguityFilterStats class to just PairedFilterStats, as we will want to track the same stats for any filtering that we do. But I don't think we need to worry about that yet, since currently the only filter we have for paired-end data is ambiguity filtering.
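For reference, here is a rough sketch of what a generalized PairedFilterStats might look like. It is not the actual class in this branch: the field names simply mirror the attributes printed in the session above (total_reads, excluded_r1, excluded_r2, excluded_both, excluded, retained, retention_rate), and the filter_name field and the use of a dataclass are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedFilterStats:
    """Hypothetical generalization of PairedAmbiguityFilterStats.

    Illustrative only; not the class implemented in this branch.
    """

    total_reads: int    # read pairs entering the filter
    excluded_r1: int    # pairs excluded because only R1 failed the filter
    excluded_r2: int    # pairs excluded because only R2 failed the filter
    excluded_both: int  # pairs excluded because both mates failed
    filter_name: str = "ambiguity"  # assumed label so other filters can reuse the class

    @property
    def excluded(self) -> int:
        # Total pairs removed, regardless of which mate failed.
        return self.excluded_r1 + self.excluded_r2 + self.excluded_both

    @property
    def retained(self) -> int:
        # Pairs that pass the filter.
        return self.total_reads - self.excluded

    @property
    def retention_rate(self) -> float:
        # Fraction of pairs retained; guard against empty input.
        if self.total_reads == 0:
            return 0.0
        return self.retained / self.total_reads


# Counts for SRM8398-1 taken from the paired-end session above.
stats = PairedFilterStats(total_reads=134086, excluded_r1=827, excluded_r2=477, excluded_both=42698)
print(stats.excluded, stats.retained, f"{stats.retention_rate:.1%}", sep="\t")
```

With the SRM8398-1 counts, the derived values match the ambiguity-filter row reported above (44,002 excluded, 90,084 retained, 67.2% retention), so any new paired-end filter could presumably report the same columns without a new stats class.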
The purpose of this branch is to clean up the code responsible for collating and rendering the HTML report for the end-to-end analysis pipeline.