Bioconductor / Rsamtools

Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import
https://bioconductor.org/packages/Rsamtools

Samtools Stats Execution and Parsing #63

Open DarioS opened 2 days ago

`samtools stats` produces a variety of statistics tables and writes them all to a single output file.

# Summary Numbers. Use `grep ^SN | cut -f 2-` to extract this part.
SN      raw total sequences:    2935735484      # excluding supplementary and secondary reads
SN      filtered sequences:     0
SN      sequences:      2935735484
SN      is sorted:      1
SN      1st fragments:  1467867742
SN      last fragments: 1467867742
    ...        ...
# Coverage distribution. Use `grep ^COV | cut -f 2-` to extract this part.
COV     [1-1]   1       2079895
COV     [2-2]   2       1438833
COV     [3-3]   3       1073855
COV     [4-4]   4       960210
COV     [5-5]   5       864269
COV     [6-6]   6       804097
COV     [7-7]   7       755066
COV     [8-8]   8       717224
COV     [9-9]   9       692326
COV     [10-10] 10      669770
    ...        ...

It would be great to have a function that runs it, parses the output into a DataFrameList (one table per section), encodes COV as an Rle, and calculates summary statistics from it.
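To make the request concrete, here is a rough sketch of the parsing step, not a proposed implementation. `parse_samtools_stats()` is a hypothetical name and does not exist in Rsamtools. It assumes the output is tab-delimited (as the `cut -f 2-` hints suggest) and that every data row starts with an upper-case section tag. The example input is a few rows copied from the output above, and the COV columns are taken to be range, depth, and count. The Bioconductor conversion is guarded so the sketch also runs without S4Vectors and IRanges installed.

```r
# Hypothetical helper (not part of Rsamtools): split `samtools stats` output
# lines into one data.frame per section tag (SN, COV, ...).
parse_samtools_stats <- function(lines) {
  # Keep only data rows: an upper-case tag followed by a tab.
  lines <- lines[grepl("^[A-Z]+\t", lines)]
  # Drop trailing "# ..." comments such as the one on "raw total sequences".
  lines <- sub("\t#.*$", "", lines)
  fields <- strsplit(lines, "\t", fixed = TRUE)
  tags <- vapply(fields, function(f) f[1L], character(1))
  lapply(split(fields, tags), function(rows) {
    width <- max(lengths(rows)) - 1L
    # Pad short rows with NA so every row in a section has the same width.
    padded <- lapply(rows, function(r) {
      c(r[-1L], rep(NA_character_, width - length(r) + 1L))
    })
    as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  })
}

# A few rows copied from the output above, with explicit tabs.
example_lines <- c(
  "# Summary Numbers. Use `grep ^SN | cut -f 2-` to extract this part.",
  "SN\traw total sequences:\t2935735484\t# excluding supplementary and secondary reads",
  "SN\tfiltered sequences:\t0",
  "# Coverage distribution. Use `grep ^COV | cut -f 2-` to extract this part.",
  "COV\t[1-1]\t1\t2079895",
  "COV\t[2-2]\t2\t1438833",
  "COV\t[3-3]\t3\t1073855"
)
tables <- parse_samtools_stats(example_lines)

# Assumed COV layout: V1 = range, V2 = depth, V3 = number of positions.
cov_depth <- as.numeric(tables$COV$V2)
cov_count <- as.numeric(tables$COV$V3)

# Mean depth over the listed rows, without expanding the run lengths.
mean_depth <- weighted.mean(cov_depth, cov_count)

# Optional Bioconductor representation, only if the packages are available.
if (requireNamespace("S4Vectors", quietly = TRUE) &&
    requireNamespace("IRanges", quietly = TRUE)) {
  stats_list <- do.call(IRanges::DataFrameList,
                        lapply(tables, S4Vectors::DataFrame))
  cov_rle <- S4Vectors::Rle(values = cov_depth, lengths = cov_count)
}
```

For a real file, the lines could come from something like `system2("samtools", c("stats", bam_path), stdout = TRUE)`, assuming `samtools` is on the `PATH` and `bam_path` is a placeholder for the input file. Values are kept as character per section and converted with `as.numeric()` rather than `as.integer()` because counts such as 2935735484 exceed R's integer range.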