Questions:
Is there any way to index which reads contribute to pileup count data?
What options should we use to run pileup on Nanopore sequencing data (particularly for quality score option)?
Are there any other filters that pileup implements by default (other than minimum base quality [13], minimum mapping quality [0], and max depth [8000])?
Description:
We have previously published data using Rsamtools pileup to generate read counts overlapping mutation sites.
We are now using a tool called sam2tsv to index which read IDs belong to which counts. However when I count up the number of reads to the mutation in the sam2tsv data, the counts are not matching up with pileup counts despite filtering sam2tsv data based on pileup default filters and additional filters I applied in the command.
The quality scores in Nanopore do not fully match the ASCII_BASE 33 and 64 scores.
Quality score table: https://www.drive5.com/usearch/manual/quality_score.html
Our data quality scores partially follow ASCII 33 but you can see in the table that the score stops at "K = 42" being the highest score. However, our data continues all the way down to Z, with 57 being the highest score.
What options should we use in pileup to take into account Nanopore data?
Commands:
pileup( bf, a2g.snp,
scanBamParam=ScanBamParam( flag = scanBamFlag( isDuplicate=F ),
which = a2g.snp ),
pileupParam=PileupParam( distinguish_strands=F) )
pileup( bf, a2g.snp,
scanBamParam=ScanBamParam( flag = scanBamFlag( isDuplicate=F ),
which = a2g.snp ),
pileupParam=PileupParam( distinguish_strands=F),
phred2ASCIIOffset(scheme= c("Sanger")))
The second command was based on reading in a forum that the default is ASCII 64, so I tried to change it to 33 by writing phred2ASCIIOffset(scheme= c("Sanger")) even though Nanopore only partially matches ASCII 33. Both commands seem to give the same output.
My expertise is not in bioinformatics, so your feedback for how to use pileup on Nanopore data would be greatly appreciated.
Questions: Is there any way to index which reads contribute to pileup count data? What options should we use to run pileup on Nanopore sequencing data (particularly for quality score option)? Are there any other filters that pileup implements by default (other than minimum base quality [13], minimum mapping quality [0], and max depth [8000])?
Description: We have previously published data using Rsamtools pileup to generate read counts overlapping mutation sites. We are now using a tool called sam2tsv to index which read IDs belong to which counts. However when I count up the number of reads to the mutation in the sam2tsv data, the counts are not matching up with pileup counts despite filtering sam2tsv data based on pileup default filters and additional filters I applied in the command.
The quality scores in Nanopore do not fully match the ASCII_BASE 33 and 64 scores. Quality score table: https://www.drive5.com/usearch/manual/quality_score.html Our data quality scores partially follow ASCII 33 but you can see in the table that the score stops at "K = 42" being the highest score. However, our data continues all the way down to Z, with 57 being the highest score.
What options should we use in pileup to take into account Nanopore data?
Commands:
The second command was based on reading in a forum that the default is ASCII 64, so I tried to change it to 33 by writing phred2ASCIIOffset(scheme= c("Sanger")) even though Nanopore only partially matches ASCII 33. Both commands seem to give the same output.
My expertise is not in bioinformatics, so your feedback for how to use pileup on Nanopore data would be greatly appreciated.