PacificBiosciences / pbbioconda

PacBio Secondary Analysis Tools on Bioconda. Contains list of PacBio packages available via conda.
BSD 3-Clause Clear License
247 stars 43 forks source link

skera: reads are both passing and non passing #608

Closed phpeters closed 11 months ago

phpeters commented 11 months ago

Dear PacBio people,

Thanks for the nice skera tool! I stumbled over some non-fitting numbers and wanted to ask if it's just me, or some small bug. Thanks and best! Philipp

Operating system Centos 7.9

Package name skera split 1.1.0

Conda environment

pbskera                   1.1.0                hdfd78af_0    bioconda

Describe the bug The numbers of reads in my skera-produced bam-file and in the non_passing.bam file didn't add up properly. I would expect that (number of reads in the result-bam-fle) + (number of reads in non_passing.bam) == number of total input Hifi reads (rq>=99) My run: Number of input HiFi reads: 2755095 Number of passing reads in bam: 302127 Number of non-passing reads in non_passing bam: 2627356 2755095 != 302127 + 2627356 But I've found 174388 read IDs present in both bam files (which are both passing and non passing). Substracting that number from the sum provides the correct number of HiFi reads from the input.

To Reproduce I used the adapter from the Khafaji paper ("High-throughput RNA isoform sequencing using programmable cDNA concatenation"), but otherwise the simple command skera split input_HiFi.bam mas16_adapters.fasta output.bam

Expected behavior Shouldn't reads be either passing or non passing?

Little other bug I think the skera_summary.csv states the number of found fragments, but writes "Segmented Reads" (at least by the definition on skera.how)

jmattick commented 11 months ago

Hi @phpeters, The non_passing bam contains the HiFi record for any input that did not have a 100% found adapter array to enable us to run skera undo if needed. For example, if skera finds 14 / 15 expected segments(fragments), the 14 expected segments will be output to the segmented read output and the HiFi input read will be output to the non_passing bam. Best, Jessica