PacificBiosciences / pbbioconda

PacBio Secondary Analysis Tools on Bioconda. Contains a list of PacBio packages available via conda.
BSD 3-Clause Clear License

lima not outputting enough ZMWs - issue with primers or primer choice? #695

Closed: carla-hazelf closed this issue 1 month ago

carla-hazelf commented 1 month ago

Operating system: `PRETTY_NAME: "Rocky Linux 8.8 (Green Obsidian)"`, `VERSION: "8.8 (Green Obsidian)"`

Package name: lima (`lima 1.11.0 (commit v1.11.0-1-gec618c9)`)


Describe the bug: lima is removing too many ZMWs. I am using the standard primers:

- NEB_5p: `GCAATGAAGTCGCAGGGTTGGG`
- Clontech_5p: `AAGCAGTGGTATCAACGCAGAGTACATGGGG`
- NEB_Clontech_3p: `GTACTCTGCGTTGATACCACTGCTT`
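A quick way to inspect the primer set itself is to compare reverse complements. The minimal sketch below (plain Python; `revcomp` and `shared_prefix_with_revcomp` are illustrative helpers, not part of lima) shows that the reverse complement of NEB_Clontech_3p is identical to the first 25 nt of Clontech_5p, while NEB_5p shares no such prefix. It only describes the sequences as listed and makes no claim about how lima scores such overlaps.

```python
# Primer sequences copied from the primer list in this issue.
PRIMERS = {
    "NEB_5p": "GCAATGAAGTCGCAGGGTTGGG",
    "Clontech_5p": "AAGCAGTGGTATCAACGCAGAGTACATGGGG",
    "NEB_Clontech_3p": "GTACTCTGCGTTGATACCACTGCTT",
}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def shared_prefix_with_revcomp(a: str, b: str) -> str:
    """Longest prefix of `a` that is also a prefix of revcomp(b)."""
    rc = revcomp(b)
    n = 0
    while n < min(len(a), len(rc)) and a[n] == rc[n]:
        n += 1
    return a[:n]


if __name__ == "__main__":
    three_p = PRIMERS["NEB_Clontech_3p"]
    print("revcomp(NEB_Clontech_3p):", revcomp(three_p))
    for name in ("NEB_5p", "Clontech_5p"):
        shared = shared_prefix_with_revcomp(PRIMERS[name], three_p)
        print(f"{name}: first {len(shared)} nt match revcomp(NEB_Clontech_3p)")
```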

With these primers, I get `ZMWs above all thresholds (B) : 452638 (23%)`. I'm unsure whether this is due to:

- the primer choice, or
- an issue with the raw sequence data itself.

I looked at the overrepresented sequences in FastQC: (screenshot not preserved)

Does this mean that the primers have not worked?

Error message:

```
ZMWs input                (A) : 1939872
ZMWs above all thresholds (B) : 452638 (23%)
ZMWs below any threshold  (C) : 1487234 (77%)

ZMW marginals for (C):
Below min length              : 24 (0%)
Below min score               : 0 (0%)
Below min end score           : 390827 (26%)
Below min passes              : 333 (0%)
Below min score lead          : 0 (0%)
Below min ref span            : 750683 (50%)
Without SMRTbell adapter      : 333 (0%)
Undesired hybrids             : 333 (0%)
Undesired 5p--5p pairs        : 713200 (48%)
Undesired 3p--3p pairs        : 667027 (45%)
Undesired no hit              : 333 (0%)

ZMWs for (B):
With different pair           : 452638 (100%)
Coefficient of correlation    : 0%

ZMWs for (A):
Allow diff pair               : 1939539 (100%)
Allow same pair               : 1939539 (100%)

Reads for (B):
Above length                  : 452638 (100%)
Below length                  : 0 (0%)
```
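The summary counts can be cross-checked with a few lines of Python. This sketch parses an excerpt of the summary (the regular expression and `parse_summary` are hypothetical helpers written for this layout, not a parser shipped with lima) and confirms that (B) + (C) = (A). It also shows that the (C) marginals are percentages of (C), not of (A), and that their counts add up to well over (C) itself, so some ZMWs must be counted under more than one filter; the marginals are not mutually exclusive reasons.

```python
import re
from typing import Dict

# Excerpt of the lima summary quoted in this issue (values copied verbatim).
SUMMARY = """\
ZMWs input                (A) : 1939872
ZMWs above all thresholds (B) : 452638 (23%)
ZMWs below any threshold  (C) : 1487234 (77%)

ZMW marginals for (C):
Below min end score           : 390827 (26%)
Below min ref span            : 750683 (50%)
Undesired 5p--5p pairs        : 713200 (48%)
Undesired 3p--3p pairs        : 667027 (45%)
"""

# "label : count", optionally followed by "(NN%)".
LINE_RE = re.compile(r"^(?P<label>.+?)\s*:\s*(?P<count>\d+)")


def parse_summary(text: str) -> Dict[str, int]:
    """Map each 'label : count' line to its count, collapsing label whitespace."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            label = " ".join(m.group("label").split())
            counts[label] = int(m.group("count"))
    return counts


counts = parse_summary(SUMMARY)
total = counts["ZMWs input (A)"]
passed = counts["ZMWs above all thresholds (B)"]
failed = counts["ZMWs below any threshold (C)"]
assert passed + failed == total  # every input ZMW lands in (B) or (C)

# Marginals are expressed relative to (C).
marginals = {k: v for k, v in counts.items() if k.startswith(("Below", "Undesired"))}
for label, n in marginals.items():
    print(f"{label:<24} {n:>8}  {100 * n / failed:5.1f}% of (C)")

# The sum exceeds 100% of (C), so the filters overlap.
print(f"sum of listed marginals: {100 * sum(marginals.values()) / failed:.1f}% of (C)")
```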

To Reproduce: `lima tissue.ccs.bam primers.fa $dir/$name.$tissue1.fl.bam --isoseq --num-threads $threads`

Expected behavior: I expected more ZMWs to pass all thresholds.

armintoepfer commented 1 month ago

Please check your data manually to see whether you find those primer combinations. It's HiFi data, so a grep works most of the time.
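Following the suggestion to grep the HiFi reads, here is a minimal Python sketch that tallies which primers occur near the start and the end of each read. Assumptions: the reads have first been exported to FASTQ (for example with `samtools fastq tissue.ccs.bam > tissue.ccs.fastq`), matching is exact on either strand as with grep, and the 100 nt end window is an arbitrary choice. The function names and the script name are hypothetical.

```python
import sys
from collections import Counter
from typing import Iterator, Tuple

# Primer sequences from the primer list in this issue.
PRIMERS = {
    "NEB_5p": "GCAATGAAGTCGCAGGGTTGGG",
    "Clontech_5p": "AAGCAGTGGTATCAACGCAGAGTACATGGGG",
    "NEB_Clontech_3p": "GTACTCTGCGTTGATACCACTGCTT",
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def end_hits(end_seq: str) -> Tuple[str, ...]:
    """Names of primers found exactly, on either strand, in a read-end slice.

    Clontech_5p starts with revcomp(NEB_Clontech_3p), so a Clontech_5p hit
    is also reported as an NEB_Clontech_3p hit.
    """
    return tuple(sorted(
        name for name, p in PRIMERS.items()
        if p in end_seq or revcomp(p) in end_seq
    ))


def classify(seq: str, window: int = 100) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
    """Return (hits in the first `window` nt, hits in the last `window` nt)."""
    seq = seq.upper()
    return end_hits(seq[:window]), end_hits(seq[-window:])


def fastq_sequences(path: str) -> Iterator[str]:
    """Yield sequences from a plain FASTQ file with 4 lines per record."""
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:
                yield line.strip()


if __name__ == "__main__" and len(sys.argv) > 1:
    tally = Counter(classify(s) for s in fastq_sequences(sys.argv[1]))
    for (start, end), n in tally.most_common(20):
        print(n, "start:", ",".join(start) or "-", "| end:", ",".join(end) or "-")
```

Run it as `python primer_ends.py tissue.ccs.fastq`. Tallies dominated by the same primer at both read ends, or by reads with no primer hit at all, are the patterns worth comparing against lima's 5p--5p, 3p--3p and no-hit counts.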