PacificBiosciences / pbbioconda

PacBio Secondary Analysis Tools on Bioconda. Contains list of PacBio packages available via conda.
BSD 3-Clause Clear License

LIMA --min-scoring-regions 1 in isoseq mode #713

Closed OliviaDIVALENTIN closed 2 months ago

OliviaDIVALENTIN commented 2 months ago

Hi, I am analysing IsoSeq data from 9 multiplexed samples (Sequel II). I noticed that the 3p adapters are often truncated, so many reads fail the min-ref-span threshold when running lima.

ZMWs input                (A) : 3373047
ZMWs above all thresholds (B) : 1450091 (42.99%)
ZMWs below any threshold  (C) : 1922956 (57.01%)

ZMW marginals for (C):
Below min length              : 5549 (0.29%)
Below min score               : 0 (0.00%)
Below min end score           : 1372494 (71.37%)
Below min passes              : 507 (0.03%)
Below min score lead          : 0 (0.00%)
Below min ref span            : 1713005 (89.08%)
Without SMRTbell adapter      : 507 (0.03%)
Undesired hybrids             : 1099401 (57.17%)
Undesired 5p--5p pairs        : 360080 (18.73%)
Undesired 3p--3p pairs        : 453984 (23.61%)
Undesired no hit              : 507 (0.03%)

ZMWs for (B):
With different pair           : 1450091 (100.00%)
Coefficient of correlation    : 82.33%

ZMWs for (A):
Allow diff pair               : 3372540 (99.98%)
Allow same pair               : 3372540 (99.98%)

Reads for (B):
Above length                  : 1514142 (100.00%)
Below length                  : 0 (0.00%)
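
To compare reports like the one above across runs with different settings, a minimal Python sketch can parse the "label : count (pct%)" lines and recompute the percentages. It assumes the layout shown in this report; `parse_lima_summary` is a hypothetical helper, not part of lima. The marginals for (C) appear to be reported relative to (C) and overlap (they sum to more than 100%), which suggests one ZMW can fail several filters at once.

```python
import re

# Matches lines such as "Below min ref span            : 1713005 (89.08%)".
# Lines without an integer count (section headers, the correlation line) are skipped.
LINE_RE = re.compile(r"^(.+?)\s*:\s*(\d+)(?:\s+\(([\d.]+)%\))?\s*$")


def parse_lima_summary(text):
    """Hypothetical helper: return {label: count} for each count line."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            # Collapse the column padding inside labels to single spaces.
            label = " ".join(match.group(1).split())
            counts[label] = int(match.group(2))
    return counts


# Excerpt copied from the report above.
report = """\
ZMWs input                (A) : 3373047
ZMWs above all thresholds (B) : 1450091 (42.99%)
ZMWs below any threshold  (C) : 1922956 (57.01%)

ZMW marginals for (C):
Below min end score           : 1372494 (71.37%)
Below min ref span            : 1713005 (89.08%)
Coefficient of correlation    : 82.33%
"""

counts = parse_lima_summary(report)
total = counts["ZMWs input (A)"]
passed = counts["ZMWs above all thresholds (B)"]
failed = counts["ZMWs below any threshold (C)"]

# Yield is relative to all input ZMWs; marginals are relative to (C).
yield_pct = 100 * passed / total
ref_span_pct = 100 * counts["Below min ref span"] / failed
end_score_pct = 100 * counts["Below min end score"] / failed

print(f"yield: {yield_pct:.2f}%")
print(f"below min ref span (of C): {ref_span_pct:.2f}%")
print(f"below min end score (of C): {end_score_pct:.2f}%")
```

Running the same parse on the report from a second run makes it easy to check whether a changed option moved any of these counts.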

I tried changing the values of --min-scoring-regions and --min-ref-span, but this seems to have had no effect. My command line is:

lima $ccsBamFile $primerFile $flBamFile.fl.bam --isoseq --peek-guess --dump-removed -j 4 --store-unbarcoded --log-level TRACE --min-scoring-regions 1 --min-ref-span 0.5

I am wondering whether these options are incompatible with the isoseq mode, or whether the isoseq mode overwrites them. I was also unable to find out what the isoseq mode actually does and what values it assigns to min-scoring-regions and min-ref-span. How can I keep reads in which only one adapter+barcode region passes the min-ref-span threshold? Thanks

armintoepfer commented 2 months ago

It’s unlikely that you can override Lima’s parameters when using Iso-Seq mode, especially if only one side is available. This scenario wasn’t considered in Lima’s design, so we can't offer support for it. The Iso-Seq mode was specifically developed to be highly sensitive in detecting closely related 3’ and 5’ adapters. Given your situation, I don’t believe I can assist you from a technical standpoint.