PacificBiosciences / pbbioconda

PacBio Secondary Analysis Tools on Bioconda. Contains list of PacBio packages available via conda.
BSD 3-Clause Clear License

pbmerge - overlapping movie/zmw combination #700

Open gevro opened 2 weeks ago

gevro commented 2 weeks ago

Using the latest versions of lima and pbmerge, I'm trying to re-merge samples that were demultiplexed by lima. pbmerge fails with an error saying there are overlapping movie/ZMW combinations. I manually checked and found no overlapping movie/ZMW combinations, and there shouldn't be any, because lima demultiplexed the samples. I think this might be a bug in pbmerge.

Maybe pbmerge has an undocumented requirement that the input ZMWs are sorted?
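
For anyone who wants to repeat the manual check, here is a minimal sketch of one way to look for movie/ZMW pairs that appear in more than one demultiplexed sample. It assumes read names follow the `movie/holeNumber/ccs[/fwd|/rev]` pattern and that the names have already been extracted from each BAM (for example with `samtools view sample.bam | cut -f1`). The function names and the demo data are hypothetical and are not part of lima or pbmerge.

```python
from collections import defaultdict


def zmw_key(read_name):
    """Return (movie, hole_number) parsed from a PacBio-style read name.

    Assumes the name looks like 'movie/holeNumber/ccs' or
    'movie/holeNumber/ccs/fwd' (by-strand reads).
    """
    parts = read_name.split("/")
    return parts[0], int(parts[1])


def shared_zmws(names_by_sample):
    """Map each (movie, zmw) seen in more than one sample to its samples."""
    samples_by_zmw = defaultdict(set)
    for sample, names in names_by_sample.items():
        for name in names:
            samples_by_zmw[zmw_key(name)].add(sample)
    # Keep only ZMWs that were assigned to two or more samples.
    return {k: sorted(v) for k, v in samples_by_zmw.items() if len(v) > 1}


# Hypothetical read names, for illustration only.
demo = {
    "sampleA": ["m1/100/ccs/fwd", "m1/101/ccs/fwd", "m1/101/ccs/rev"],
    "sampleB": ["m1/100/ccs/rev", "m1/200/ccs/fwd"],
}
print(shared_zmws(demo))  # {('m1', 100): ['sampleA', 'sampleB']}
```

An empty result means no movie/ZMW pair is split across samples. Note that the check compares only the movie and hole number, so `fwd` and `rev` reads from the same ZMW count as the same key.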

gevro commented 2 weeks ago

And here is the exact error:

| 20240709 19:31:22.129 | FATAL | pbmerge ERROR: [pbbam] comparison ERROR: cannot sort CCS/transcripts that share both movie name & ZMW hole number

And tool versions:

lima 2.9.0

Using:
  lima      : 2.9.0 (commit v2.9.0)
  pbbam     : 2.5.0 (commit v2.5.0)
  pbcopper  : 2.4.0 (commit v2.4.0)
  boost     : 1.81
  htslib    : 1.17
  zlib      : 1.2.13

pbmerge 3.1.1 (commit v3.1.1)

Using:
  pbbam     : 2.5.0 (commit v2.5.0)
  pbcopper  : 2.4.0 (commit v2.4.0)
  boost     : 1.81
  htslib    : 1.17
  zlib      : 1.2.13

gevro commented 2 weeks ago

I found the cause, and it points to a bug in lima rather than pbmerge. I'm running ccs in stranded mode. Somehow, for one ZMW in the run, lima assigned the fwd ccs strand to one sample and the rev ccs strand to a different sample.

Lima should know for stranded runs that both strands of the ZMW should be assigned to the same sample. So if there are two strands for a ZMW, and there is a conflict between the sample assignments of the two strands, the ZMW should be discarded entirely.
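
As a stopgap before merging, the proposed rule can be sketched as a post-filter over the demultiplexing result. This is only an illustration of the suggested behavior, not how lima is implemented. It assumes each read is available as a `(read_name, sample)` pair with names of the form `movie/holeNumber/ccs/fwd` or `movie/holeNumber/ccs/rev`, and the function name and demo data are hypothetical.

```python
from collections import defaultdict


def drop_conflicting_zmws(assignments):
    """Discard every read of a ZMW whose strands map to different samples.

    `assignments` is a list of (read_name, sample) pairs. The ZMW identity
    is taken from the first two '/'-separated fields of the read name.
    """
    def key(name):
        movie, hole = name.split("/")[:2]
        return movie, hole

    # First pass: collect the set of samples seen for each ZMW.
    samples_by_zmw = defaultdict(set)
    for name, sample in assignments:
        samples_by_zmw[key(name)].add(sample)

    # Second pass: keep only reads from ZMWs with a single sample.
    return [
        (name, sample)
        for name, sample in assignments
        if len(samples_by_zmw[key(name)]) == 1
    ]


# Hypothetical assignments, for illustration only.
demo_assignments = [
    ("m1/100/ccs/fwd", "sampleA"),
    ("m1/100/ccs/rev", "sampleB"),  # conflicts with the fwd strand
    ("m1/101/ccs/fwd", "sampleA"),
    ("m1/101/ccs/rev", "sampleA"),
]
print(drop_conflicting_zmws(demo_assignments))
# [('m1/101/ccs/fwd', 'sampleA'), ('m1/101/ccs/rev', 'sampleA')]
```

Both strands of ZMW 100 are dropped because they disagree on the sample, while ZMW 101 is kept intact.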