PacificBiosciences / pbbioconda

PacBio Secondary Analysis Tools on Bioconda. Contains list of PacBio packages available via conda.
BSD 3-Clause Clear License

Request for improved pbmm2 error message around zipped references #709

Open mrvollger opened 3 months ago

mrvollger commented 3 months ago

Operating system

redhat

Package name

pbmm2 1.13.0
Using:
  pbmm2    : 1.13.0 (commit v1.13.0-2-gbcd99f5)
  pbbam    : 2.4.99 (commit v2.4.0-23-g59248fe)
  pbcopper : 2.3.99 (commit v2.3.0-28-ga9b1ffa)
  boost    : 1.81
  htslib   : 1.17
  minimap2 : 2.26
  zlib     : 1.2.13

Describe the bug

When given a gzipped reference, pbmm2 complains about the format of the input reads instead of the reference.

Error message

pbmm2 align ERROR: Could not determine read input type(s). Please do not mix data types, such as BAM+FASTQ. File of files may only contain BAMs or datasets.

To Reproduce

pbmm2 align ref.fa.gz ../data/hap-alns/GM12878.PacBio.H1.GRCh38.bam example.bam
>|> 20240820 22:31:55.970 -|- WARN -|- operator() -|- 0x7f11a49d9f80|| -|- Input is aligned reads. Only primary alignments will be respected to allow idempotence!
>|> 20240820 22:31:55.970 -|- FATAL -|- CheckPositionalArgs -|- 0x7f11a49d9f80|| -|- pbmm2 align ERROR: Could not determine read input type(s). Please do not mix data types, such as BAM+FASTQ. File of files may only contain BAMs or datasets.

Expected behavior

I know that gzipped references are not supported by pbmm2, but it took me quite a while to discover that this was the problem, because I was looking for issues with the input reads rather than the reference. An error message that points at the reference would help. Alternatively, support for gzipped references would be great!
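Until gzipped references are supported, one workaround is to decompress the reference before alignment. The following is a minimal shell sketch, assuming the reference is a gzip-compressed FASTA whose name ends in .gz; the helper name `plain_reference` is hypothetical and not part of pbmm2.

```shell
# Hypothetical helper (not part of pbmm2): print the path of an uncompressed
# reference. If the name ends in .gz, decompress it next to the original and
# print the new path; otherwise print the path unchanged.
plain_reference() {
    ref="$1"
    case "$ref" in
        *.gz)
            out="${ref%.gz}"
            # gzip -dc writes the decompressed data to stdout and keeps the original file.
            gzip -dc "$ref" > "$out" || return 1
            printf '%s\n' "$out"
            ;;
        *)
            printf '%s\n' "$ref"
            ;;
    esac
}
```

With this helper defined, the failing command from the report could be run as `pbmm2 align "$(plain_reference ref.fa.gz)" ../data/hap-alns/GM12878.PacBio.H1.GRCh38.bam example.bam`. Note that the decompressed copy is written alongside the original and is not cleaned up.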

As a side note, it would be nice if pbmm2 allowed the .fna extension for references, which is sometimes the extension you get when downloading from NCBI, e.g.:

GCA_000001405.15_GRCh38_no_alt_analysis_set.fna

Thanks, Mitchell

armintoepfer commented 2 months ago

We might consider adding it, but at this point you are the first one to ask in more than 5 years.

ASLeonard commented 2 months ago

It seems like an uncommon use case, but especially since minimap2 can take gzipped references, this does (negligibly) complicate porting over to pbmm2.

mrvollger commented 2 months ago

I would love gzip compatibility, but I wanted to clarify that my main issue is the error message when you use a gzip reference:

pbmm2 align ERROR: Could not determine read input type(s). Please do not mix data types, such as BAM+FASTQ. File of files may only contain BAMs or datasets.

This error incorrectly indicates that the format of the reads rather than the reference is incorrect.
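To illustrate the kind of message that would have helped here, below is a rough sketch of a pre-flight check that a user could run before calling pbmm2. It is not how pbmm2 validates its arguments; the function name `check_reference` and the message wording are made up for this example. The only fact it relies on is that gzip files start with the bytes 0x1f 0x8b.

```shell
# Hypothetical pre-flight check (not part of pbmm2): fail early with a
# reference-specific message if the reference is unreadable or gzipped.
check_reference() {
    ref="$1"
    if [ ! -r "$ref" ]; then
        echo "ERROR: cannot read reference '$ref'" >&2
        return 1
    fi
    # Read the first two bytes as hex; gzip streams start with 1f 8b.
    magic=$(head -c 2 "$ref" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        echo "ERROR: reference '$ref' is gzip-compressed; decompress it first" >&2
        return 1
    fi
    return 0
}
```

Chaining it in front of the aligner, e.g. `check_reference ref.fa.gz && pbmm2 align ref.fa.gz reads.bam example.bam`, stops before pbmm2 runs and names the reference as the problem. Checking the magic bytes rather than the file name also catches compressed files that lack a .gz suffix.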

armintoepfer commented 2 months ago

That error message should be fixed in the latest version, which we'll release soon.

mrvollger commented 2 months ago

Awesome, thanks @armintoepfer

armintoepfer commented 2 weeks ago

Can you give it a try again?

mrvollger commented 2 weeks ago

@armintoepfer should I be trying 1.16? Because I am still getting the same error msg:

pbmm2 --version && pbmm2 align tmp.fa.gz ~/tmp.bam tmp.out.bam
pbmm2 1.16.0

Using:
  pbmm2    : 1.16.0 (commit v1.16.0)
  pbbam    : 2.7.0 (commit v2.7.0)
  pbcopper : 2.6.0 (commit v2.6.0)
  boost    : 1.81
  htslib   : 1.17
  minimap2 : 2.26
  zlib     : 1.2.13
>|> 20241114 19:01:20.294 -|- WARN -|- operator() -|- 0x7f7266e0ff80|| -|- Input is aligned reads. Only primary alignments will be respected to allow idempotence!
>|> 20241114 19:01:20.295 -|- FATAL -|- CheckPositionalArgs -|- 0x7f7266e0ff80|| -|- pbmm2 align ERROR: Could not open or determine read input type(s). Please do not mix data types, such as BAM+FASTQ. File of files may only contain BAMs or datasets.

armintoepfer commented 2 weeks ago

Yes. I thought our fix would have fixed it as a side effect. Will file an actual issue. Stay tuned, ty