PacificBiosciences / pbbioconda

PacBio Secondary Analysis Tools on Bioconda. Contains list of PacBio packages available via conda.
BSD 3-Clause Clear License

[Not a bug report] Fixing header might resolve errors like #345 #606

Closed by SHuang-Broad 1 year ago

SHuang-Broad commented 1 year ago

Operating system: This applies to every OS where pbsv can run.

Package name: pbsv 2.4.0+

Conda environment: Not relevant

Describe the bug: When you run pbsv discover on an ONT BAM, you may see a cryptic error like the one reported in #345.

Error message

>|> 20230828 01:52:02.907 -|- FATAL -|- Run -|- 0x7f704219e4c0|| -|- pbsv discover ERROR: map::at

To Reproduce: Run pbsv discover on an ONT BAM whose header contains a read group (@RG) line with fields other than ID, PU, and SM. You will likely see the error shown above.
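To check whether a given header matches this description, a minimal sketch along the following lines may help. It assumes the header text has already been dumped to a string (for example with samtools view -H); the function name extra_rg_tags and the KEPT_TAGS set are hypothetical names introduced here for illustration, not part of pbsv or pbcopper.

```python
# Tags the issue reports as safe to keep; any other tag counts as "extra".
# (Assumption: this set mirrors the ID, PU, SM list from the report above.)
KEPT_TAGS = {"ID", "PU", "SM"}


def extra_rg_tags(header_text):
    """Map each @RG ID to the sorted list of its tags outside KEPT_TAGS."""
    extras = {}
    for line in header_text.splitlines():
        if not line.startswith("@RG\t"):
            continue
        # SAM header fields are tab-separated TAG:VALUE pairs.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        unexpected = sorted(tag for tag in fields if tag not in KEPT_TAGS)
        if unexpected:
            extras[fields.get("ID", "<no ID>")] = unexpected
    return extras


# Made-up header for illustration only.
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:rg1\tPL:ONT\tPU:flowcell\tSM:sample\tDS:runid=x\n"
    "@RG\tID:rg2\tSM:sample\n"
)
print(extra_rg_tags(header))  # {'rg1': ['DS', 'PL']}
```

An empty result would suggest the read group lines are not the cause, at least as far as this report goes.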

Expected behavior: pbsv discover should finish successfully.

Possible solution: Reheader your ONT BAM so that each read group line keeps only the ID, PU, and SM fields. This assumes that pbsv discover does not critically depend on information in the other read group fields. Arguably, the utility code in pbcopper could report the error more clearly, so that users can tell whether this header issue is actually the cause.
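The header rewrite described above could be sketched roughly as follows. This is not necessarily how the workaround was originally applied; it only illustrates the text transformation on a header that has already been dumped to a string. The function name strip_rg_fields is hypothetical, and the kept tags are simply the three named in this report.

```python
# Assumption: only these @RG tags are kept, as suggested in the report.
KEPT_TAGS = ("ID", "PU", "SM")


def strip_rg_fields(header_text, kept=KEPT_TAGS):
    """Return header_text with every @RG line reduced to the kept tags."""
    out = []
    for line in header_text.splitlines():
        if line.startswith("@RG\t"):
            fields = line.split("\t")[1:]
            # Compare only the tag name before the first colon.
            kept_fields = [f for f in fields if f.split(":", 1)[0] in kept]
            line = "\t".join(["@RG"] + kept_fields)
        out.append(line)  # non-@RG lines pass through unchanged
    return "\n".join(out) + "\n"


# Made-up header for illustration only.
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:1000\n"
    "@RG\tID:rg1\tDT:2023-05-04\tPL:ONT\tPU:flowcell\tSM:sample\n"
)
new_header = strip_rg_fields(header)
print(new_header, end="")
```

One way to apply the result would be to write new_header to a file and pass it to samtools reheader (samtools reheader new_header.sam in.bam > out.bam), working on a copy of the BAM so the original read group metadata is not lost.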

I know this is pbsv, not some other *sv, so I'm just posting here in case other users see similar errors.

Maintainers, feel free to close this.

BTW, here's the original read group line of my ONT BAM (the library, flowcell ID, and sample name have been altered).

@RG ID:abc7612e-b950-4fa2-94b1-47fe2a5ecf71_dna_r10.4.1_e8.2_sup@v3.5.1 DT:2023-05-04T15:41:38+0000 DS:runid=abc7612e-b950-4fa2-94b1-47fe2a5ecf71;basecall_model=dna_r10.4.1_e8.2_sup@v3.5.1    LB:this_is_secret   PL:ONT  PM:1H   PU:PAO12345 al:unclassified SM:you_dont_know_me
armintoepfer commented 1 year ago

PacBio tools are not compatible with non-PacBio BAM files. We do not make any guarantees about performance or validity if incompatible files are used as input to our tools.