To Reproduce
Run pbsv discover on an ONT bam that has a readgroup line in the BAM header section with fields other than ID, PU, SM.
You'll likely see errors mentioned above.
Expected behavior
pbsv discover should finish successfully.
Possible solution
Reheader your ONT bam with modified readgroup lines where only ID, PU, SM fields are kept.
This assumes that pbsv discover does not critically depend on information from the other fields defined in readgroup lines.
Arguably, the utility code in pbcopper could report this error more clearly, so that users can tell whether this header issue is actually the cause.
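The reheadering step could be sketched roughly as below. This is only a minimal illustration, not anything shipped with pbsv: the function name `strip_rg_fields` and the example readgroup values are made up, and it assumes the header is available as plain SAM text with tab-separated `TAG:value` fields.

```python
# Tags the issue reports as safe to keep on @RG lines.
KEEP = ("ID", "PU", "SM")


def strip_rg_fields(header_text, keep=KEEP):
    """Return SAM header text with every @RG line reduced to the tags in `keep`.

    Hypothetical helper for illustration; all other header lines pass through
    unchanged, and line endings are preserved.
    """
    out = []
    for line in header_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        if body.startswith("@RG\t"):
            fields = body.split("\t")[1:]
            # Each field looks like "TAG:value"; keep it only if TAG is allowed.
            kept = [f for f in fields if f.split(":", 1)[0] in keep]
            body = "\t".join(["@RG"] + kept)
        out.append(body + ending)
    return "".join(out)


# Made-up example header, not the readgroup line from this issue.
example = "@HD\tVN:1.6\n@RG\tID:rg1\tPL:ONT\tPU:unit1\tSM:sample1\tDS:extra info\n"
print(strip_rg_fields(example), end="")
```

One possible way to use it: dump the header with `samtools view -H in.bam > header.sam`, run the text of `header.sam` through the function and save the result as `new_header.sam`, then apply it with `samtools reheader new_header.sam in.bam > out.bam`. The file names here are placeholders.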
I know this is pbsv not some other *sv, so just posting here in case other users see similar errors.
Maintainers feel free to close this.
BTW, here's the original readgroup line of my ONT bam (library, flowcell ID, and sample name are manipulated).
PacBio tools are not compatible with non-PacBio BAM files. We do not make any guarantees about performance or validity if incompatible files are used as input to our tools.
Operating system
This applies to all OS where pbsv can run.
Package name
pbsv 2.4.0+
Conda environment
Not relevant
Describe the bug
When you run pbsv discover on an ONT bam, you may see a mysterious bug like the one reported in #345.
Error message