PacificBiosciences / HiPhase

Small variant, structural variant, and short tandem repeat phasing tool for PacBio HiFi reads
Other
70 stars 4 forks source link

Failing to overwrite existing phasing tags in the bam file #46

Closed mrvollger closed 1 month ago

mrvollger commented 1 month ago

Hi @holtjma,

From the recent change logs, I was excited to read that Hiphase now supports passing in already-phased input and then reprocessing. But when I tried this, I got an error with 1.4.4:

[2024-09-24T17:06:57.966Z ERROR hiphase] Error while saving haplotags: failed to add aux field, tag is already present

Any ideas? or did I misunderstand the changelog?

Thanks! Mitchell

Full message:


[2024-09-24T17:03:56.526Z INFO  hiphase::cli] Alignment file: "../merged/ST002-lung.bam"
[2024-09-24T17:03:56.526Z INFO  hiphase::cli] Variant file: "results/ST002-lung/deepvariant/ST002-lung.deepvariant.gvcf.gz"
[2024-09-24T17:03:56.527Z INFO  hiphase::cli] Variant file: "results/ST002-lung/pbsv/ST002-lung.pbsv.vcf.gz"
[2024-09-24T17:03:56.551Z INFO  hiphase::cli] Reference file: "/mmfs1/gscratch/stergachislab/assemblies/hg38.analysisSet.fa"
[2024-09-24T17:03:56.551Z INFO  hiphase::cli] Variant filtering:
[2024-09-24T17:03:56.551Z INFO  hiphase::cli]   Minimum call quality: 0
[2024-09-24T17:03:56.551Z INFO  hiphase::cli]   Minimum mapping quality: 5
[2024-09-24T17:03:56.551Z INFO  hiphase::cli]   Minimum matched alleles: 2
[2024-09-24T17:03:56.551Z INFO  hiphase::cli] Phase block generation:
[2024-09-24T17:03:56.551Z INFO  hiphase::cli]   Minimum spanning reads: 1
[2024-09-24T17:03:56.551Z INFO  hiphase::cli]   Supplemental mapping block joins: ENABLED
[2024-09-24T17:03:56.551Z INFO  hiphase::cli]   Phase singleton blocks: DISABLED
[2024-09-24T17:03:56.551Z INFO  hiphase::cli] Allele assignment:
[2024-09-24T17:03:56.551Z INFO  hiphase::cli]   Local re-alignment maximum reference buffer: +-15 bp
[2024-09-24T17:03:56.551Z INFO  hiphase::cli]   Global re-alignment max edit distance: 500
[2024-09-24T17:03:56.551Z INFO  hiphase::cli]   Global prune distance: 500
[2024-09-24T17:03:56.551Z INFO  hiphase::cli] Processing threads: 16
[2024-09-24T17:03:56.551Z INFO  hiphase::cli] I/O threads: 4
[2024-09-24T17:03:59.138Z INFO  hiphase::data_types::reference_genome] Loading "/mmfs1/gscratch/stergachislab/assemblies/hg38.analysisSet.fa"...
[2024-09-24T17:04:02.786Z INFO  hiphase::data_types::reference_genome] Finished loading 195 contigs.
[2024-09-24T17:04:02.786Z INFO  hiphase] Phase block generation starting...
[2024-09-24T17:04:02.786Z INFO  hiphase] Starting job pool with 16 threads...
[2024-09-24T17:04:26.394Z INFO  hiphase] Generated 100 phase blocks, latest block: PhaseBlock { block_index: 99, coordinates: "chr1:56165769-56334828", num_variants: 305, sample_name: "ST002-lung" }
[2024-09-24T17:04:27.158Z INFO  hiphase::read_parsing] B#12 Detected broad global realignment failure, reverting to local for the rest of the block.
[2024-09-24T17:04:49.304Z INFO  hiphase] Generated 200 phase blocks, latest block: PhaseBlock { block_index: 199, coordinates: "chr1:101673309-102286089", num_variants: 738, sample_name: "ST002-lung" }
[2024-09-24T17:05:11.826Z INFO  hiphase] Generated 300 phase blocks, latest block: PhaseBlock { block_index: 299, coordinates: "chr1:123622674-123638401", num_variants: 75, sample_name: "ST002-lung" }
[2024-09-24T17:05:27.921Z INFO  hiphase] Generated 400 phase blocks, latest block: PhaseBlock { block_index: 399, coordinates: "chr1:156838373-157635745", num_variants: 934, sample_name: "ST002-lung" }
[2024-09-24T17:05:47.659Z INFO  hiphase] Generated 500 phase blocks, latest block: PhaseBlock { block_index: 499, coordinates: "chr1:204700739-204720717", num_variants: 6, sample_name: "ST002-lung" }
[2024-09-24T17:05:59.397Z INFO  hiphase::read_parsing] B#235 Detected broad global realignment failure, reverting to local for the rest of the block.
[2024-09-24T17:06:07.816Z INFO  hiphase::read_parsing] B#347 Detected broad global realignment failure, reverting to local for the rest of the block.
[2024-09-24T17:06:09.686Z INFO  hiphase::read_parsing] B#348 Detected broad global realignment failure, reverting to local for the rest of the block.
[2024-09-24T17:06:15.274Z INFO  hiphase] Received results for 100 phase blocks: 0.7548 blocks/sec, 57.7785 hets/sec, writer waiting on block 0
[2024-09-24T17:06:16.151Z INFO  hiphase] Generated 600 phase blocks, latest block: PhaseBlock { block_index: 599, coordinates: "chr2:17033636-17033636", num_variants: 1, sample_name: "ST002-lung" }
[2024-09-24T17:06:41.860Z INFO  hiphase] Generated 700 phase blocks, latest block: PhaseBlock { block_index: 699, coordinates: "chr2:70440214-70440214", num_variants: 1, sample_name: "ST002-lung" }
[2024-09-24T17:06:57.966Z ERROR hiphase] Error while saving haplotags: failed to add aux field, tag is already present```
holtjma commented 1 month ago

No, you read it correctly, it should strip and add the appropriate tags. I wonder if I missed one somewhere that didn't pop up in our testing. Do you have an example bam-let and vcf-let you can share that reproduces the issue?

Matt

mrvollger commented 1 month ago

Thanks for the quick reply!

Hmm, this data is private. Is there any way I can get hiphase to tell me the region in which this failure happens, I might be able to share a small enough subset safely.

mrvollger commented 1 month ago

i.e. can I assume the error is happening here:

chr2:70440214-70440214
holtjma commented 1 month ago

I took a look at the code, and I think I know what might be causing this after some local tinkering. Can you give this binary a test and let me know if that resolves it on your end?

hiphase-v1.4.5-x86_64-unknown-linux-gnu.tar.gz

mrvollger commented 1 month ago

Awesome, getting on it right now.

mrvollger commented 1 month ago

Well past the previous error (now on chr6), so I think it worked, but will confirm when its all done.

mrvollger commented 1 month ago

All finsihed without error! Looking forward to the patch!

holtjma commented 1 month ago

Yea, feel free to use that binary in the interim if you need it. I am running some additional checks since this was a decent sized oversight (we were focused on the VCF part, not so much the BAM part at the time). I suspect we'll have it out sometime later this week though.

holtjma commented 1 month ago

Official version after testing is now available here: https://github.com/PacificBiosciences/HiPhase/releases/tag/v1.4.5

If you use bioconda, that will propagate at some point later today