PacificBiosciences / trgt

Tandem repeat genotyping and visualization from PacBio HiFi data

trgt merge - skip problematic loci #39

Closed hdashnow closed 1 month ago

hdashnow commented 1 month ago

I'm merging some older TRGT VCFs using the new merge functionality. I'm guessing some of the locus positions don't match up between samples because of deletions. Is there a way to skip over these loci so the merge doesn't fail?

I'm using the merge function from trgt 1.1.1-62f1f0e.

The input VCFs report:

trgtVersion=0.7.0-493ef25

thread 'main' panicked at src/merge/strategy/exact.rs:14:17:
assertion `left == right` failed: Reference alleles do not match
  left: [99, 65, 65, 65, 65, 65, 65, 84, 65, 65, 65, 65, 65, 65, 71, 65, 65, 65, 71, 71, 71, 65, 65, 71, 71, 71, 71, 65, 71, 71, 71, 71, 65, 65, 71, 71, 71, 65, 71, 71, 71, 71, 71, 65, 71, 71, 71, 71, 71]
 right: [67, 65, 65, 65, 65, 65, 65, 84, 65, 65, 65, 65, 65, 65, 71, 65, 65, 65, 71, 71, 71, 65, 65, 71, 71, 71, 71, 65, 71, 71, 71, 71, 65, 65, 71, 71, 71, 65, 71, 71, 71, 71, 71, 65, 71, 71, 71, 71, 71]
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

tmokveld commented 1 month ago

Currently, there isn’t an option to skip such cases, but I can look into adding a flag that would allow skipping and logging a warning at any problematic site.

In the meantime, could you share the VCFs where this panic occurs? I’d like to investigate further. If the VCFs were generated using the same repeat definition catalog, this issue typically shouldn’t arise. However, it seems that in this case, only the leading base pair is mismatched.
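To make the single-base mismatch concrete: the `left` and `right` arrays in the panic are ASCII byte values, so they can be decoded and compared directly. The snippet below is only an illustrative sketch; the helper names `decode` and `mismatches` are made up for this example and are not part of TRGT.

```python
# Byte values copied from the panic message (Rust prints byte vectors as integers).
LEFT = [99, 65, 65, 65, 65, 65, 65, 84, 65, 65, 65, 65, 65, 65, 71, 65, 65, 65, 71, 71, 71, 65, 65, 71, 71, 71, 71, 65, 71, 71, 71, 71, 65, 65, 71, 71, 71, 65, 71, 71, 71, 71, 71, 65, 71, 71, 71, 71, 71]
RIGHT = [67, 65, 65, 65, 65, 65, 65, 84, 65, 65, 65, 65, 65, 65, 71, 65, 65, 65, 71, 71, 71, 65, 65, 71, 71, 71, 71, 65, 71, 71, 71, 71, 65, 65, 71, 71, 71, 65, 71, 71, 71, 71, 71, 65, 71, 71, 71, 71, 71]


def decode(allele_bytes):
    """Turn a list of ASCII byte values into a string (hypothetical helper)."""
    return bytes(allele_bytes).decode("ascii")


def mismatches(left, right):
    """Return (index, left_char, right_char) for every position that differs."""
    return [
        (i, chr(a), chr(b))
        for i, (a, b) in enumerate(zip(left, right))
        if a != b
    ]


if __name__ == "__main__":
    print(decode(LEFT))
    print(decode(RIGHT))
    print(mismatches(LEFT, RIGHT))
```

The only difference is at index 0, where byte 99 is a lowercase `c` and byte 67 is an uppercase `C`, so the two reference alleles are the same sequence apart from letter case.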

hdashnow commented 1 month ago

Thanks for looking into it!

They're the most recent Platinum Pedigree VCFs on CHM13.

tmokveld commented 1 month ago

Thank you. This was a bug caused by not normalizing the case of the padding base for pre-v1.0 VCFs. The next minor version release will include the fix.
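As a rough illustration of the idea (not the actual TRGT change, which lives in the Rust merge code), the comparison can be made case-insensitive for the padding base by uppercasing the first base of each reference allele before checking equality. The function names here are hypothetical, and the example assumes the padding base is the first character of the REF allele.

```python
def normalize_padding_base(ref_allele: str) -> str:
    """Uppercase only the first (padding) base; leave the rest untouched.

    Assumption: the padding base is the leading character of the REF allele.
    """
    return ref_allele[:1].upper() + ref_allele[1:]


def reference_alleles_match(a: str, b: str) -> bool:
    """Compare two REF alleles after normalizing the padding base's case."""
    return normalize_padding_base(a) == normalize_padding_base(b)


if __name__ == "__main__":
    old_style = "cAAAAAAT"  # lowercase padding base, as in the failing VCF
    new_style = "CAAAAAAT"  # uppercase padding base
    print(old_style == new_style)                         # plain comparison
    print(reference_alleles_match(old_style, new_style))  # normalized comparison
```

Normalizing only the leading base keeps the check strict for the repeat sequence itself, so a genuine sequence difference between catalogs would still be reported as a mismatch.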

hdashnow commented 1 month ago

Thanks for fixing this, Tom!