bcgsc / straglr

Tandem repeat expansion detection or genotyping from long-read alignments

An issue about complex STRs #19

Closed fjmuzengyiheng closed 8 months ago

fjmuzengyiheng commented 1 year ago

Hi @readmanchiu, sorry to bother you again. I am using this first-tier tool for STR counting in my neurogenetic patients.

Here is one issue I want to report:

There is a disease named "CANVAS (Cerebellar ataxia, neuropathy, and vestibular areflexia syndrome)", which is caused by an expansion of the (AAGGG)n repeat in the RFC1 gene. (https://omim.org/entry/102579?search=RFC1&highlight=rfc1)

The difficulty is as follows:

  1. The reference sequence at this locus is (AAAAG)n (hg38, 4:39,348,424-39,348,483).
  2. My data (ONT) contain an (AAGGG)n expansion, which I confirmed by manual inspection in IGV (attached below).
  3. When I use the bed file below, Straglr outputs nothing.

[bed file] 4 39348424 39348483 AAAAG

I do not want to give up this tool, given its outstanding performance. Can Straglr deal with such "complex STRs" (loci where the motif has changed)? Support for this situation would make it even better. Thank you!

[attached file1: IGV visualization for my patient's ONT data]

image

[attached file2: 3510bp insertion in the first line]

AGGGAAGGGAAGGGAAGGAAGGAAGGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGAAGGGAAGGAAGGGAAGGGAAGGAAGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGAAGGAAGGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGAAGGGAAGGGAAGGGAAAGGAAGGAAGGAAGGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGAAGGAAGCGGGGAAGGGAAGGGAAGGAAGGAAGGGAAGGAAGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGAAGAAGGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGAAGGGAAGGAAGGAAGGGAAGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGGAAGGGAAGGAAGGGAAGGAAGGAAGGGAAGGGAAGGGAAGGAAGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGGAAGGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGAAGGAAGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGAAGGAAGGGAAGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGAGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGAAGGAAGGGAAGGGAAGGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGGAAGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGAAGGAAGGCAATACAGAAGAAGAAGTAATACAGAAGGAAGGAAGGAAGGGAAGGGAAGGAAAGGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGGAAGGAAGGAAGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGGAAGGGAAGGGAAGGAAGGAAGGGAAGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGGAAGGAAGGGAAGGAAGGAAGGAAGGGAAGGAAGGAAGGAA
GGAAGGAAGGGAAGGAAGGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGGAAGGAAGGGAAGGGAAGGCAAGGGAAGGAAGGAAGGAAGGAAGGGAAGGAAGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGAAAGGAAGGAAGGGAAGGGAAGGGAAGGGAGGAAGGGAGGGAAGGGAAGGAAGGGAAGGGAAGGAAGGGAAGGGCGAAGGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGGAAGGGAAGGAAGGAAGGAAGGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGGAAGGAAGGAAGGGGAAGGAAGGGAAGGAAGGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGAAGGAAGGAAGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGAAGGGAAGGAAGGAAGGAAGGGAAGGAAGGAAGGGAAGGAAGGGAAGGAAGGAAGGAAGGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGAAGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGAAGGGAAGGAAGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGAAGGAAGGGAAGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAGGGAAGGAAGGGAAGGAAGGAAGGAAGGAAGGGAAGGGAAGGAAGGAAGGAAGGGAAGGAAGGGAAGGGAAGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGAAGGAAGGAAGGAAGGAAGGGAAGGGAAGGAAGGGAAGGAAGGAAGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGGAAGGG

readmanchiu commented 1 year ago

Hi @fjmuzengyiheng,

Thanks for waiting. A new version has been released. Please give it a try and see whether it has any problem genotyping this event, then let me know.

fjmuzengyiheng commented 1 year ago

Thank you, that is kind of you. I will try this version as soon as possible. Thank you again!

fjmuzengyiheng commented 1 year ago

Hi @readmanchiu, I've tested the new version (v1.4.0) of Straglr for genotyping this locus.

When I provide the bed file "4 39348424 39348483 AAAAG", Straglr outputs:

image

When I provide the bed file "4 39348424 39348483 AAGGG", Straglr outputs:

image

The genotyping of this locus is still not quite right. Would you mind if I sent you my patient's bam file so you can test this locus? Thank you so much.

readmanchiu commented 1 year ago

Have you tried "AAGGG"? It seems that this, rather than AAAAG, is the predominant motif in your sequence. I've worked with a heterozygous case where the normal allele at least carries the reference motif (and the expansion carries the different one). Homozygous cases, where only the non-reference motif exists, are trickier. Please send me the bam file via e-mail (or tell me in the e-mail how I can access it); I'm more than happy to tackle it.
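As a rough way to check which motif predominates in an inserted sequence like the one attached above, one could count k-mers and group rotations of the same motif together. The sketch below is only an illustration and is not how Straglr works internally. The helper names (canonical_rotation, count_motifs) and the toy sequence are assumptions made for this example.

```python
from collections import Counter


def canonical_rotation(motif: str) -> str:
    """Return the lexicographically smallest rotation of a motif,
    so that e.g. GGGAA and AGGGA are both reported as AAGGG."""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def count_motifs(seq: str, k: int) -> Counter:
    """Count every k-mer in seq, grouping rotations of the same motif."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        counts[canonical_rotation(seq[i:i + k])] += 1
    return counts


# Toy sequence (hypothetical, not patient data): mostly AAGGG, a little AAAAG.
toy = "AAGGG" * 6 + "AAAAG" * 2
counts = count_motifs(toy, 5)
predominant = counts.most_common(1)[0][0]
print(predominant)  # AAGGG
```

To apply this to a real insertion, paste the sequence into toy (with line breaks removed) and try a few values of k. Interruptions such as AAGG units will show up as separate entries in the counts.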

readmanchiu commented 8 months ago

Issue followed up through private communication