Illumina / ExpansionHunter

A tool for estimating repeat sizes

Repeat on Forward and reverse strand #103

Closed kumara3 closed 4 years ago

kumara3 commented 4 years ago

Hello,

Thank you for ExpansionHunter and for the improvements in the newer version. It is indeed a great tool!

I have a few questions about how EH handles repeats on the forward and reverse strands.

a) For the CAG repeat (defined as region 55586155-55586227 on hg38, per the variant catalog), if I run ExpansionHunter with CTG as the locus ID (and the same region as above), I expect to get similar results. My logic is that CTG exists on the reverse strand.

b) I did try that. For some of my samples I get similar results, but for a few samples the results change drastically.

So my question is how EH handles such a scenario. Is running it this way even feasible, in your opinion? Please correct me if I am running the program incorrectly.

Thank you! Regards, Ashwani

egor-dolzhenko commented 4 years ago

Hello Ashwani,

Thank you for the question! ExpansionHunter assumes that repeat motifs are given in the forward / reference orientation. Since hg38 coordinates chr18:55586155-55586227 correspond to a CAG repeat (in the reference orientation), this repeat should be defined as such in the variant catalog.

Does this answer your question? Please let me know if there is anything I can clarify or if you have any other questions.

Best wishes, Egor
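The point about reference orientation can be made concrete with a small sketch. It shows that CTG is the reverse complement of CAG, and then builds a catalog entry that keeps the motif in the reference (forward) orientation. The field names follow the variant catalog format as I understand it from the ExpansionHunter documentation; the locus ID `EXAMPLE_CAG` and the helper `make_catalog_entry` are hypothetical, so please check the result against the catalog shipped with your version.

```python
import json

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def make_catalog_entry(locus_id: str, motif: str, region: str) -> dict:
    """Build one repeat entry (hypothetical helper, assumed field names).

    The motif must be written as it appears on the reference strand.
    """
    return {
        "LocusId": locus_id,
        "LocusStructure": f"({motif})*",
        "ReferenceRegion": region,
        "VariantType": "Repeat",
    }


# CTG is what CAG looks like when read from the opposite strand.
print(reverse_complement("CAG"))

# The region below reads as CAG in the hg38 reference orientation,
# so the entry uses CAG, not CTG.
entry = make_catalog_entry("EXAMPLE_CAG", "CAG", "chr18:55586155-55586227")
print(json.dumps([entry], indent=4))
```

Writing the motif as CTG for the same region would describe a repeat that the reference sequence at those coordinates does not contain.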

kumara3 commented 4 years ago

Hello,

Thank you for your reply! I am still wondering about my statement in b) above: why do I get nearly identical results for most samples, except for a few where the results change drastically (the number of repeat units goes down with CAG as compared to CTG)?

A few follow-up questions:

1) What do you think could be the reason for such behavior?

2) Following the logic you stated, if I give CTG as my variant ID, then ExpansionHunter treats it as being in the forward orientation, extracts all the reads and mates around the reference region, and then tries to find a match for (CTG)n in the reads. So giving CTG as the variant ID should not be incorrect? Please correct me if my thinking is wrong here.

Regards, Ashwani
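The reasoning in point 2 can be explored with a toy example. This is only an illustration of why motif orientation matters when matching against a sequence; it does not reproduce ExpansionHunter's read extraction or alignment, and the read sequence and the `longest_run` helper are made up for this sketch. It counts the longest run of back-to-back motif copies in a read written in reference orientation, and again in its reverse complement.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_run(seq: str, motif: str) -> int:
    """Return the largest number of consecutive motif copies found in seq."""
    best = 0
    for start in range(len(seq)):
        count = 0
        pos = start
        # Extend the run while another full copy of the motif follows.
        while seq.startswith(motif, pos):
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best


# Hypothetical read in reference orientation: flank + 8 x CAG + flank.
read = "ACGT" + "CAG" * 8 + "TTGA"

cag_forward = longest_run(read, "CAG")
ctg_forward = longest_run(read, "CTG")
ctg_reverse = longest_run(reverse_complement(read), "CTG")

print(cag_forward, ctg_forward, ctg_reverse)
```

In this toy setting the CTG motif only matches once the read is viewed from the opposite strand, which is one reason the motif and the coordinates need to agree on orientation.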

egor-dolzhenko commented 4 years ago

Thanks for the questions!

Please let me know if I answered your questions. Also, you could get additional insight into this by visualizing read alignments generated by ExpansionHunter with GraphAlignmentViewer: https://github.com/Illumina/GraphAlignmentViewer/

kumara3 commented 4 years ago

Thank you very much!

egor-dolzhenko commented 4 years ago

Glad to help!