Inaccurate copy number - Githubissues

Illumina / ExpansionHunterDenovo

A suite of tools for detecting expansions of short tandem repeats

Other

78 stars 25 forks source link

head example_dataset.casecontrol_locus.tsv contig start end motif pvalue bonf_pvalue counts StrA 1723 2123 AGC 0.03854993587177091 0.07709987174354183 sample1:8.461518151815183,sample2:33.28581629341123,sample3:30.048237956676367,sample4:5.298941798941799,sample5:9.576095617529882 StrB 1684 2154 CCG 0.07864960352514261 0.15729920705028522 sample1:22.211485148514853,sample2:22.88399870172022,sample3:31.08438409311348,sample4:29.674074074074078,sample5:20.21620185922975,sample6:21.3226879574185,sample7:20.358141089936474

Hello Andrew,

Thank you for the question. EHdn performs a relatively simple analysis by identifying and counting reads that are fully contained inside the repeat sequence. Although there is a linear relationship between these counts and the repeat sizes, EHdn does not estimate repeat sizes because it does not know if the expansion occurred on one or both alleles and because it does not know the exact (single-nucleotide resolution) reference coordinates of the repeat.

To summarize, EHdn just determines approximate coordinates of long repeats and produces a rough measure of their length. This information can be used to detect the presence of an expansion but not the exact repeat genotype. In the highlighted example, EHdn found ~30 reads consisting of the AGC motif on StrA.

We are starting to work on the next version of EHdn that will generate much more intuitive results (including the exact repeat sizes).

Relatedly, we recently released a genome-wide repeat catalog compatible with the latest version of (regular) ExpansionHunter. We believe that this catalog contains a good set of targets for studies aimed at discovering new pathogenic / functionally important repeats (though we are still working on a document explaining why we think so).

Please let me know if anything about my answer is unclear or if you have any other questions. If you'd like to discuss the specific dataset you are working with, please feel free to reach out directly by email.

Best wishes, Egor

Illumina / ExpansionHunterDenovo

Inaccurate copy number #45