Illumina / ExpansionHunter

A tool for estimating repeat sizes

Try to use it in kinship analysis #129

Open katerinaoleynikova opened 3 years ago

katerinaoleynikova commented 3 years ago

Good day, Egor,

I have a question about the accuracy of EH results. For kinship analysis I use the 20 core CODIS loci (all of them STRs), which are the loci used for paternity/maternity testing. However, I always get about 7 mismatches (7 of the 20 loci disagree between mother and child or between father and child), even though I know for certain that the samples come from one family.

It would be great to know the error rate of the results, or anything else that could help. Thank you!

K
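
For readers who want to reproduce the mismatch count described above, here is a minimal sketch of a parent/child consistency check. It assumes genotypes have already been extracted from the EH output as pairs of repeat counts per locus. The function names, the sample genotypes, and the `tolerance` parameter are all hypothetical and are not part of ExpansionHunter; the tolerance is only an illustration of how off-by-one sizing differences could be separated from larger disagreements, not a forensic standard.

```python
from typing import Dict, Tuple

Genotype = Tuple[int, int]  # repeat counts of the two alleles at one locus


def shares_allele(parent: Genotype, child: Genotype, tolerance: int = 0) -> bool:
    """True if some child allele is within `tolerance` repeat units of a parent allele."""
    return any(abs(p - c) <= tolerance for p in parent for c in child)


def count_mismatches(
    parent: Dict[str, Genotype],
    child: Dict[str, Genotype],
    tolerance: int = 0,
) -> int:
    """Count loci genotyped in both samples where parent and child share no allele."""
    shared_loci = parent.keys() & child.keys()
    return sum(
        not shares_allele(parent[locus], child[locus], tolerance)
        for locus in shared_loci
    )


# Made-up genotypes, for illustration only.
mother = {"D3S1358": (15, 17), "TH01": (6, 9), "FGA": (21, 24)}
child = {"D3S1358": (15, 16), "TH01": (7, 9), "FGA": (22, 25)}

exact = count_mismatches(mother, child)
relaxed = count_mismatches(mother, child, tolerance=1)
print(exact, relaxed)
```

If most mismatches disappear once a one-unit tolerance is allowed, that would point towards sizing errors at those loci rather than a problem with the sample relationships.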

egor-dolzhenko commented 3 years ago

Hi Katerina,

Thanks for the question. Would you be able to share your catalog of CODIS loci? We could run them through our benchmarking pipeline to assess the error rate. If you'd like, you are welcome to share the loci by email (edolzhenko@illumina.com).

Best wishes, Egor

serge2016 commented 3 years ago

Dear Egor,

Could you point me to a description of your benchmarking pipeline, please? I am also interested in running such tests!

egor-dolzhenko commented 3 years ago

Hello Sergey,

Unfortunately, there are no publicly available documents describing the pipeline yet. However, there is a good chance that we will put together a detailed description of the pipeline for one of our upcoming papers.

Overall, the pipeline checks that (a) the reference coordinates of the repeat are correctly annotated, (b) the repeat region has consistent read coverage, and (c) the read alignments indicate that no other variants are present at the region (except for the repeat and, possibly, some SNVs).
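
Since the pipeline itself is not public, the following is only a rough sketch of what a check like (b) could look like, not the actual implementation. It assumes the per-base read depths across the repeat and its flanks have already been obtained (for example from `samtools depth`); the function name and both thresholds are hypothetical values chosen for illustration.

```python
from statistics import mean
from typing import Sequence


def coverage_is_consistent(
    depths: Sequence[int],
    min_mean_depth: float = 10.0,         # hypothetical threshold
    max_relative_deviation: float = 0.5,  # hypothetical threshold
) -> bool:
    """Return True if the region is adequately and evenly covered.

    `depths` holds one read-depth value per reference position in the region.
    """
    if not depths:
        return False
    average = mean(depths)
    if average < min_mean_depth:
        return False
    # Every position must stay within the allowed fraction of the mean depth.
    return all(abs(d - average) <= max_relative_deviation * average for d in depths)


print(coverage_is_consistent([30, 32, 28, 31, 29]))
print(coverage_is_consistent([30, 32, 2, 31, 29]))
```

A locus that fails a check of this kind would be a natural candidate for the visual inspection mentioned in this thread.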

If you are working with relatively small repeat catalogs, you could also perform these checks visually using a tool like REViewer.

I hope this helps!

Best wishes, Egor