Illumina / ExpansionHunter

A tool for estimating repeat sizes

I would like to use in plant, de novo identify the repeat by #101

Open wangnan9394 opened 4 years ago

wangnan9394 commented 4 years ago

Hi, I would like to use this tool on a plant genome with WGS data. The variant catalog file is a required input, and I think the following strategy could produce one:

Step 1: Identify repeats with a tool such as Tandem Repeats Finder; the output is a .txt file containing the repeats.
Step 2: Extract the basic repeat unit with a Python script, e.g. CGGCGGCGG --> (CGG).
Step 3: Write the results as a JSON array:

[
  {
    "LocusId": "ref_rep1",
    "LocusStructure": "(CAG)",
    "ReferenceRegion": "1:462-522",
    "VariantType": "Repeat"
  },
  {
    "LocusId": "ref_rep2",
    "LocusStructure": "(CAGT)CGTTG(CGG)",
    "ReferenceRegion": "1:1593-1624",
    "VariantType": ["Repeat", "Repeat"]
  },
  {
    "LocusId": "ref_rep3",
    "LocusStructure": "(TGGGCAGCAGTA)*",
    "ReferenceRegion": "1:4731-4910",
    "VariantType": "Repeat"
  }
]

Annotation is sometimes hard to obtain for a plant genome, but it is easy to find repeats in a reference genome with tools like Tandem Repeats Finder. Can this approach be used for identification while keeping as much information as possible? And is there any way to generate the variant catalog file directly from a reference genome (.fasta)? Thank you!
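Steps 2 and 3 of the strategy proposed above can be sketched in Python. This is only an illustration under stated assumptions, not an official ExpansionHunter tool: the function names (`minimal_motif`, `catalog_entry`, `build_catalog`) and the `LocusId` naming scheme are hypothetical, the input is assumed to be already-parsed repeat records with 1-based inclusive coordinates (parsing the Tandem Repeats Finder output is left out), and the conversion to 0-based half-open `ReferenceRegion` coordinates and the `(MOTIF)*` form of `LocusStructure` should be checked against the ExpansionHunter variant catalog documentation.

```python
import json


def minimal_motif(seq: str) -> str:
    """Return the shortest unit whose exact repetition yields seq.

    Example: CGGCGGCGG -> CGG. Imperfect repeats are returned unchanged.
    """
    seq = seq.upper()
    n = len(seq)
    for period in range(1, n + 1):
        # A valid period must divide the length and rebuild the sequence.
        if n % period == 0 and seq[:period] * (n // period) == seq:
            return seq[:period]
    return seq


def catalog_entry(chrom: str, start: int, end: int, seq: str) -> dict:
    """Build one single-repeat catalog entry.

    ASSUMPTION: start/end are 1-based inclusive (as reported by many repeat
    finders) and the catalog expects 0-based half-open regions.
    """
    motif = minimal_motif(seq)
    return {
        "LocusId": f"{chrom}_{start}_{end}_{motif}",  # hypothetical naming
        "LocusStructure": f"({motif})*",
        "ReferenceRegion": f"{chrom}:{start - 1}-{end}",
        "VariantType": "Repeat",
    }


def build_catalog(records) -> list:
    """records: iterable of (chrom, start, end, repeat_sequence) tuples."""
    return [catalog_entry(*record) for record in records]


if __name__ == "__main__":
    # Made-up example record, not real data.
    example = [("1", 463, 522, "CAG" * 20)]
    print(json.dumps(build_catalog(example), indent=2))
```

Each entry here describes a single repeat. Loci with several adjacent repeats, like the second entry in the JSON above, would need extra handling that this sketch does not attempt.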

egor-dolzhenko commented 3 years ago

Thanks for your interest in using ExpansionHunter! We are working on tools for annotating repeats (i.e. creating variant catalogs) and assessing the accuracy of existing repeat annotations. We should be able to share some annotation tools soon.

Could you please describe your dataset (what plant species you are working with, how large is your dataset, etc.)? Please feel free to email me directly (edolzhenko@illumina.com).

Best wishes, Egor

wangnan9394 commented 3 years ago

Thanks for your reply. I look forward to the good news! I think ExpansionHunter is a good tool. I focus on the citrus genus, where the average genome size is about 350 Mb. There are many genome versions in the citrus genus. In the future, will each genome version need its own matching variant catalog?

egor-dolzhenko commented 3 years ago

This makes sense, thank you for explaining. And yes, variant catalogs are tied to a specific genome version. So, hopefully, it is an option for you to use just one genome version or, alternatively, you could create separate variant catalogs for a few genomes. Could you please check back in about a month? Hopefully, by then we will have some tools to share with you.
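Because a catalog is tied to one genome version, a quick sanity check before running against a given assembly is to confirm that every catalog region falls inside a contig of that assembly. The sketch below is a hypothetical helper, not part of ExpansionHunter. It assumes contig lengths come from a samtools-style `.fai` index (tab-separated, contig name in column 1 and length in column 2), and that `ReferenceRegion` is either a single `chrom:start-end` string or a list of such strings.

```python
def read_fai_lengths(fai_lines):
    """Map contig name -> length from the lines of a .fai index."""
    lengths = {}
    for line in fai_lines:
        if line.strip():
            name, length = line.split("\t")[:2]
            lengths[name] = int(length)
    return lengths


def regions_of(entry):
    """Return the entry's regions as a list (ASSUMPTION: string or list)."""
    region = entry["ReferenceRegion"]
    return region if isinstance(region, list) else [region]


def parse_region(region):
    """Split 'chrom:start-end' into (chrom, start, end)."""
    chrom, span = region.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def find_mismatches(catalog, contig_lengths):
    """List (LocusId, region, reason) for regions that cannot exist here."""
    problems = []
    for entry in catalog:
        for region in regions_of(entry):
            chrom, start, end = parse_region(region)
            if chrom not in contig_lengths:
                problems.append((entry["LocusId"], region, "unknown contig"))
            elif not (0 <= start < end <= contig_lengths[chrom]):
                problems.append((entry["LocusId"], region, "out of range"))
    return problems
```

An empty result only shows that the coordinates are possible in this assembly. It does not prove the catalog was built from it, since the same coordinates can point at different sequence in another genome version.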