Illumina / ExpansionHunter

A tool for estimating repeat sizes

Question about create variat #183

Open wjzzq opened 10 months ago

wjzzq commented 10 months ago

Dear Egor,

I want to use ExpansionHunter to identify tandem repeat variants in a plant genome. I used TRF results to generate a variant catalog, as shown below:

[
    {
        "LocusId": "Cor-Chr1_273_295",
        "LocusStructure": "(AT)",
        "ReferenceRegion": "Cor-Chr1:273-295",
        "VariantId": "Cor-Chr1_273_295",
        "VariantType": "Repeat"
    },
    {
        "LocusId": "Cor-Chr1_3195_3215",
        "LocusStructure": "(AG)",
        "ReferenceRegion": "Cor-Chr1:3195-3215",
        "VariantId": "Cor-Chr1_3195_3215",
        "VariantType": "Repeat"
    }
]

When I run ExpansionHunter, I get the following error. Is the format of the variant catalog file I generated wrong?

ExpansionHunter --reads ZD31.sorted.bam \
    --reference Cbp_pan.fasta \
    --variant-catalog ../Expansionhunter/Cbp_STR.json \
    --output-prefix ../Expansionhunter

2023-11-29T01:30:34,[Starting ExpansionHunter v5.0.0]
2023-11-29T01:30:34,[Analyzing sample ZD31.sorted]
2023-11-29T01:30:34,[Initializing reference Cbp_pan.fasta]
2023-11-29T01:30:34,[Loading variant catalog from disk ../Expansionhunter/Cbp_STR.json]
2023-11-29T01:30:35,[Unexpected range format: Cor-Chr1:273-295]

Best wishes!

Zhiqin

andreasssh commented 10 months ago

I'm not Egor, but I bet the problem is that the chromosome name contains a "-", which is also the character used to split the string to get the range. So don't use "-" or ":" in chromosome names and try again. Secondly, for the locus structure you might want to add * or + after the parentheses, e.g. (AT)*.
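To illustrate why a name like Cor-Chr1 can trip up a region parser, here is a minimal Python sketch. The strict `contig:start-end` pattern and the helper names (`parse_region`, `sanitize_contig`, `sanitize_region`) are assumptions made for illustration; the actual ExpansionHunter parser may behave differently.

```python
import re

# Hypothetical strict pattern for "contig:start-end". This is NOT taken from
# the ExpansionHunter source; it only mimics a parser that treats ':' and '-'
# as delimiters.
REGION_RE = re.compile(r"^([^:-]+):(\d+)-(\d+)$")


def parse_region(region):
    """Return (contig, start, end) or raise ValueError for unexpected input."""
    match = REGION_RE.match(region)
    if match is None:
        raise ValueError(f"Unexpected range format: {region}")
    contig, start, end = match.groups()
    return contig, int(start), int(end)


def sanitize_contig(name):
    """Replace '-' and ':' in a contig name with '_'."""
    return re.sub(r"[-:]", "_", name)


def sanitize_region(region):
    """Rewrite only the contig part of 'contig:start-end'."""
    contig, sep, span = region.rpartition(":")
    if not sep:
        raise ValueError(f"No ':' found in region: {region}")
    return f"{sanitize_contig(contig)}:{span}"


fixed = sanitize_region("Cor-Chr1:273-295")
print(fixed)
print(parse_region(fixed))
```

If contigs are renamed this way in the catalog, the same names would presumably also have to be used in the reference FASTA and in the BAM header, so that all three files still agree.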

If helpful, I also have a script for converting TRF output file (DAT) to an EH catalogue file, available here: https://gitlab.com/andreassh/trf2strcat
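For readers who want to build the catalog by hand, the following is a rough Python sketch that produces entries in the same shape as the catalog posted above, with a `*` after the motif. It is not the linked trf2strcat script. The function name `make_entry` and the input tuples are hypothetical, the contig names are assumed to have been renamed already, and the coordinates are passed through unchanged, so the coordinate convention should be checked against the ExpansionHunter documentation.

```python
import json


def make_entry(contig, start, end, motif):
    """Build one catalog entry; reject contig names containing '-' or ':'."""
    if "-" in contig or ":" in contig:
        raise ValueError(f"Contig name contains '-' or ':': {contig}")
    locus_id = f"{contig}_{start}_{end}"
    return {
        "LocusId": locus_id,
        "LocusStructure": f"({motif})*",
        # Coordinates are copied as given (no conversion is attempted here).
        "ReferenceRegion": f"{contig}:{start}-{end}",
        "VariantId": locus_id,
        "VariantType": "Repeat",
    }


# Hypothetical repeats as (contig, start, end, motif) tuples, e.g. taken
# from a TRF run after the contigs have been renamed.
repeats = [
    ("Cor_Chr1", 273, 295, "AT"),
    ("Cor_Chr1", 3195, 3215, "AG"),
]

catalog = [make_entry(*repeat) for repeat in repeats]
print(json.dumps(catalog, indent=4))
```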

wjzzq commented 10 months ago

Thank you very much! Based on your suggestion, I successfully ran ExpansionHunter.