bcgsc / straglr

Tandem repeat expansion detection or genotyping from long-read alignments

Issue in generating vcf file #52

Open MemoonaRasheed opened 3 weeks ago

MemoonaRasheed commented 3 weeks ago

I am running straglr/v1.5.2 in genome scan mode and I am not getting a .vcf file (it only generates the .tsv and .bed). The additional filters I used were --min_support 4 --min_ins_size 10 --max_str_len 3000. The log file is empty, so I don't know what went wrong. Can you please help with this?

readmanchiu commented 3 weeks ago

Thanks for your interest in Straglr. It is strange that you would only get the .tsv and .bed. The parameters are kind of wild, though. --max_str_len sets the maximum motif length, so you would include some very long VNTRs if you set it this high. A --min_ins_size of 10 is also a bit low; there would be a lot of short insertions that may need to be dealt with. Why don't you try some conservative parameters like --min_ins_size 100 --max_str_len 20 --min_support 4 first, to see if you manage to get the vcf output? You can also try the test data in the test directory (https://github.com/bcgsc/straglr/tree/master/test) to see if you get the expected outputs. I would also advise against attempting centromeric regions or regions composed of long and dense repeats. You should generate a bed file for those regions (an example is shown in the README) and pass it to the --exclude parameter.
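The suggestion above could be scripted roughly as follows. This is only a minimal sketch: the program name `straglr.py`, the positional order (BAM, reference, output prefix), and the assumption that outputs are named `<prefix>.tsv`, `<prefix>.bed` and `<prefix>.vcf` are not confirmed in this thread and should be checked against the README. Only the flags `--min_ins_size`, `--max_str_len`, `--min_support` and `--exclude` come from the comments here; the helper names are hypothetical.

```python
from pathlib import Path

# Conservative values suggested in the comment above.
CONSERVATIVE = {"--min_ins_size": 100, "--max_str_len": 20, "--min_support": 4}


def build_args(bam, reference, prefix, exclude_bed=None, program="straglr.py"):
    """Assemble an argument list for a trial run.

    The program name and the positional order are assumptions;
    adjust them to match your installation.
    """
    args = [program, str(bam), str(reference), str(prefix)]
    for flag, value in CONSERVATIVE.items():
        args += [flag, str(value)]
    if exclude_bed is not None:
        # BED file of centromeric / dense-repeat regions to skip.
        args += ["--exclude", str(exclude_bed)]
    return args


def missing_outputs(prefix, extensions=(".tsv", ".bed", ".vcf")):
    """Return the extensions for which no file named <prefix><ext> exists."""
    return [ext for ext in extensions if not Path(f"{prefix}{ext}").is_file()]
```

The list returned by `build_args` can be handed to `subprocess.run(args, check=True)`, and `missing_outputs(prefix)` can be called afterwards to confirm whether the .vcf was written alongside the .tsv and .bed.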