bcgsc / straglr

Tandem repeat expansion detection or genotyping from long-read alignments

Question about the inconsistent motif between loci.bed and output.bed #22

Closed HLHsieh closed 8 months ago

HLHsieh commented 1 year ago

Hi there,

I was trying to use Straglr in genotype mode. I ran the following command:

python straglr.py myseq.sorted.bam genome.fa myseq_straglr --loci loci.bed --nprocs 4

Here is my loci.bed:

chr9    27573484        27573546        CCCCGG

However, I got the following output.bed:

#chrom  start   end     repeat_unit     allele1:size    allele1:copy_number     allele1:support allele2:size    allele2:copy_number     allele2:support
chr9    27573484        27573546        GCCCCG  103.80000000000001      17.3    106     -       -       -

I am wondering why the repeat units are different, and whether there is any way to detect my target pattern (CCCCGG) with Straglr.

Any comments would be appreciated.

Best, Hsin

readmanchiu commented 1 year ago

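Hi @HLHsieh, the target motif CCCCGG and the reported motif GCCCCG are essentially the same; they differ only in where the repeat unit starts. Two motifs are equivalent if one is a cyclic rotation of the other: moving the trailing G of CCCCGG to the front gives GCCCCG, and both describe the same tandem repeat.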
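If you want to check this equivalence yourself when comparing loci.bed against output.bed, a minimal sketch follows. It is not taken from Straglr's code; the helper names `is_rotation` and `canonical_rotation` are hypothetical and only illustrate the rotation idea described above.

```python
def is_rotation(a: str, b: str) -> bool:
    """Return True if motif b is a cyclic rotation of motif a."""
    a, b = a.upper(), b.upper()
    # Every rotation of a appears as a substring of a + a.
    return len(a) == len(b) and b in a + a


def canonical_rotation(motif: str) -> str:
    """Return the lexicographically smallest rotation of a motif.

    Hypothetical helper: useful as a comparison key so that
    rotated motifs map to the same string.
    """
    motif = motif.upper()
    if not motif:
        return motif
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


# Motifs from this issue: target in loci.bed vs. repeat_unit in output.bed.
print(is_rotation("CCCCGG", "GCCCCG"))
print(canonical_rotation("CCCCGG") == canonical_rotation("GCCCCG"))
```

Comparing canonical rotations rather than raw strings avoids false mismatches like the one in this issue when matching many loci at once.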

Let me know if you have further questions and thanks for trying Straglr!

HLHsieh commented 1 year ago

Thanks for your prompt response!