giesselmann / STRique

Nanopore raw signal repeat detection pipeline
MIT License

repeat_contig.tsv file #30

Closed: vmscmams closed this issue 2 years ago

vmscmams commented 2 years ago

Hello! I am having problems creating the repeat_contig.tsv file. I still don't understand how you defined the prefix and suffix sequences. I tried to align the sequences you provide in the repeat_contig.tsv file against the read in the c9orf72.sam file, without good results. Furthermore, if I count the number of GGCCCC repeats manually, I get a different number from the one reported by the tool. I ask because I am getting an error that is probably caused by the repeat_contig.tsv file I built myself. The file you generated works fine with my sam file.

Thanks for your help.

giesselmann commented 2 years ago

Hi, the prefix and suffix are the genomic sequences on the forward strand upstream and downstream of the repeat. In the repo config these are, I think, taken from hg19: the 150 bp before and after the GGCCCC repeat of c9orf72. Aligning the read sequence is difficult because the basecaller struggles to call the repeat correctly. In the paper we have a supplementary figure showing that the repeat length is systematically underestimated in sequence space. I hope this helps. Otherwise, if you don't want to share the config publicly, feel free to send it to me by e-mail.

Pay
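
For anyone building their own config, the flank definition above can be sketched in Python roughly as follows. This is only an illustration, not STRique code: `extract_flanks`, its parameters, and the toy sequence are hypothetical, and it assumes you already have the reference contig as a plain string plus 0-based, half-open coordinates of the repeat on the forward strand.

```python
def extract_flanks(reference, repeat_start, repeat_end, flank=150):
    """Return (prefix, suffix) flanking a repeat on the forward strand.

    Hypothetical helper for illustration only.
    reference    -- forward-strand contig sequence as a string
    repeat_start -- 0-based index of the first repeat base
    repeat_end   -- 0-based index one past the last repeat base
    flank        -- number of bases to take on each side (150 as in the reply above)
    """
    if not 0 <= repeat_start <= repeat_end <= len(reference):
        raise ValueError("repeat coordinates fall outside the reference")
    # Upstream flank: up to `flank` bases ending right before the repeat.
    prefix = reference[max(0, repeat_start - flank):repeat_start]
    # Downstream flank: up to `flank` bases starting right after the repeat.
    suffix = reference[repeat_end:repeat_end + flank]
    return prefix.upper(), suffix.upper()


# Toy sequence (not real genomic data): 10 bp, three GGCCCC units, 10 bp.
toy = "ACGTACGTAC" + "GGCCCC" * 3 + "TTGACCTGAA"
prefix, suffix = extract_flanks(toy, 10, 28, flank=5)
print(prefix, suffix)
```

With a real reference you would load the contig from the same genome build your reads were aligned to and use the annotated repeat coordinates. Mixing builds or strands is an easy way to end up with flanks that do not match.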

vmscmams commented 2 years ago

Hi, thanks for your response!

I sent you an e-mail a few minutes ago with the subject "repeat_contig.tsv file # 30"