giesselmann / STRique

Nanopore raw signal repeat detection pipeline
MIT License

repeat number 0 #9

Open gmoneyomics opened 5 years ago

gmoneyomics commented 5 years ago

Hi,

I am trying to quantify the repeat number in a large insertion of unknown (potentially varying) size. The alignment is very poor because this insert is not in the reference. When calling repeat numbers with STRique, I get many reads with a count of 0, but when I look at the fast5 files there is definitely a repeat present. Could this be a result of the poor mapping?

Count distribution: (screenshot, 2019-07-16)

Example of a read with a count of 0: (screenshot, 2019-07-16)

giesselmann commented 5 years ago

Hi,

A repeat count of 0 is reported for reads where, for instance, the signal alignment of the prefix and suffix failed. These reads can be filtered out; I have updated our documentation accordingly. The quality of the sequence alignment does not affect the repeat counting. On our targets, we observe that for most reads either the prefix or the suffix maps with large soft-clipping on one side of the read. As long as a read can be located as spanning a region of interest, STRique will try to evaluate it.
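A minimal Python sketch of filtering out the zero-count reads, assuming the STRique results have been saved as a tab-separated table with a header row and a column named `count`. The column name, the `drop_zero_counts` helper, and the toy table are hypothetical; check the header of your actual output before relying on this.

```python
import csv
import io


def drop_zero_counts(handle, count_column="count"):
    """Return the rows of a tab-separated table whose repeat count is non-zero.

    Assumes (hypothetically) a header row containing `count_column`.
    `handle` is any iterable of lines, e.g. an open file or io.StringIO.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    # A count of 0 marks reads that could not be evaluated, not real zero repeats.
    return [row for row in reader if int(row[count_column]) != 0]


# Toy table standing in for real output (hypothetical read IDs and values).
example = io.StringIO(
    "ID\ttarget\tcount\n"
    "read_1\tlocus_A\t0\n"
    "read_2\tlocus_A\t212\n"
    "read_3\tlocus_A\t0\n"
    "read_4\tlocus_A\t187\n"
)
kept = drop_zero_counts(example)
print([row["ID"] for row in kept])
```

With a real file, pass the open handle instead, e.g. `with open(path) as f: kept = drop_zero_counts(f)`.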

Given the length of the expansion, the count distribution is, in my experience, still very nice. The example signal plot shows a shift at the beginning of the repeat signal. The longer the repeat, the harder the normalization is for us, as the mean and standard deviation are affected by the monotonic signal. For this signal, I would assume that the mapping of the prefix signal failed.
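To illustrate why a long, uniform repeat makes mean/standard-deviation normalization harder, here is a small Python sketch using a plain z-score on a synthetic read (prefix flank, repeat, suffix flank). The signal values are invented for illustration, and the z-score is an assumed stand-in; STRique's actual normalization may differ.

```python
import statistics

# Hypothetical flank signal: four distinct current levels (arbitrary units),
# mean 100, repeated to 100 samples.
FLANK = [70.0, 90.0, 110.0, 130.0] * 25
# Hypothetical repeat unit: a low-variance signal centred at 120.
REPEAT_UNIT = [118.0, 122.0]


def read_signal(repeat_units):
    """Build a synthetic read: prefix flank + repeat region + suffix flank."""
    return FLANK + REPEAT_UNIT * repeat_units + FLANK


def zscore(signal):
    """Normalize to zero mean and unit (population) standard deviation."""
    mu = statistics.mean(signal)
    sd = statistics.pstdev(signal)
    return [(x - mu) / sd for x in signal]


def first_flank_z(repeat_units):
    """Normalized value of the first prefix sample for a given repeat length."""
    return zscore(read_signal(repeat_units))[0]


for n in (10, 1000):
    sig = read_signal(n)
    print(
        f"repeat units={n:5d}  mean={statistics.mean(sig):7.2f}  "
        f"sd={statistics.pstdev(sig):6.2f}  first prefix z={first_flank_z(n):6.2f}"
    )
```

As the repeat grows, it dominates the global statistics: the mean drifts toward the repeat level and the standard deviation shrinks. The same raw prefix sample (70) therefore normalizes to about -1.4 with 10 repeat units but about -5.3 with 1000, which would make the normalized flanks harder to match against a fixed model.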

gmoneyomics commented 5 years ago

Thank you! I've looked at the signal for a few more of these and see that some don't extend through both flanking regions, so I will just filter out the 0s.