Oshlack / STRetch

Method for detecting STR expansions from short-read sequencing data
MIT License

overlapping STRs #56

Closed hrafehi closed 5 years ago

hrafehi commented 5 years ago

Hi,

How does STRetch deal with overlapping regions in the BED file? I know you can't have two regions that are exactly the same, but if the second region intersects the first, will it still run? I am trying to run an analysis this way and it seems to just ignore the second region.

Thanks

Haloom

hdashnow commented 5 years ago

Hi Haloom,

STRetch does not support multiple STR loci starting at the same position. It's okay if they overlap but have different starting positions. That said, overlapping loci may cause some strange results. If they have the same/similar repeat unit they may be competing with each other for reads, and so fail to reach significance.
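As a rough illustration of the two cases above, a check along the following lines could flag them in a custom BED file before a run. This is a minimal sketch, not part of STRetch: the helper name `check_loci` is hypothetical, and it assumes whitespace-delimited lines with the columns chrom, start, end, and repeat unit first, and half-open intervals.

```python
from collections import defaultdict


def check_loci(bed_lines):
    """Return (duplicate_starts, overlaps) for an iterable of BED lines.

    Hypothetical helper, not part of STRetch. Assumes whitespace-delimited
    columns beginning with: chrom, start, end, repeat unit.
    """
    by_chrom = defaultdict(list)
    for line in bed_lines:
        fields = line.split()
        # Skip blank lines, comments, and lines with too few columns.
        if len(fields) < 4 or line.startswith("#"):
            continue
        by_chrom[fields[0]].append((int(fields[1]), int(fields[2]), fields[3]))

    duplicate_starts = []  # pairs of loci sharing a start position (unsupported)
    overlaps = []          # pairs that overlap but start at different positions
    for chrom, loci in by_chrom.items():
        loci.sort()
        for i, locus_a in enumerate(loci):
            for locus_b in loci[i + 1:]:
                if locus_a[0] == locus_b[0]:
                    duplicate_starts.append((chrom, locus_a, locus_b))
                elif locus_b[0] >= locus_a[1]:
                    break  # sorted by start, so nothing later overlaps locus_a
                else:
                    overlaps.append((chrom, locus_a, locus_b))
    return duplicate_starts, overlaps
```

Pairs reported in `duplicate_starts` would need to be resolved before running; pairs in `overlaps` are allowed but worth a second look, especially when the repeat units are the same or similar.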

Make sure to work through the steps for custom reference data: https://github.com/Oshlack/STRetch/wiki/Reference-Data

Maybe you could give me an example of the types of overlapping loci you are thinking of running? If I understand the use case better I may be able to give you a better idea if it will work.

Warm regards, Harriet

hrafehi commented 5 years ago

This is an example of the kind of regions I want to test. I have changed the starting base so they don't overlap. Should this be an issue? I also created the file manually, since it is only a few regions.

chr9    39370043    39370103    AAAAT   11.8
chr9    39370044    39370103    AAATT   11.8
chr9    39370045    39370095    AATTT   9.8

Also, can you please explain what you mean by competing for reads?

hdashnow commented 5 years ago

Hi @hrafehi,

The first one:

chr9    39370043    39370103    AAAAT   11.8

is a different STR repeat unit from the others so it should be okay.

For the last two:

chr9    39370044    39370103    AAATT   11.8
chr9    39370045    39370095    AATTT   9.8

The repeat units are actually the same: they are just the reverse complement of each other. So for any read with that repeat unit, STRetch will have to make a decision about which of these two loci to assign the read to. In that sense they will "compete" for the reads. So I advise just choosing one of these.
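The equivalence can be checked by hand or with a short script. The sketch below is illustrative only and is not STRetch's own code: the function names are hypothetical, and treating rotations of a unit (for example AAATT and AATTA) as the same repeat is an added assumption that goes beyond the reverse-complement point made above.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def canonical_unit(unit):
    """Alphabetically smallest string among all rotations of the unit
    and all rotations of its reverse complement (assumed equivalence)."""
    unit = unit.upper()
    candidates = []
    for s in (unit, reverse_complement(unit)):
        candidates.extend(s[i:] + s[:i] for i in range(len(s)))
    return min(candidates)


def same_repeat_unit(a, b):
    """True if two repeat units describe the same STR on either strand."""
    return canonical_unit(a) == canonical_unit(b)


print(reverse_complement("AAATT"))         # AATTT
print(same_repeat_unit("AAATT", "AATTT"))  # True: these would compete for reads
print(same_repeat_unit("AAAAT", "AAATT"))  # False: distinct repeat units
```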

Warm regards, Harriet