christophertbrown / iRep

scripts for estimating bacteria replication rates based on population genome copy number variation
MIT License

iRep bias for lower coverage data #20

Open lch14forever opened 6 years ago

lch14forever commented 6 years ago

Thanks for the great tool! We recently observed an intriguing behavior when using iRep: when we downsample the mapping file, the calculated iRep values tend to increase. We wonder whether this is because the window size is too small for lower coverage. Would it be possible to adjust the window size?
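As a rough back-of-the-envelope illustration of the concern (an assumption, not something measured in this thread): if read counts per window are approximately Poisson, the relative noise in each window scales as 1/sqrt(reads per window), so downsampling reads increases the scatter of window coverage values. The 150 bp read length and the helper names below are hypothetical.

```python
import math


def reads_per_window(coverage, window_bp=5000, read_len=150):
    """Expected reads starting in one window at a given mean coverage.

    Assumes uniform coverage and a fixed (hypothetical) read length.
    """
    return coverage * window_bp / read_len


def poisson_cv(coverage, window_bp=5000, read_len=150):
    """Coefficient of variation of a Poisson read count in one window."""
    return 1 / math.sqrt(reads_per_window(coverage, window_bp, read_len))


for cov in (50, 10, 2):
    print(f"{cov:>3}x: {reads_per_window(cov):7.1f} reads/window, "
          f"CV ~ {poisson_cv(cov):.3f}")
```

Under this model, doubling the window size reduces per-window noise by a factor of sqrt(2), which is why a larger window can look attractive at low coverage, at the cost of coarser resolution along the genome.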

christophertbrown commented 6 years ago

Hi Chenhao Li,

Interesting question.

I actually noticed something similar at one point and looked into it. Using the datasets from the iRep paper's benchmarking, I compared iRep values at full coverage with iRep values after subsampling coverage to 10x. The results are attached. There was no strong relationship between the change in coverage and the change in iRep. Importantly, changes in iRep greater than 0.15, which we roughly determined to be the error of individual iRep values, were quite rare.
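A minimal sketch of this kind of comparison, assuming iRep values from each run are already available as a mapping from genome name to value. The helper names are hypothetical. The fraction returned by `subsample_fraction` is the kind of value one might pass to a read-subsampling tool such as `samtools view -s`, and the 0.15 threshold is the approximate per-value error mentioned above.

```python
def subsample_fraction(current_cov, target_cov=10.0):
    """Fraction of reads to keep so mean coverage drops to target_cov."""
    if current_cov <= 0:
        raise ValueError("current coverage must be positive")
    # Never ask for more than all of the reads.
    return min(1.0, target_cov / current_cov)


def large_changes(full, subset, threshold=0.15):
    """Genomes whose iRep changes by more than `threshold` after subsampling.

    `full` and `subset` map genome name -> iRep value; only genomes present
    in both runs are compared. Returns genome -> (subset - full).
    """
    return {
        genome: subset[genome] - full[genome]
        for genome in full.keys() & subset.keys()
        if abs(subset[genome] - full[genome]) > threshold
    }
```

Reporting the signed difference, rather than only a flag, makes it easy to check whether changes are skewed upward, as described in the original report.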

We considered different window sizes when initially testing iRep and found that 5 Kbp windows gave the best results (see Supplementary Figure 2 in the iRep paper), so I don't think changing the window size will improve things. Perhaps a better way of modeling error at low coverage would improve results.
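For anyone who wants to explore the window-size question anyway, here is a simplified, hypothetical sketch that roughly follows the general idea described in the iRep paper (sliding-window coverage, log2 transform, sorting, trimming the extreme windows, linear fit) with the window size exposed as a parameter. It assumes a per-base depth list as input, ignores circular wrap-around and the coverage filters the real tool applies, and the function names are made up. It is not the iRep implementation, and its numbers will not match iRep output.

```python
import math


def windowed_coverage(depth, window=5000, step=100):
    """Mean per-base depth in sliding windows (linear genome assumed)."""
    return [
        sum(depth[i:i + window]) / window
        for i in range(0, len(depth) - window + 1, step)
    ]


def irep_like(depth, window=5000, step=100, trim=0.05):
    """Hypothetical, simplified iRep-style estimate (not the iRep code)."""
    cov = windowed_coverage(depth, window, step)
    # log2-transform and sort window coverage; drop zero-coverage windows.
    logs = sorted(math.log2(c) for c in cov if c > 0)
    # Exclude the lowest and highest `trim` fraction of windows.
    k = int(len(logs) * trim)
    kept = logs[k:len(logs) - k]
    n = len(kept)
    if n < 2:
        raise ValueError("too few windows; is the genome shorter than the window?")
    # Ordinary least-squares fit of log2 coverage against rank.
    x_mean = (n - 1) / 2
    y_mean = sum(kept) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(kept))
    slope = sxy / sxx
    # Rise of the fitted line across the kept windows, converted from log2
    # back into a peak-to-trough coverage ratio.
    return 2 ** (slope * (n - 1))
```

Calling `irep_like(depth, window=10000)` alongside the 5 Kbp default on the same full and downsampled depth profiles would show whether a larger window damps the upward shift, keeping in mind that the paper's benchmarking favored 5 Kbp.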

Maybe your dataset is different? If so, it would definitely be worth looking into more.

Best,

Chris

(In reply to Chenhao Li's post of Apr 5, 2018, 9:25 PM.)