christophertbrown / iRep

scripts for estimating bacteria replication rates based on population genome copy number variation
MIT License

Remove outlier windows #15

Closed wrshoemaker closed 6 years ago

wrshoemaker commented 6 years ago

Hi Chris,

Thank you for your previous help. I got iRep running on my paired-end aligned SAM files. I have a question about the fit of the linear regression for GC-content vs. coverage. A few outlier windows with extremely high coverage (~5x the mean) seem to be ruining the fit of the regression in the attached file. Do you know of an option, or a quick fix I could make in the Python code, to remove those high-coverage windows from the analysis?

Best, Will

Sample_L1B4-100.iRep.pdf

christophertbrown commented 6 years ago

Hi Will,

Glad that you were able to get iRep to run with your paired-read mappings.

Thanks for including the attached output along with your question. I'm definitely concerned about those high-coverage values. They also seem to be lowering the iRep r^2 on the last page of the output: the value is just above the 0.90 minimum for considering the result accurate.

Do you know why that part of the genome has such high coverage?

I would recommend removing the high-coverage region of the genome and re-running.
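As a rough illustration of the suggestion above, here is a minimal Python sketch of cutting a coordinate range out of a sequence before re-running. It is not part of iRep: the helper names `drop_region` and `format_fasta`, the toy sequence, and the coordinates are all hypothetical. The mapping would presumably need to be redone against the edited sequence, since the coordinates after the removed region shift.

```python
def drop_region(sequence, start, end):
    """Return sequence with the 0-based, half-open interval [start, end) removed."""
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("region must lie within the sequence")
    return sequence[:start] + sequence[end:]


def format_fasta(header, sequence, width=60):
    """Format a single FASTA record, wrapping the sequence at `width` characters."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# Toy 20 bp sequence standing in for a genome (hypothetical data).
genome = "ACGT" * 5

# Remove positions 4..11 (hypothetical coordinates of the high-coverage region).
trimmed = drop_region(genome, 4, 12)

# Build the edited FASTA record as a string; writing it to a file is left out.
record = format_fasta("toy_genome_trimmed", trimmed, width=10)
```

In practice the region boundaries would have to come from inspecting the coverage plot or the window coverage values.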

Chris

(In reply to Will Shoemaker's comment of Sep 27, 2017, 3:08 PM.)

wrshoemaker commented 6 years ago

Hi Chris,

Is there any way you could consider adding an argument for a window coverage cutoff to the next version?

christophertbrown commented 6 years ago

Hi Will,

Good question. There is actually already a check built into iRep that is supposed to handle these situations: iRep automatically rejects any window with coverage >8x the median. In your case, the high-coverage windows were only ~4x the median, so they were not excluded. In a future update I will make this threshold an option, so you can lower it when necessary.
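To make the threshold behaviour concrete, here is a minimal Python sketch of a median-based window filter. It is not iRep's actual code: the function name `filter_windows`, the `max_ratio` parameter, and the coverage values are assumptions made for illustration. Only the rule itself (reject windows above 8x the median) comes from the description above.

```python
from statistics import median


def filter_windows(coverages, max_ratio=8.0):
    """Split window coverages into (kept, rejected) using a median-based cutoff.

    A window is rejected when its coverage exceeds max_ratio times the median.
    """
    cutoff = max_ratio * median(coverages)
    kept = [c for c in coverages if c <= cutoff]
    rejected = [c for c in coverages if c > cutoff]
    return kept, rejected


# Hypothetical window coverages: mostly ~10x, with two windows at ~4x the median.
windows = [9, 10, 11, 10, 12, 9, 10, 40, 42, 10]

# With the 8x cutoff described above, the ~4x windows are not excluded.
kept_default, rejected_default = filter_windows(windows)

# With a lower (hypothetical) cutoff of 3x the median, they are excluded.
kept_strict, rejected_strict = filter_windows(windows, max_ratio=3.0)
```

With these toy values the median is 10, so the default cutoff is 80 and nothing is rejected, while a 3x cutoff of 30 removes the two high windows. This mirrors why a fixed 8x threshold did not catch windows at ~4x the median.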

Chris

(In reply to Will Shoemaker's comment of Sep 28, 2017, 4:59 PM.)

wrshoemaker commented 6 years ago

Awesome! Thanks Chris

-Will

christophertbrown commented 6 years ago

Hi Will,

I added an option to control the window coverage inclusion threshold (-c). Hopefully this helps in the future for situations like yours.

You should be able to pip install iRep --upgrade to get the latest version.

I also made a couple of other changes to the organization of the iRep code, so please let me know if you run into any trouble (I think everything should be working fine).

Chris

(In reply to Will Shoemaker's comment of Oct 2, 2017, 7:07 AM.)