christophertbrown / iRep

scripts for estimating bacteria replication rates based on population genome copy number variation
MIT License
68 stars 9 forks source link

No results, numpy RuntimeWarning #13

Closed wrshoemaker closed 6 years ago

wrshoemaker commented 6 years ago

Hello,

I really liked iRep and have used it before. I re-installed it a few weeks ago got some weird results that I think might be caused by numpy code in the iRep package.

When I run iRep on my sorted sam file I get the following output and the following numpy warnings:

~/iRep/lib/python3.6/site-packages/numpy/lib/function_base.py:1110: RuntimeWarning: Mean of empty slice. avg = a.mean(axis)

and

~/iRep/lib/python3.6/site-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars ret = ret.dtype.type(ret / rcount)

I think this might have something to do with the sliding window calculation. Have you seen something similar and do you know how to work around it?

christophertbrown commented 6 years ago

Hi Will,

Did you do this test using the same SAM file that you had used previously? From the error and the output it looks like there is an issue with the mapping - the genome you are using has zero coverage.

Chris

On Sep 21, 2017, at 5:50 PM, Will Shoemaker notifications@github.com wrote:

Hello,

I really liked iRep and have used it before. I re-installed it a few weeks ago got some weird results that I think might be caused by numpy code in the iRep package.

When I run iRep on my sorted sam file I get the following output and the following numpy warnings:

~/iRep/lib/python3.6/site-packages/numpy/lib/function_base.py:1110: RuntimeWarning: Mean of empty slice. avg = a.mean(axis)

and

~/iRep/lib/python3.6/site-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars ret = ret.dtype.type(ret / rcount)

I think this might have something to do with the sliding window calculation. Have you seen something similar and do you know how to work around it?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

wrshoemaker commented 6 years ago

Hi Chris,

I'm not using the same SAM file, but this SAM file has an average coverage of ~500. I also sorted it using the samtools sort command and checked to make sure that the reference sequence names in the SAM file match the sequence names in the FASTA file.

wrshoemaker commented 6 years ago

I can send a portion of the SAM file if that would be useful. The entire file is ~ 7 gb.

christophertbrown commented 6 years ago

Hi Will,

Interesting. Can you please confirm that the read mapping was done in paired-end mode (e.g. using -1 and -2 with Bowtie2)?

Other than that, I'm not sure what the issue could be. It would be helpful if you could send the first few lines of your SAM mapping file.

Sorry for the inconvenience.

Chris

On Sep 22, 2017, at 11:09 AM, Will Shoemaker notifications@github.com wrote:

I can send a portion of the SAM file if that would be useful. The entire file is ~ 7 gb.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

On Sep 22, 2017, at 10:06 AM, Will Shoemaker notifications@github.com wrote:

Hi Chris,

I'm not using the same SAM file, but this SAM file has an average coverage of ~500. I also sorted it using the samtools sort command and checked to make sure that the reference sequence names in the SAM file match the sequence names in the FASTA file.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

wrshoemaker commented 6 years ago

Hi Chris,

I mapped the reads to the reference using breseq, which treats the reads of single-end. I wasn't aware that the reads needed to be paired-end. I'll re-map the reads to the reference, re-run iRep, and see if there's still a problem.

Thanks for your help, Will

christophertbrown commented 6 years ago

Hi Will,

Hopefully that will fix it for you.

iRep does look at paired read mappings to make sure that both reads in each pair are accurately mapped to the genome. This makes it much less likely to have off target read mappings that could inflate coverage values over some parts of the gnome, and skew results.

Chris

On Sep 22, 2017, at 2:31 PM, Will Shoemaker notifications@github.com wrote:

Hi Chris,

I mapped the reads to the reference using breseq, which treats the reads of single-end. I wasn't aware that the reads needed to be paired-end. I'll re-map the reads to the reference, re-run iRep, and see if there's still a problem.

Thanks for your help, Will

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

wrshoemaker commented 6 years ago

Thanks Chris,

That solved the problem. Glad to know there's that extra step in iRep. I'll go ahead and close this "issue".