dputhier / pygtftk

A python package and a set of shell commands to handle GTF files
GNU General Public License v3.0

peak_anno: RuntimeWarning: divide by zero encountered in log return log(self._cdf(x, *args)) #58

Closed dputhier closed 5 years ago

dputhier commented 5 years ago

Encountered when running:

  gtftk get_example -d mini_real | gtftk peak_anno -p ENCFF112BHN_H3K4me3_K562_sub.bed -c hg38.genome -u 1500 -d 1500 -D  -if example_pa_01.pdf -k 5 -V 3

       ...
 |-- 20:51:43-INFO-peak_anno : Computing log(p-val) for a Neg Binom with mean >= var ; var was set to mean + 1E-4
/Users/puthier/miniconda3/envs/pygtftk/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:893: RuntimeWarning: divide by zero encountered in log
  return log(self._cdf(x, *args))

The computation finished and produced a result, but the message suggests that something went wrong.

qferre commented 5 years ago

Could you tell me which region this was for? With high verbosity, the log should regularly contain lines such as:

|-- 11:17:58-INFO-peak_anno : Processing intergenic regions

... telling you which region is currently being processed.

dputhier commented 5 years ago

The log is attached as a file:

log.txt

gtftk get_example -d mini_real -f '*'

gtftk get_example -d mini_real | gtftk peak_anno -m gene_biotype -p ENCFF112BHN_H3K4me3_K562_sub.bed -c hg38.genome -D -n -if example_pa_02.pdf -k 8 -V 3

qferre commented 5 years ago

I think this is due to scipy's log of the CDF not handling certain values: https://github.com/scipy/scipy/issues/2139

I have seen this too while testing manually: a '-inf' result appears when the numbers are too large.
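
A minimal sketch of how the '-inf' and the warning can arise together, assuming the generic fallback takes the log of a CDF value that has underflowed to exactly 0 (as the traceback line `return log(self._cdf(x, *args))` suggests). The helper name `log_of_cdf` is hypothetical and is not part of scipy or pygtftk.

```python
import warnings

import numpy as np


def log_of_cdf(cdf_value):
    """Mimic a generic `log(cdf)` fallback and collect any warnings it emits.

    Hypothetical helper for illustration only.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is filtered out
        result = np.log(np.float64(cdf_value))
    messages = [str(w.message) for w in caught]
    return result, messages


# Assumption: the CDF underflowed to exactly 0.0 for an extreme input.
result, messages = log_of_cdf(0.0)
print(result)    # the log-probability
print(messages)  # the warning text(s) NumPy emitted

# Going back from the log scale gives a p-value of exactly 0.
print(np.exp(result))
```

The warning is therefore informational here: the log-probability is `-inf`, which maps back to a probability of 0 rather than to an invalid value.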

qferre commented 5 years ago

I confirm this is due to scipy's approximation: when the CDF underflows to 0, its log is computed as np.log(0). Fortunately, this does not affect the result, since np.log(0) returns -inf by convention (and np.exp(-np.inf) returns 0).

I have silenced the relevant warnings in e0753f2c6d62d1a60b3af83c0ec14f92d2607606.
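
One possible way to silence this kind of warning locally is sketched below, using NumPy's `np.errstate` context manager. This is an assumption for illustration: the thread does not say how the commit above silences the warnings, and the function name `quiet_log` is hypothetical.

```python
import numpy as np


def quiet_log(values):
    """Return log(values) without the 'divide by zero' RuntimeWarning.

    Hypothetical helper: zeros map to -inf silently. Only the 'divide'
    category is ignored, so other floating-point warnings still surface.
    """
    with np.errstate(divide="ignore"):
        return np.log(np.asarray(values, dtype=float))


# Assumption: 0.0 stands in for a CDF value that underflowed.
print(quiet_log([0.0, 0.5, 1.0]))
```

Scoping the suppression to a `with` block keeps the rest of the program's warnings intact, which is usually preferable to a global filter.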