peak_anno: negbin_fit.py:86: RuntimeWarning: invalid value encountered in double_scalars return np.sqrt(chi2 / (n*(min(crosstab.shape)-1)))

dputhier / pygtftk

A python package and a set of shell commands to handle GTF files

GNU General Public License v3.0


Issue #59

Closed dputhier closed 5 years ago

dputhier commented 5 years ago

Encountered when running:

   gtftk get_example -d mini_real | gtftk peak_anno -m gene_biotype -p ENCFF112BHN_H3K4me3_K562_sub.bed -c hg38.genome -D -n  -if example_pa_02.pdf

qferre commented 5 years ago

I think this issue, like the previous one you posted, is due to cases where there are no peaks or very few peaks within a feature set, which happens regularly with this "simple" test data (for example, no peaks in 'exons').

These warnings are not fatal to the computation, and will likely only affect the Negative Binomial fitting. I am working on it.

qferre commented 5 years ago

Could you tell me which region this was for? With high verbosity, you should regularly see lines such as:

|-- 11:17:58-INFO-peak_anno : Processing intergenic regions

... telling you which region is currently being processed.

qferre commented 5 years ago

Tried it myself. It was 'unprocessed_pseudogene'. Looking into this.

qferre commented 5 years ago

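It was due to the crosstab of the histogram (real distribution vs. theoretical NB) having only one row when the only values observed were 0 or 1. With a single row, min(crosstab.shape)-1 is 0, so the denominator in the formula from the warning is zero. Fixed in 8e4613fdabc6027d7d7443d0693bd4ec2c6c0e1c.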
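
For readers hitting the same warning, here is a minimal sketch of the failure mode. It is not the code from negbin_fit.py and not the fix in the commit above. It only reuses the formula quoted in the warning, sqrt(chi2 / (n*(min(crosstab.shape)-1))), on a plain list-of-rows table. The function names `chi2_statistic` and `cramers_v` and the choice to return NaN for a degenerate table are assumptions made for illustration.

```python
import math


def chi2_statistic(table, n):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            if expected > 0:  # skip empty cells to avoid dividing by zero
                chi2 += (observed - expected) ** 2 / expected
    return chi2


def cramers_v(table):
    """Same formula as in the warning, with a guard for degenerate tables.

    Hypothetical helper: returning NaN is an illustrative choice,
    not necessarily what the actual fix does.
    """
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1  # min(crosstab.shape) - 1
    if n == 0 or k == 0:
        # A one-row (or one-column) table makes the denominator zero.
        return float("nan")
    return math.sqrt(chi2_statistic(table, n) / (n * k))


# A regular 2x2 table works as expected.
print(cramers_v([[10, 20], [20, 10]]))

# A one-row table, as in the 'unprocessed_pseudogene' case, is caught by the guard.
print(cramers_v([[5, 3]]))
```

Without the guard, a one-row table gives chi2 = 0 and a zero denominator. With NumPy float scalars, 0/0 yields NaN and the "invalid value encountered" RuntimeWarning, while plain Python floats would raise ZeroDivisionError.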