dputhier / pygtftk

A python package and a set of shell commands to handle GTF files
GNU General Public License v3.0

[ologram] P-values equal to zero when nb_intersections_true and nb_intersections_esperance_shuffled are equal to zero. #72

Closed dputhier closed 5 years ago

dputhier commented 5 years ago

I have added a -j argument to ologram that controls how bars are sorted in the diagram (dev branch). I realized that the results for some features look odd. This is especially true for the example from the documentation where we look at gene_biotype (see attached file): 00_ologram_stats.txt
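To make the effect of -j concrete, here is a minimal Python sketch, assuming the statistics file is tab-separated with one row per feature. The `feature` column, the `sort_features` helper and all values are hypothetical and invented for illustration; only `nb_intersections_true` and `summed_bp_overlaps_pvalue` are names taken from this thread. Sorting on a p-value column puts a spurious 0 first, which shows why such a value stands out once bars are ordered by p-value.

```python
import csv
import io

# Toy excerpt of an ologram statistics table (tab-separated).
# Values are invented; the "feature" column name is hypothetical.
TOY_STATS = (
    "feature\tnb_intersections_true\tsummed_bp_overlaps_pvalue\n"
    "protein_coding\t120\t1e-30\n"
    "lincRNA\t14\t0.002\n"
    "TEC\t0\t0\n"
)


def sort_features(table_text, key_column, reverse=False):
    """Return feature names ordered by the numeric value of key_column.

    Hypothetical helper mimicking what a bar-ordering option like -j does.
    """
    rows = list(csv.DictReader(io.StringIO(table_text), delimiter="\t"))
    rows.sort(key=lambda row: float(row[key_column]), reverse=reverse)
    return [row["feature"] for row in rows]


print(sort_features(TOY_STATS, "summed_bp_overlaps_pvalue"))
```

With ascending order, TEC (p-value 0) comes before protein_coding (1e-30) and lincRNA (0.002).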

Clearly, TEC and the other affected biotypes should not have a p-value of zero here.
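One way to list the features affected is to look for rows where both counts named in the issue title are zero but the p-value is still reported as 0. The sketch below uses invented toy rows; the `feature` and `nb_intersections_pvalue` keys and the `suspicious_zero_pvalues` helper are hypothetical, while `nb_intersections_true` and `nb_intersections_esperance_shuffled` come from the issue title.

```python
# Invented toy rows; only the two count column names come from the issue title.
TOY_ROWS = [
    {"feature": "protein_coding", "nb_intersections_true": 120,
     "nb_intersections_esperance_shuffled": 35.2, "nb_intersections_pvalue": 1e-30},
    {"feature": "TEC", "nb_intersections_true": 0,
     "nb_intersections_esperance_shuffled": 0, "nb_intersections_pvalue": 0},
    {"feature": "misc_RNA", "nb_intersections_true": 0,
     "nb_intersections_esperance_shuffled": 0.4, "nb_intersections_pvalue": 0.67},
]


def suspicious_zero_pvalues(rows, pvalue_column="nb_intersections_pvalue"):
    """Return features with no observed and no expected intersections
    that nevertheless report a p-value of exactly zero."""
    flagged = []
    for row in rows:
        no_signal = (row["nb_intersections_true"] == 0
                     and row["nb_intersections_esperance_shuffled"] == 0)
        if no_signal and row[pvalue_column] == 0:
            flagged.append(row["feature"])
    return flagged


print(suspicious_zero_pvalues(TOY_ROWS))
```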

gtftk get_example -q -d mini_real -f '*'
gtftk get_example -d mini_real | gtftk ologram -m gene_biotype -p ENCFF112BHN_H3K4me3_K562_sub.bed -c hg38.genome -D -n -if example_pa_02.pdf -V 1 -j summed_bp_overlaps_pvalue -K ologram_output


qferre commented 5 years ago

It's not a bug, it's a feature™.

Those p-values are zero because, when a negative binomial (NB) distribution cannot be fitted, an empirical p-value is returned instead.

Maybe I could drop the empirical p-value altogether? It may cause more confusion than it's worth. In that case I could just return np.nan.
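The fallback discussed here can be sketched in Python as follows. This is an illustration under stated assumptions, not ologram's code: it fits a negative binomial to the mean and variance of the shuffled counts by the method of moments (an assumed fitting strategy) and returns NaN instead of an empirical p-value when no NB matches. When every shuffle yields zero intersections, mean and variance are both 0, so no NB exists and the result is NaN rather than 0. All function names are hypothetical.

```python
import math


def fit_negative_binomial(mean, variance):
    """Method-of-moments NB fit (hypothetical helper).

    Parametrisation: number of failures before r successes, so
    mean = r*(1-p)/p and variance = r*(1-p)/p**2.
    Returns (r, p), or None when no NB has these moments
    (mean <= 0 or variance <= mean), e.g. all shuffles gave 0.
    """
    if mean <= 0 or variance <= mean:
        return None
    p = mean / variance
    r = mean * mean / (variance - mean)
    return r, p


def nb_pmf(k, r, p):
    """P(X = k) under the NB above, computed in log space for stability."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1 - p))
    return math.exp(log_pmf)


def nb_upper_pvalue(observed, mean, variance):
    """P(X >= observed) under the fitted NB, or NaN if no NB can be fitted.

    observed must be a non-negative integer count of intersections.
    """
    fit = fit_negative_binomial(mean, variance)
    if fit is None:
        return math.nan  # no empirical fallback: report "not computable"
    r, p = fit
    below = sum(nb_pmf(k, r, p) for k in range(observed))
    return min(1.0, max(0.0, 1.0 - below))


print(nb_upper_pvalue(0, 0.0, 0.0))  # degenerate case from this issue
print(nb_upper_pvalue(1, 2.0, 4.0))
```

Returning NaN makes the degenerate case explicit; downstream code then has to decide how to store and display it.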

qferre commented 5 years ago

I dropped the empirical p-values. Should now return NaNs.

dputhier commented 5 years ago

Cool. You should add the corresponding SHA to your comments. It makes it easier to check the associated commit.

dputhier commented 5 years ago

In develop?

qferre commented 5 years ago

It is now. Commit is 3872d5e9be8faa54e51db4a5c14202f591f75640

I had not put the commit SHA in my previous comment because I had not committed the changes yet, sorry.

qferre commented 5 years ago

To prevent pandas from panicking when writing the DataFrame to a file, I return -1 instead of NaN. The plotting code translates any -1 p-value into 'NA' for display. Closing.
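The sentinel approach can be sketched with the standard library. The names `MISSING_PVALUE`, `encode_pvalue`, `pvalue_label` and `write_stats` are hypothetical, not pygtftk functions. Because a genuine p-value always lies in [0, 1], -1 cannot collide with a real value.

```python
import csv
import io
import math

MISSING_PVALUE = -1  # sentinel written to the file instead of NaN


def encode_pvalue(pvalue):
    """Replace NaN by the -1 sentinel before the table is written out."""
    return MISSING_PVALUE if math.isnan(pvalue) else pvalue


def pvalue_label(pvalue):
    """Translate the -1 sentinel back into 'NA' for display on the plot."""
    return "NA" if pvalue == MISSING_PVALUE else f"{pvalue:.3g}"


def write_stats(rows):
    """Write (feature, pvalue) pairs as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature", "pvalue"])
    for feature, pvalue in rows:
        writer.writerow([feature, encode_pvalue(pvalue)])
    return buffer.getvalue()


print(write_stats([("TEC", math.nan), ("lincRNA", 0.002)]))
print(pvalue_label(MISSING_PVALUE), pvalue_label(0.002))
```

If the table is written with pandas, `DataFrame.to_csv` also accepts an `na_rep` argument controlling how NaN is rendered, which may be an alternative to a numeric sentinel.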