ACCLAB / DABEST-python

Data Analysis with Bootstrapped ESTimation
https://acclab.github.io/DABEST-python/
Apache License 2.0

Not enough unique values to generate halfviolin plot? #90

Closed BioinfoTongLI closed 4 years ago

BioinfoTongLI commented 4 years ago

Hi, @josesho,

I get the following error when one of the comparison groups has only two unique values:

"""
Traceback (most recent call last):
  File "/home/tongli/miniconda3/envs/maars/lib/python3.7/site-packages/dabest/_classes.py", line 1295, in plot
    out = EffectSizeDataFramePlotter(self, **all_kwargs)
  File "/home/tongli/miniconda3/envs/maars/lib/python3.7/site-packages/dabest/plotter.py", line 488, in EffectSizeDataFramePlotter
    halfviolin(v, fill_color=fc, alpha=halfviolin_alpha)
  File "/home/tongli/miniconda3/envs/maars/lib/python3.7/site-packages/dabest/plot_tools.py", line 17, in halfviolin
    V = b.get_paths()[0].vertices
IndexError: list index out of range
"""

It seems that DABEST needs at least three points to generate a half-violin plot. Is that so?

Concretely, I have several groups to compare. Most of them have more than 3 unique values. However, one group has a very flat distribution: a lot of 0s and 1s and nothing else.

Frankly, I'm not sure what is happening... Do you think the error is coming from here?

Best wishes, Tong

josesho commented 4 years ago

It is likely that the resultant bootstrap curve is basically empty? Without access to your data (more specifically, dummy data with the same structure and Ns as your real data), I can't say much more.
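One way to share data "with the same structure and Ns" without exposing the real measurements is sketched below. It is only an illustration: the helper names `summarize_groups` and `make_dummy`, the group names, and the values are all hypothetical, and it uses plain dicts of lists rather than a pandas DataFrame so that it runs with the standard library alone.

```python
import math
import random


def summarize_groups(groups):
    """Return {name: (n_valid, n_unique)}, ignoring NaN entries."""
    summary = {}
    for name, values in groups.items():
        valid = [v for v in values if not math.isnan(v)]
        summary[name] = (len(valid), len(set(valid)))
    return summary


def make_dummy(groups, seed=0):
    """Resample each group's valid values with replacement.

    Each dummy group keeps the original number of valid values and the
    original number of NaN entries (the NaNs are moved to the end).
    """
    rng = random.Random(seed)
    dummy = {}
    for name, values in groups.items():
        valid = [v for v in values if not math.isnan(v)]
        n_missing = len(values) - len(valid)
        resampled = rng.choices(valid, k=len(valid)) if valid else []
        dummy[name] = resampled + [math.nan] * n_missing
    return dummy


# Hypothetical data: one well-spread group, one group of 0s and 1s plus a NaN.
groups = {
    "group_a": [0.2, 0.5, 0.9, 1.3, 0.7],
    "group_b": [0, 1, 0, 0, math.nan],
}

summary = summarize_groups(groups)
dummy = make_dummy(groups)
print(summary)
print(summarize_groups(dummy))
```

The per-group N and unique-value counts printed here are the kind of information that makes a report like this easier to reproduce.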

BioinfoTongLI commented 4 years ago

First, I can confirm that this has nothing to do with the number of unique values.

Second, I obtained this weird plot. Do you have any idea how it was generated? (Screenshot from 2019-12-31 16-05-12)

Third, I exported the dataset that gave the error. However, I was unable to reproduce the plot above... Here is the code I used:

import pandas as pd
import dabest
import matplotlib.pyplot as plt

df = pd.read_pickle("dummy_clean.pkl")
two_groups_unpaired = dabest.load(df, idx=list(df.columns), resamples=10000, ci=95)
two_groups_unpaired.cohens_d.plot()
plt.show()

PS: it's the mad2-DMSO group that triggers the plotting error.

It seems that I have too many NaNs in the dataset, which in turn causes this error (during resampling? It does not happen every time, so I'm purely guessing):

/lib/python3.7/site-packages/dabest/_stats_tools/effsize.py:226: RuntimeWarning: invalid value encountered in double_scalars
  return M / divisor
/lib/python3.7/site-packages/dabest/_stats_tools/confint_2group_diff.py:211: RuntimeWarning: invalid value encountered in less
  prop_less_than_es = sum(B < effsize) / len(B)

Frankly, I have no idea what is happening...
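A possible explanation for the intermittent warnings, offered only as a guess: if a group contains few valid values that are mostly 0s and 1s, some bootstrap resamples will consist of a single repeated value in both groups, so a standardized effect size divides 0 by 0. The sketch below illustrates this with a hand-written pooled-SD Cohen's d and made-up data; it does not use dabest, and `cohens_d` and `bootstrap_d` are illustrative names, not dabest's functions.

```python
import math
import random
import statistics


def cohens_d(control, test):
    """Cohen's d using a pooled standard deviation (illustrative version)."""
    n1, n2 = len(control), len(test)
    mean_diff = statistics.mean(test) - statistics.mean(control)
    pooled_var = ((n1 - 1) * statistics.variance(control)
                  + (n2 - 1) * statistics.variance(test)) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    if pooled_sd == 0:
        # Plain Python would raise ZeroDivisionError here. NumPy floats
        # instead yield nan for 0/0 (with an "invalid value" RuntimeWarning)
        # or inf for x/0, so that behaviour is mimicked explicitly.
        return math.nan if mean_diff == 0 else math.copysign(math.inf, mean_diff)
    return mean_diff / pooled_sd


def bootstrap_d(control, test, resamples=2000, seed=0):
    """Resample both groups with replacement and recompute d each time."""
    rng = random.Random(seed)
    results = []
    for _ in range(resamples):
        c = rng.choices(control, k=len(control))
        t = rng.choices(test, k=len(test))
        results.append(cohens_d(c, t))
    return results


# Made-up groups: small N, only 0s and 1s.
control = [0, 0, 0, 1]
test = [0, 0, 0, 0, 1]

boots = bootstrap_d(control, test)
n_bad = sum(1 for b in boots if not math.isfinite(b))
print(f"{n_bad} of {len(boots)} resamples are not finite")
```

Because the resamples are random, whether and how often this happens depends on the draw, which would be consistent with the error not appearing every time. Any comparison against a NaN entry is also flagged by NumPy, which may be the source of the second warning.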

josesho commented 4 years ago

In my hands (with dabest==0.2.8 and pandas==0.25.3), the code and the pickled dataset do produce a plot. I would suggest making sure your Python virtual environment meets the above requirements.

Screen Shot 2020-01-02 at 14 26 29

Feel free to reopen the issue if you still have problems.