KaveIO / PhiK

Phi_K correlation analyzer library

Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges 9250352.5 - 11100412.399999999 warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i]))) #2

Closed atuliesbpl closed 4 years ago

atuliesbpl commented 4 years ago

Running df.phik_matrix() takes a very long time.

Is there any way to speed it up?

The warnings are below for reference.

Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges 9250352.5 - 11100412.399999999
  warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i])))

F:\Anaconda\lib\site-packages\phik\data_quality.py:55: UserWarning: The number of unique values of variable address is very large: 8915. Are you sure this is not an interval variable? Analysis for pairs of variables including address might be slow.
  .format(col, df[col].nunique()))
F:\Anaconda\lib\site-packages\phik\data_quality.py:55: UserWarning: The number of unique values of variable city is very large: 141. Are you sure this is not an interval variable? Analysis for pairs of variables including city might be slow.
  .format(col, df[col].nunique()))
F:\Anaconda\lib\site-packages\phik\data_quality.py:55: UserWarning: The number of unique values of variable cuisines is very large: 1825. Are you sure this is not an interval variable? Analysis for pairs of variables including cuisines might be slow.
  .format(col, df[col].nunique()))
F:\Anaconda\lib\site-packages\phik\data_quality.py:55: UserWarning: The number of unique values of variable locality is very large: 1208. Are you sure this is not an interval variable? Analysis for pairs of variables including locality might be slow.
  .format(col, df[col].nunique()))
F:\Anaconda\lib\site-packages\phik\data_quality.py:55: UserWarning: The number of unique values of variable localityverbose is very large: 1265. Are you sure this is not an interval variable? Analysis for pairs of variables including localityverbose might be slow.
  .format(col, df[col].nunique()))
F:\Anaconda\lib\site-packages\phik\data_quality.py:55: UserWarning: The number of unique values of variable restaurantname is very large: 7445. Are you sure this is not an interval variable? Analysis for pairs of variables including restaurantname might be slow.
  .format(col, df[col].nunique()))
F:\Anaconda\lib\site-packages\phik\data_quality.py:71: UserWarning: Not enough unique value for variable switchtoordermenu for analysis 1. Dropping this column
  .format(col, df[col].nunique()))
F:\Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges 0.489999999999991 - 0.979999999999992
  warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i])))
F:\Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges 0.979999999999992 - 1.4699999999999929
  warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i])))
F:\Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges 37.599999999999994 - 55.9
  warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i])))
F:\Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges 55.9 - 74.19999999999999
  warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i])))
F:\Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges 74.19999999999999 - 92.49999999999999
  warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i])))
F:\Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges 110.8 - 129.1
  warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i])))
F:\Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges 129.1 - 147.4
  warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i])))
F:\Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges -25.735000000000007 - -3.450000000000003
  warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i])))
F:\Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges 1.299999999999991 - 1.599999999999992
  warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i])))
F:\Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges 1.599999999999992 - 1.899999999999993
  warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i])))
F:\Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges 2.199999999999994 - 2.4999999999999947
  warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i])))
F:\Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges 2.4999999999999947 - 2.7999999999999963
  warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i])))
F:\Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges 3.099999999999997 - 3.3999999999999977
  warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i])))
F:\Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges 3.3999999999999977 - 3.6999999999999993
  warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i])))
F:\Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges 9250352.5 - 11100412.399999999
  warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i])))
F:\Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges 11100412.399999999 - 12950472.299999999
  warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i])))
F:\Anaconda\lib\site-packages\phik\binning.py:68: UserWarning: Empty bin with bin-edges 12950472.299999999 - 14800532.2
  warnings.warn('Empty bin with bin-edges {0:s} - {1:s}'.format(str(bin_edges[i-1]), str(bin_edges[i])))
F:\Anaconda\lib\site-packages\phik\bivariate.py:134: UserWarning: Too many unique values. Are you sure that interval variables are set correctly?
  warnings.warn('Too many unique values. Are you sure that interval variables are set correctly?')

mbaak commented 4 years ago

Hi, from the warnings it seems many of your variables have many bins (1000+), and that will make it slow. Are the variables interval variables? For those variables, can you reduce the number of bins per variable to, say, 40? (The default bin size is 1, I believe.) That should speed things up a lot.
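For anyone following along, here is a rough sketch of what capping the bins could look like. It assumes every numeric column is an interval variable, which may not hold for ID-like columns. The helper names interval_columns and phik_with_coarse_bins are made up for illustration; interval_cols and bins are parameters of phik_matrix, and phik must be installed for the second function to run.

```python
import pandas as pd


def interval_columns(df):
    """Return the numeric columns, treated here as interval variables.

    Assumption: numeric dtype implies an interval variable. Check this
    against your own data before relying on it.
    """
    return df.select_dtypes(include="number").columns.tolist()


def phik_with_coarse_bins(df, n_bins=40):
    """Compute the phi_k matrix with at most n_bins bins per interval column."""
    import phik  # noqa: F401  (importing phik adds phik_matrix to DataFrame)

    cols = interval_columns(df)
    return df.phik_matrix(interval_cols=cols, bins=n_bins)
```

Calling phik_with_coarse_bins(df) in place of df.phik_matrix() leaves the categorical columns untouched, so high-cardinality text columns still need separate handling.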

mbaak commented 4 years ago

The features in the picture you sent, such as "address", "city", or "restaurant name", are all categorical features where nearly every data point can be unique, resulting in many bins (1000+). To calculate phi_k for two such features, the number of bins is the product of the two bin counts, which runs to several million and makes phik slow to evaluate. Can you reduce the number of unique items by grouping data points sensibly (e.g. cities -> state)? Up to a few hundred unique items should be fine. Otherwise I'd suggest leaving those features out for now.
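To put a number on the product: using the unique counts from the warnings, address (8915) against restaurantname (7445) alone gives roughly 66 million cells. Below is a minimal sketch of two ways to shrink that, using plain pandas. The helpers lump_rare and drop_high_cardinality, and the thresholds of 100 and 300, are illustrative assumptions and not part of phik.

```python
import pandas as pd

# Cells needed for address x restaurantname, using counts from the warnings.
cells = 8915 * 7445  # 66372175


def lump_rare(series, top_n=100, other="other"):
    """Keep the top_n most frequent values and replace the rest with `other`.

    Missing values are not counted by value_counts, so they also become `other`.
    """
    keep = series.value_counts().index[:top_n]
    return series.where(series.isin(keep), other)


def drop_high_cardinality(df, max_unique=300):
    """Drop non-numeric columns that have more than max_unique distinct values."""
    cat_cols = df.select_dtypes(exclude="number").columns
    too_many = [c for c in cat_cols if df[c].nunique() > max_unique]
    return df.drop(columns=too_many)
```

A domain-aware grouping such as city -> state is usually more meaningful than lumping by frequency; lump_rare is only a fallback for when no such mapping is available.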

atuliesbpl commented 4 years ago

Thanks @mbaak, I'll try that.