JasonKessler / Scattertext-PyData

Notebooks for the Seattle PyData 2017 talk on Scattertext

Harmonic mean error #3

gskarp opened this issue 3 years ago

gskarp commented 3 years ago

Hello, I'm using the Jupyter Notebook with my own data. When I run the following part of the code:

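```python
from scipy.stats import hmean, norm

def normcdf(x):
    # Map each value to the CDF of a normal distribution fitted to the column
    return norm.cdf(x, x.mean(), x.std())
# Normalize precision and frequency for the 'eight' category
term_freq_df['eight_precision_normcdf'] = normcdf(term_freq_df['eight_precision'])
term_freq_df['eight_freq_pct_normcdf'] = normcdf(term_freq_df['eight_freq_pct'])
# Scaled F-score: per-term harmonic mean of the two normalized scores
term_freq_df['eight_scaled_f_score'] = hmean([term_freq_df['eight_precision_normcdf'], term_freq_df['eight_freq_pct_normcdf']])
term_freq_df.sort_values(by='eight_scaled_f_score', ascending=False).iloc[:10]
```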

I get the following error:

[Screenshot of the error message]

My category columns run from 'zero' to 'eight'. Any suggestions for overcoming this problem are welcome.
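As read from the snippet, the last step is where a zero becomes a problem. Writing Φ_p and Φ_f for the normal CDFs fitted to `eight_precision` and `eight_freq_pct` (the `normcdf` helper, with column mean and sample standard deviation), the scaled F-score of a term t with precision p_t and frequency percentage f_t is:

```latex
\mathrm{scaled\_F}(t)
  = \frac{2}{\dfrac{1}{\Phi_p(p_t)} + \dfrac{1}{\Phi_f(f_t)}}
  = \frac{2\,\Phi_p(p_t)\,\Phi_f(f_t)}{\Phi_p(p_t) + \Phi_f(f_t)},
\qquad
\Phi_p(x) = \Phi\!\left(\frac{x - \bar{p}}{s_p}\right),\quad
\Phi_f(x) = \Phi\!\left(\frac{x - \bar{f}}{s_f}\right)
```

The reciprocal form is what `hmean` evaluates, so a term whose Φ_p or Φ_f comes out as exactly 0.0 has no finite reciprocal. Older SciPy releases reject such input with a ValueError, while newer ones may instead return 0 for that row.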

JasonKessler commented 3 years ago

It's impossible to know what's going on without the data. I'd bet you have a very low value in one of the two columns, which `normcdf` turns into exactly 0 because of floating-point precision limits.
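The failure mode described above can be reproduced with a minimal, stdlib-only sketch. It swaps in `math.erfc` for `scipy.stats.norm.cdf` and a two-argument harmonic mean for `scipy.stats.hmean`. The helpers `std_normal_cdf`, `hmean2` and `clipped_hmean2` are hypothetical, and clipping to a small floor is one possible workaround, not something proposed in this thread.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF via erfc (stand-in for scipy.stats.norm.cdf)."""
    # erfc stays accurate well into the lower tail, but far enough out
    # (roughly z < -38) the result drops below the smallest positive
    # double and becomes exactly 0.0.
    return 0.5 * math.erfc(-z / math.sqrt(2))


def hmean2(a, b):
    """Harmonic mean of two scores; mimics SciPy versions that reject zeros."""
    if a <= 0 or b <= 0:
        raise ValueError("harmonic mean only defined for values greater than zero")
    return 2 * a * b / (a + b)


def clipped_hmean2(a, b, floor=1e-10):
    """Hypothetical workaround: lift underflowed scores to a small floor first."""
    return hmean2(max(a, floor), max(b, floor))


# A term far below the column mean (z = -40) underflows to exactly 0.0 ...
low = std_normal_cdf(-40.0)
# ... while a moderately low term (z = -10) is tiny but still positive.
moderate = std_normal_cdf(-10.0)
typical = std_normal_cdf(0.5)

print(low, moderate > 0)  # 0.0 True
try:
    hmean2(low, typical)
except ValueError as err:
    print("hmean failed:", err)
print(clipped_hmean2(low, typical) > 0)  # True
```

In the notebook itself, the analogous change would be to clip the two `_normcdf` columns before calling `hmean`, for example with pandas `Series.clip(lower=1e-10)`. The floor value is arbitrary; clipped terms simply end up with near-zero scores at the bottom of the ranking.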