Danc2050 / CS545-Final-Project

Final project for CS545

Cleanup bayes.py #3

Open Danc2050 opened 4 years ago

Danc2050 commented 4 years ago

Make the def main more modular and readable.

Danc2050 commented 4 years ago

Also optimize bayes:

Change the nb function to use scipy.stats.norm.pdf. This preserves precision and is vectorized for numpy.

Also change our hand-rolled normal log calculations to use scipy.stats.norm.logpdf, which computes the natural logarithm (base e) of the density directly.
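A minimal sketch of what this swap might look like (the array shapes and variable names here are hypothetical, not taken from bayes.py): scipy.stats.norm.logpdf evaluates every sample and feature in one vectorized call, and summing the log-densities per row gives a naive Bayes log-likelihood without underflow from multiplying tiny probabilities.

```python
import numpy as np
from scipy.stats import norm

# Toy feature matrix: 4 samples, 3 features (hypothetical shapes).
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 0.5],
              [0.0, 3.0, 1.0],
              [1.5, 2.5, 2.0]])

# Per-feature Gaussian parameters, as a naive Bayes model would estimate per class.
mean = X.mean(axis=0)
std = X.std(axis=0)

# Vectorized log-densities: one call evaluates all samples and features at once.
log_probs = norm.logpdf(X, loc=mean, scale=std)

# Per-sample log-likelihood: sum the per-feature log-densities.
log_likelihood = log_probs.sum(axis=1)
```

Working in log space this way sidesteps the precision loss of multiplying many small pdf values together.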

Thanks to Dalton for these suggestions.

daltonbohning commented 4 years ago

Another suggestion I have:

for i in range(len(bad_stddev)):
    if bad_stddev[i] == 0:
        # Set any 0 std dev to min_std_dev to avoid a divide-by-zero error
        bad_stddev[i] = min_std_dev

This can also be accomplished with:

bad_stddev[bad_stddev == 0] = min_std_dev

Or:

bad_stddev = np.clip(bad_stddev, min_std_dev, None)
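A small self-contained comparison of the two forms (min_std_dev and the sample values here are made up for illustration). Note one behavioral difference: the boolean mask only replaces exact zeros, while np.clip also raises any value that is positive but below the floor.

```python
import numpy as np

min_std_dev = 0.01
bad_stddev = np.array([0.5, 0.0, 1.2, 0.0])

# Boolean-mask assignment: replace exact zeros in place.
masked = bad_stddev.copy()
masked[masked == 0] = min_std_dev

# np.clip: floor every value at min_std_dev (also lifts values below it).
clipped = np.clip(bad_stddev, min_std_dev, None)
```

For std-dev arrays that are either zero or comfortably above the floor, the two are equivalent; clip is arguably the safer default since it also guards against tiny-but-nonzero values.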

Danc2050 commented 4 years ago

Just learned about that == trick. Even sweeter, np.clip! Very cool.

I added the clip implementation. I also changed the ratios and put in comments about how I split the data (a 3:1 ratio) and removed some of the test code. Basically the model is averaging around 78% for accuracy.
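The thread doesn't show how the 3:1 split was done, but a minimal sketch of one common approach (shuffle, then slice at the 3/4 mark; the toy array here is hypothetical) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(100).reshape(20, 5)  # toy dataset: 20 rows, 5 columns

# Shuffle row indices, then take 3/4 for training and 1/4 for testing (3:1 ratio).
indices = rng.permutation(len(data))
split = (3 * len(data)) // 4
train, test = data[indices[:split]], data[indices[split:]]
```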

daltonbohning commented 4 years ago

Yeah, I happened to stumble across that in the docs. I do think clip is more efficient, since the == version involves boolean masking and array slicing, but I'm not entirely sure. Glad you appreciate it!

daltonbohning commented 4 years ago

Another thing I just noticed... np.float128 is implementation dependent, and therefore not available on every system. 'longdouble' will be cast down from 128 bits in this case: dtype='longdouble'

Danc2050 commented 4 years ago

Ha, your intuition is probably right. It seems like the general principle is that numpy is faster since it's written in C.

Oh, interesting! It is odd that that is not the suggested dtype in their documentation. I feel like that would be a good suggestion for their GitHub issues.

Thanks. I'm trying to finish up my programming assignment tonight (ha.ha..ha..), but I'll try to get to the first round of suggestions you made and then add in the longdouble next.

daltonbohning commented 4 years ago

Nice! From this page of the docs (https://docs.scipy.org/doc/numpy/user/basics.types.html): "MSVC (standard for Windows builds) makes long double identical to double (64 bits). NumPy makes the compiler's long double available as np.longdouble." Meaning, most Windows systems only support 64 bits. It looks like you can use dtype=np.longdouble instead of dtype='longdouble'
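A quick portability sketch of the point above: np.longdouble maps to whatever the platform compiler's long double is, so the same code yields different precision on different systems, and np.finfo reports what you actually got.

```python
import numpy as np

# np.longdouble is whatever the compiler's long double is:
# often 80-bit extended precision on Linux x86 builds,
# but identical to a 64-bit double on MSVC/Windows builds.
x = np.array([1.0, 2.0, 3.0], dtype=np.longdouble)

# Check the precision this platform actually provides.
info = np.finfo(np.longdouble)
```

Unlike np.float128, this spelling runs everywhere; you just get the best long double the platform has.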

I happened upon this because my Windows desktop yelled at me about np.float128 not being defined :(