EHWUSF / HS68_2018_Project_1


- Initial aggregation and preprocess module: #32

Open nitieaj opened 5 years ago

nitieaj commented 5 years ago

This module takes a column and its class label, computes descriptive statistics for the column, and plots the associated graphs.
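As a rough sketch of that description, assuming the data sit in a pandas DataFrame, per-class descriptive statistics for one column can be computed with `groupby`. The helper name `describe_by_label` is hypothetical, and the column names `HNR` and `status` (which appear later in this thread) are used only as placeholders with made-up toy values:

```python
import pandas as pd

def describe_by_label(df: pd.DataFrame, col: str, label: str) -> pd.DataFrame:
    """Hypothetical helper: descriptive statistics of `col` for each value of `label`."""
    # describe() reports count, mean, std, min, quartiles and max for every group
    return df.groupby(label)[col].describe()

# Toy data for illustration only
df = pd.DataFrame({"HNR": [21.0, 19.5, 24.1, 22.3], "status": [1, 1, 0, 0]})
print(describe_by_label(df, "HNR", "status"))
```

Because `describe()` already includes `min` and `max`, this form would also cover the range of each variable per class.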

nitieaj commented 5 years ago

Yeah, you're right. I'll look into adding that component.

On Thu, Jul 26, 2018 at 12:04 PM, Nikita Thomas notifications@github.com wrote:

@NikitaThomas commented on this pull request.

In hs628_tools/Preprocess_module.py https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_EHWUSF_HS68-5F2018-5FProject-5F1_pull_32-23discussion-5Fr205569506&d=DwMCaQ&c=qgVugHHq3rzouXkEXdxBNQ&r=0cV_TVOWlRVep9Kq29fJkEZ32-TILX0rx9zYhsQYYog&m=3fZ8LvrpgOvOpQDSEildYnhz_VXywjVQjrXy_4fi618&s=TWuGW3eXPEZYREz8tVeK0WSLppC6GAtF1NwlWLoq8Eg&e= :

+# In[245]:
+
+import numpy as np
+import matplotlib.pyplot as plt
+
+def summarystat(listd):
+    for i in range(len(listd)):
+        avg = np.mean(listd[i])
+        median = np.median(listd[i])
+        stdev = np.std(listd[i])
+        # density=True replaces normed=True, which was removed in Matplotlib 3.1
+        plt.hist(listd[i], density=True, bins=20)
+        plt.title('The "Normal" Distribution with Mean & St. Devs.')
+        plt.xlabel("Variables"); plt.ylabel("Frequency")
+        plt.grid(True)
+        plt.rc('grid', linestyle="dashed", color='grey')
+        plt.show()
+        print(i, "Mean = {0}".format(avg), "Median = {0}".format(median), "stdev = {0}".format(stdev))
+    return avg, median, stdev

I think it would also be helpful to have the min/max values of each column so the user knows the range of each of their variables.
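A minimal sketch of that suggestion, assuming `listd` is a list of numeric columns as in `summarystat` above. The name `summarystat_range` is hypothetical, plotting is left out, and the statistics for every column are returned instead of a single (avg, median, stdev) triple:

```python
import numpy as np

def summarystat_range(listd):
    """Hypothetical variant: mean, median, stdev, min and max for each column."""
    results = []
    for i, values in enumerate(listd):
        arr = np.asarray(values, dtype=float)
        results.append({
            "column": i,
            "mean": float(np.mean(arr)),
            "median": float(np.median(arr)),
            "stdev": float(np.std(arr)),  # population std (ddof=0), the np.std default
            "min": float(np.min(arr)),
            "max": float(np.max(arr)),
        })
    return results

for row in summarystat_range([[1, 2, 3, 10], [5.0, 5.5]]):
    print(row)
```

Returning one record per column keeps the range of every variable available to the caller rather than only the values computed for a single column.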

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_EHWUSF_HS68-5F2018-5FProject-5F1_pull_32-23pullrequestreview-2D140867839&d=DwMCaQ&c=qgVugHHq3rzouXkEXdxBNQ&r=0cV_TVOWlRVep9Kq29fJkEZ32-TILX0rx9zYhsQYYog&m=3fZ8LvrpgOvOpQDSEildYnhz_VXywjVQjrXy_4fi618&s=DIDAvWMTwm_LjZQZbSauD3RY4iei5pzN-jmx5SCK7ME&e=, or mute the thread https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_Ag-5FPINxqJ6nAtxW7l1TE13pjxZJoGI-2D1ks5uKhK7gaJpZM4VXYic&d=DwMCaQ&c=qgVugHHq3rzouXkEXdxBNQ&r=0cV_TVOWlRVep9Kq29fJkEZ32-TILX0rx9zYhsQYYog&m=3fZ8LvrpgOvOpQDSEildYnhz_VXywjVQjrXy_4fi618&s=4zOM-ZOqp6sbh7mbQCwnByJjNVuuglI2H7JzWzW6Z0Q&e= .

nitieaj commented 5 years ago

typpe refers to the categorical variable in the dataset "df".
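The function body of `coltuple` is not shown in this thread, so the following is only a guess at what it might do, assuming it generalizes cell In[239] (pairing each value of `col` with the categorical variable `typpe`). It returns a pandas Series so the DataFrame index is preserved:

```python
import pandas as pd

def coltuple(df: pd.DataFrame, col: str, typpe: str) -> pd.Series:
    """Hypothetical body: pair each value of `col` with the categorical variable `typpe`."""
    return pd.Series(list(zip(df[col], df[typpe])), index=df.index, name=f"{col}_{typpe}")

# Toy data for illustration only
df1 = pd.DataFrame({"HNR": [21.0, 19.5], "status": [1, 0]})
df1["HNR_status"] = coltuple(df1, "HNR", "status")
print(df1["HNR_status"].head(10))
```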

On Mon, Jul 23, 2018 at 3:43 PM, haleyhowe notifications@github.com wrote:

@haleyhowe commented on this pull request.

In hs628_tools/Preprocess_module.py https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_EHWUSF_HS68-5F2018-5FProject-5F1_pull_32-23discussion-5Fr204575884&d=DwMCaQ&c=qgVugHHq3rzouXkEXdxBNQ&r=0cV_TVOWlRVep9Kq29fJkEZ32-TILX0rx9zYhsQYYog&m=kFDhEnkQq9uD6iSOJnO_09OjeTKjkoF8YpROMXAT_rY&s=7OPPUdXFU-qDlZIu-DihW4jHVHF_A3zRfG5iyn_LkG0&e= :

+
+
+# In[239]:
+
+
+df1["HNR_status"] = list(zip(df1.HNR, df1.status))
+df1["HNR_status"].head(10)
+
+
+# Build a tuple of the column and class label
+
+# In[240]:
+
+
+
+def coltuple(df,col,typpe):

What is the parameter "typpe" referring to ?


EHWUSF commented 5 years ago

Having 4 different versions of the file still present in this PR makes it a nuisance even to figure out which file represents the current state of the PR, and of course it could not be merged like this. So please remove the superfluous files. Also, by now the work-in-progress label should be removed (or at the very least some explanation should be given for why it remains).

The 4 different file versions also mean that comments here don't necessarily get marked as outdated even if you've already changed the code they referred to, and the git history for the final file will only be a subset of the actual history. It would be possible to clean that up by recreating the sequence of development: applying all the stepwise changes to a single file and making corresponding commits as you go (but given that we are tight on time now, I won't recommend that here).

It looks to me like Preprocess3_module.py is the name of the file used in the most recent commit, so that's the one I'll review.

EHWUSF commented 5 years ago

This PR description is a bit too terse (and should also include a link back to the Issue it was meant to address). Please clarify the description.