UBC-MDS / DSCI_522_G410

Group repo for DSCI522 (group 410)
MIT License

EDA process #6

Closed merveshin closed 4 years ago

merveshin commented 4 years ago

So far, I split the data set at the very beginning and did some wrangling on the training set only. I just dropped the NaNs because there are too many of them, but maybe we can think of another solution later. I couldn't add the names of the outliers to the wage distribution plot yet, but I am working on it. Do you have any other suggestions about the graphs?
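For reference, the split-then-clean order described above could look roughly like the sketch below. The toy data, the column names, the 80/20 ratio, and the random seed are all assumptions for illustration; the actual notebook may do this differently.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the player data; names and values are made up.
df = pd.DataFrame({
    "Age": [21, 25, 30, 19, 27, 33, 24, 28, 22, 31],
    "Overall": [70, 82, np.nan, 65, 77, 80, 74, np.nan, 68, 79],
    "Wage": [5, 90, 40, 2, np.nan, 60, 12, 35, 4, 55],
})

# Split first, so the EDA never looks at the held-out rows.
train = df.sample(frac=0.8, random_state=522)  # assumed 80/20 split
test = df.drop(train.index)

# Drop incomplete rows in the training set only; the test set is untouched.
train_clean = train.dropna()

print(len(train), len(test), len(train_clean))
```

If dropping rows turns out to lose too much data, imputing values computed from the training set would be one alternative to try later.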

mglu123 commented 4 years ago

Hi, I may have introduced an overlap problem here. Deleting the line ' size = 'Potential',' would be better.

alt.Chart(df).mark_point().encode(
    x='Age',
    y='Overall',
    size='Potential',
)

I think it is fine not to add the names of the outliers at the EDA stage. Adding axis labels and graph titles might be more helpful.

merveshin commented 4 years ago

Instead of adding the names of the outliers to the graph, I am considering making a separate table that shows the outliers along with their attributes and clubs. We can show that the wage distribution is highly skewed because of those players, and I think looking at their attributes would be useful.
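One possible way to build such an outlier table is sketched below. The thread does not say how outliers are defined, so the 1.5 × IQR rule used here is an assumption, as are the toy rows and column names.

```python
import pandas as pd

# Toy training data; names, clubs and wages are made up.
train = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D",
             "Player E", "Player F", "Player G", "Player H"],
    "Club": ["Club 1", "Club 2", "Club 1", "Club 3",
             "Club 2", "Club 3", "Club 1", "Club 2"],
    "Overall": [64, 66, 70, 71, 73, 75, 78, 93],
    "Wage": [3, 5, 8, 10, 12, 15, 20, 400],
})

# Assumed rule: a wage is an outlier if it lies above Q3 + 1.5 * IQR.
q1 = train["Wage"].quantile(0.25)
q3 = train["Wage"].quantile(0.75)
upper_fence = q3 + 1.5 * (q3 - q1)

# Table of outliers with their attributes, highest wage first.
outliers = (
    train[train["Wage"] > upper_fence]
    .sort_values("Wage", ascending=False)
    .reset_index(drop=True)
)
print(outliers)
```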

hwilliams10 commented 4 years ago

Instead of adding the names of the outliers to the graph, I am considering making a separate table that shows the outliers along with their attributes and clubs. We can show that the wage distribution is highly skewed because of those players, and I think looking at their attributes would be useful.

I think that sounds like a good option!

hwilliams10 commented 4 years ago

Hey @merveshin,

On a related note, I merged your last pull request but when I tried to open the EDA.ipynb notebook the file seemed to be corrupted and wouldn't open. Does this happen for anyone else?

merveshin commented 4 years ago

Yeah. After your merge, I noticed that it is broken. I couldn’t figure out why, but I am working on it now and will try to write the code again.

Btw, I added the ‘BMI’ column and prepared the plots: Age vs. Wage, Overall vs. Wage, and BMI vs. Age.

After completing this new version, I will send it to you.
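For context, a BMI column like the one mentioned above is typically computed as weight in kilograms divided by height in metres squared. The sketch below assumes hypothetical `Weight_kg` and `Height_m` columns that are already numeric; the real data may store height and weight in other units or as strings, in which case they would need converting first.

```python
import pandas as pd

# Toy rows; the column names and units are assumptions for illustration.
players = pd.DataFrame({
    "Age": [22, 29],
    "Weight_kg": [72.0, 80.0],
    "Height_m": [1.80, 2.00],
})

# BMI = weight (kg) / height (m) squared
players["BMI"] = players["Weight_kg"] / players["Height_m"] ** 2

print(players[["Age", "BMI"]])
```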

eyrakas commented 4 years ago

I have just merged the PR from Mervin. I think it is working now.