Closed merveshin closed 4 years ago
Hi, I think there is an overlap error here. Deleting the line ' size = 'Potential',' would be better.
alt.Chart(df).mark_point().encode(x='Age', y='Overall', size='Potential')
I think it is fine not to add the names of the outliers at the EDA stage. Adding axis labels and graph titles might be more helpful.
Instead of adding the names of the outliers to the graph, I am considering making a separate table that shows the outliers with their attributes and clubs. We can show that the wage distribution is highly skewed because of those players, and I feel that looking at their attributes would be useful.
I think that sounds like a good option!
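One possible way to build such a table, as a rough sketch only: flag wages above the upper 1.5 × IQR fence and list those rows with a few attributes. The 1.5 × IQR rule, the toy data, and the column names 'Name', 'Club', 'Wage' and 'Overall' are all assumptions here, not something settled in this thread.

```python
import pandas as pd

# Toy data; the column names and values are assumptions for illustration.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Club": ["X", "Y", "X", "Z", "Y", "Z"],
    "Wage": [10, 12, 11, 13, 12, 300],
    "Overall": [65, 68, 66, 70, 67, 93],
})

# Upper fence of the common 1.5 * IQR outlier rule.
q1, q3 = df["Wage"].quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)

# Table of high-wage outliers with their attributes, highest wage first.
outliers = (
    df.loc[df["Wage"] > upper_fence, ["Name", "Club", "Wage", "Overall"]]
    .sort_values("Wage", ascending=False)
    .reset_index(drop=True)
)
```

Displaying `outliers` next to the wage histogram would show who sits in the long right tail without crowding the plot with labels.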
Hey @merveshin,
On a related note, I merged your last pull request but when I tried to open the EDA.ipynb
notebook the file seemed to be corrupted and wouldn't open. Does this happen for anyone else?
Yeah. After your merge, I noticed that it is broken. I couldn't work out why, but I am working on it now and will try to write the code again.
By the way, I added a 'BMI' column and prepared the plots: Age vs. Wage, Overall vs. Wage, and BMI vs. Age.
After completing this new version, I will send it to you.
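For anyone following along, a BMI column can be derived roughly like this. The column names 'height_cm' and 'weight_kg' are hypothetical; the real data may store height and weight in other units or as strings, in which case they would need converting first.

```python
import pandas as pd

# Hypothetical columns: height in centimetres, weight in kilograms.
df = pd.DataFrame({
    "height_cm": [170.0, 185.0, 178.0],
    "weight_kg": [65.0, 82.0, 74.0],
})

# BMI = weight (kg) divided by height (m) squared.
df["BMI"] = df["weight_kg"] / (df["height_cm"] / 100) ** 2
```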
I have just merged the PR from Mervin. I think it is working now.
So far I have split the data set at the very beginning and done some wrangling on the training set only. I just dropped the NaNs because there are too many of them, but maybe we can think of another solution later. I couldn't add the names of the outliers to the wage distribution plot, but I am working on it. Do you have any other suggestions about the graphs?
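A small sketch of the split-then-wrangle order described above, to make the idea concrete. It uses plain pandas sampling rather than whatever splitting function the notebook actually uses, and the 80/20 ratio, the seed, and the toy columns are assumptions.

```python
import numpy as np
import pandas as pd

# Toy data with a few missing values; not the real data set.
df = pd.DataFrame({
    "Age": [19, 24, np.nan, 31, 35, 22, 27, 30, 21, 33],
    "Wage": [5, 20, 40, np.nan, 30, 8, 25, 50, 6, 35],
})

# Split first, so the EDA never looks at the test rows.
train = df.sample(frac=0.8, random_state=123)
test = df.drop(train.index)

# Wrangle the training set only; here that just means dropping rows with NaN.
train_clean = train.dropna()

# Reporting how many rows were lost helps judge whether dropping is acceptable.
print(f"dropped {len(train) - len(train_clean)} of {len(train)} training rows")
```

If the dropped share turns out to be large, that count is a useful starting point for discussing an alternative to dropping later.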