UBC-MDS / DSCI522_group_12


EDA code #22

Open abaghela opened 3 years ago

abaghela commented 3 years ago

Great download function. The plots are not visible in the .md file. A better idea of the variables would be helpful to understand the EDA, like pay_1, pay_2, etc. How does selectKBest work? How will the feature importance play into model building? Will you remove unimportant variables?

HazelJJJ commented 3 years ago

We updated our EDA. I also put a link in our report to the EDA script folder. Thank you.

larahabashy commented 3 years ago
  1. The plots are not visible in the .md file.

The plots are visible on my computer. For milestone 2, I addressed this comment by changing the output format in the .Rmd file to output: github_document (as in Tiff's breast cancer reference project). I'm not sure why the plots are not visible to the TAs.

  1. A better idea of the variables would be helpful to understand the EDA, like pay_1, pay_2, etc.

I addressed this comment in milestone 2 by adding more explanation of the variable names in the .Rmd.

The history of past payments over 6 months (April to September 2005) is given by the features pay_0, pay_2, pay_3, …, pay_6, which take numeric values representing the repayment delay in months. For example, pay_0=1 means a client’s payment was 1 month late in September 2005, and pay_6=2 means a client’s payment was 2 months late in April 2005. A value of -1 is assigned for payments made on time in a given month. Note that the sequence is missing pay_1, so we rename pay_0 to pay_1 for consistency and simplicity. Some of these features also take the value -2, which is undocumented; we therefore encode it as 0.
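As a rough illustration of the renaming and recoding described above, here is a minimal Python sketch using only the standard library. It is not the project's actual script (the EDA is written in an .Rmd file). The sample rows, the `clean_pay_columns` helper, and the `PAY_COLUMNS` set are hypothetical names introduced only for this example.

```python
# Hypothetical sample rows standing in for the real data set.
rows = [
    {"id": 1, "pay_0": 1, "pay_2": -1, "pay_6": 2},
    {"id": 2, "pay_0": -2, "pay_2": 0, "pay_6": -2},
]

# Payment-status columns after renaming: pay_1 ... pay_6 (assumed names).
PAY_COLUMNS = {f"pay_{i}" for i in range(1, 7)}


def clean_pay_columns(row):
    """Rename pay_0 to pay_1 and recode the undocumented value -2 as 0."""
    cleaned = {}
    for name, value in row.items():
        # Close the gap in the sequence: pay_0 becomes pay_1.
        new_name = "pay_1" if name == "pay_0" else name
        # Only touch payment-status columns, leaving other features unchanged.
        if new_name in PAY_COLUMNS and value == -2:
            value = 0
        cleaned[new_name] = value
    return cleaned


cleaned_rows = [clean_pay_columns(r) for r in rows]
```

Restricting the recoding to an explicit set of column names keeps the -2 replacement from touching any other feature that might legitimately hold that value.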

  1. How does selectKBest work?

The team decided to exclude it from our analysis altogether.

  1. How will the feature importance play into model building?

At that point, we were only inspecting the features for EDA purposes.

  1. Will you remove unimportant variables?

We drop the ID feature, as it is irrelevant.