Closed AniaMatysek closed 2 years ago
I've also done the Exploratory Data Analysis. Please let me know if I should add anything, as I wasn't sure what this section should cover.
Hey @AniaMatysek - thank you for taking on EDA!
I fixed the bug that was preventing the PDF from generating: in $ p - value = ...$ the "-" was not a regular hyphen-minus but some other character that LaTeX does not recognize. I replaced it, regenerated the PDF, and pushed it here.
On the methodology side, a couple of points came to mind while reading these sections. Whether to apply them is up to you:

I. EDA
- Consider the `ggplot2` package for plotting those histograms. You would need to `pivot_longer` the data so that the feature names end up in a single column, but then, instead of making h1, ..., h23, you could plot everything at once with something like:
  `data %>% pivot_longer(...) %>% ggplot(aes(...)) + geom_histogram() + facet_wrap(~...)`
- Is `plot(AirlinesRaw)` even readable with that many variables?
- `plot(AirlinesRaw)` kills my laptop. It is computationally heavy, so I put in `sample_n(...)` to have it generated on a sub-sample. And again, do we need it in the way it is presented?

II. Logistic Regression
- `Class` should heavily interact with many other variables as well, right?
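To make the histogram suggestion from the EDA points concrete, here is a minimal sketch. It is only an illustration under assumptions: `mtcars` stands in for the project data, and the column names `feature` and `value` are my own choices, not anything from the report.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Stand-in data (assumption): mtcars replaces the project's data frame.
numeric_data <- mtcars %>% select(where(is.numeric))

# Long format: one row per (observation, feature) pair.
# "feature" and "value" are hypothetical column names.
long_data <- numeric_data %>%
  pivot_longer(cols = everything(),
               names_to = "feature",
               values_to = "value")

# One histogram panel per feature. Note that ggplot layers are
# combined with `+`, not with the pipe.
hist_plot <- ggplot(long_data, aes(x = value)) +
  geom_histogram(bins = 30) +
  facet_wrap(~feature, scales = "free")

print(hist_plot)
```

`scales = "free"` lets each panel use its own axes, which tends to help when the features are on very different scales.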
I've finished the logistic regression part. It may still change, though, as I need to discuss the observations and conclusions with Phillipe during our class.
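On the question of whether `Class` interacts with other variables, one possible way to check is to compare a main-effects model against one with an interaction term. This is a rough sketch on simulated data: the data frame `toy`, the predictor `delay`, the outcome `satisfied`, and the class levels are all made up for illustration and are not taken from the project.

```r
set.seed(1)
n <- 500

# Simulated stand-in data (assumption): all names are hypothetical.
toy <- data.frame(
  Class = factor(sample(c("Business", "Eco"), n, replace = TRUE)),
  delay = rnorm(n)
)

# The simulated effect of delay differs between the two classes.
is_eco <- as.numeric(toy$Class == "Eco")
lin <- -0.5 + 1.0 * is_eco - 0.8 * toy$delay + 1.2 * is_eco * toy$delay
toy$satisfied <- rbinom(n, 1, plogis(lin))

# Main effects only vs. main effects plus the Class:delay interaction.
main_only  <- glm(satisfied ~ Class + delay, data = toy, family = binomial)
with_inter <- glm(satisfied ~ Class * delay, data = toy, family = binomial)

# Likelihood-ratio test between the two nested models.
print(anova(main_only, with_inter, test = "Chisq"))
```

A small p-value in the comparison would suggest the interaction is worth keeping. With many candidate predictors it is probably better to test a few interactions motivated by the EDA than to add all of them at once.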