current12 / Stat-222-Project


Consider Dropping Items Missing Any Covariates of Interest #55

Closed ijyliu closed 3 months ago

ijyliu commented 4 months ago

Add the drop step to the code that creates the all-data dataset

Allows for comparability across all models - same train-test split, same number of observations

Output a dataset listing each dropped observation and why it was dropped
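A minimal sketch of what that drop step could look like, assuming the data is loaded into a pandas DataFrame. The function name `drop_incomplete`, the `drop_reason` column, and the example column names are hypothetical and are not taken from the project code.

```python
import pandas as pd

def drop_incomplete(df, covariates):
    """Split df into (complete rows, dropped rows with a drop_reason column).

    `covariates` is a list of column names that must all be non-missing.
    """
    missing = df[covariates].isna()      # True where a covariate is missing
    incomplete = missing.any(axis=1)     # rows missing at least one covariate
    dropped = df.loc[incomplete].copy()
    # Record which covariates were missing for each dropped row.
    dropped["drop_reason"] = [
        "missing: " + ", ".join(c for c, is_na in zip(covariates, flags) if is_na)
        for flags in missing.loc[incomplete].to_numpy()
    ]
    kept = df.loc[~incomplete].copy()
    return kept, dropped

# Hypothetical example data, for illustration only.
example = pd.DataFrame({
    "id": [1, 2, 3],
    "a": [1.0, None, 3.0],
    "b": ["x", "y", None],
})
kept, dropped = drop_incomplete(example, ["a", "b"])
```

The `dropped` frame can then be written out alongside the cleaned data so that every removed observation stays traceable.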

ijyliu commented 4 months ago

Potentially ask Libor

ijyliu commented 4 months ago

We are not going to be missing too many items, so I suggest just dropping all observations missing any of the covariates.

https://github.com/current12/Stat-222-Project/issues/20#issuecomment-2007769087

ijyliu commented 3 months ago

Proposed solution: drop items missing any variable from the test set (so accuracy/performance is comparable across models), but still allow items missing variables in the training set (to make full use of the data we have).
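A rough sketch of this proposal, assuming a pandas DataFrame with a unique index and a simple random split. The function name `split_then_clean_test`, the split fraction, and the seed are hypothetical; the project's actual split may be built differently.

```python
import pandas as pd

def split_then_clean_test(df, covariates, test_frac=0.2, seed=0):
    """Split first, then drop incomplete rows from the test set only.

    Assumes df has a unique index so train and test do not overlap.
    """
    test = df.sample(frac=test_frac, random_state=seed)
    train = df.drop(test.index)                 # training rows may keep missing values
    test = test.dropna(subset=covariates)       # test rows must be complete
    return train, test

# Hypothetical example data, for illustration only.
example = pd.DataFrame({
    "a": [1.0, None, 3.0, 4.0, None, 6.0, 7.0, 8.0, None, 10.0],
    "b": [1.0, 2.0, None, 4.0, 5.0, 6.0, None, 8.0, 9.0, 10.0],
})
train, test = split_then_clean_test(example, ["a", "b"])
```

Every model is then scored on the same complete test rows, while models that tolerate missing values can still train on the incomplete ones.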

ijyliu commented 3 months ago

Going to eventually go ahead and drop items missing any covariate of interest

Going to make a new version of all data that contains everything in the original + NLP features but drops observations missing any covariate of interest

ijyliu commented 3 months ago

Created https://github.com/current12/Stat-222-Project/tree/main/Data/All_Data/All_Data_with_NLP_Features

Now need to adjust all code to use it.

ijyliu commented 3 months ago

Completed the adjustment on my files and left reminders on the other issues