kelseymarkey opened this issue 4 years ago.
Assigning to @charlesoblack to handle the second, third, and fourth points in the issue (and update any modeling notebooks as necessary).
Assigning to @arendakessian so that he can update any modeling notebooks as necessary.
Tested Sid's changes to vectorizer-count.ipynb and concat-features.ipynb; everything behaves as expected.
Also updated my modeling notebook (Naive_Bayes.ipynb) to reflect these new file names (commit 1526bcd1130c510ca7e5d8624a890b509869edd1).
Only remaining task is for Aren and Sid to update their respective modeling notebooks.
I want to make sure that this works before closing it, but it's next in line.
We are now allowing both upsampling and downsampling in our workflow, so we need to update some file names in our current notebooks to reflect this.
1. `upsample.ipynb` and `downsample.ipynb` should be fixed so that they output a file called `idx_train.pckl`.
2. `vectorizer-count.ipynb` should be fixed so that it reads in this `idx_train` file (instead of `downsampled_idx_train.pckl`, as it does currently).
3. `concat-features.ipynb` should be fixed so that it reads in this `idx_train` file (instead of `downsampled_idx_train.pckl`, as it does currently).
4. `concat-features.ipynb` should be fixed so that it no longer outputs datasets in the form `{dataset}_{vectorizer}_downsampled_data.pckl`, since this can now be upsampled data. Sid has now changed this to `{dataset}_{vectorizer}_subsampled_data.pckl`.
5. Any modeling notebook that reads files in the `{dataset}_{vectorizer}_downsampled_data.pckl` form should be corrected (Sid: baseline models, Kelsey: NB, Aren: modeling notebook).
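A minimal sketch of the renaming convention, assuming the index file is a pickled list of row positions. `balance_indices`, `save_idx_train`, `load_idx_train`, and `subsampled_data_filename` are hypothetical helpers, not code from the actual notebooks. The sampling logic is only illustrative. The separators in the new output name are assumed to mirror the old `{dataset}_{vectorizer}_downsampled_data.pckl` pattern. `"train"` and `"count"` in the demo are made-up dataset and vectorizer names.

```python
import pickle
import random
import tempfile
from pathlib import Path

# Hypothetical constant: the single index file that both upsample.ipynb and
# downsample.ipynb write, so downstream notebooks read one name either way.
IDX_TRAIN_FILE = "idx_train.pckl"


def balance_indices(labels, method, seed=0):
    """Return class-balanced training-row indices (illustrative only).

    method="down": sample every class down to the smallest class size.
    method="up":   resample smaller classes with replacement up to the
                   largest class size.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    sizes = [len(rows) for rows in by_class.values()]
    idx = []
    if method == "down":
        target = min(sizes)
        for rows in by_class.values():
            idx.extend(rng.sample(rows, target))
    elif method == "up":
        target = max(sizes)
        for rows in by_class.values():
            idx.extend(rows)
            idx.extend(rng.choices(rows, k=target - len(rows)))
    else:
        raise ValueError(f"unknown method: {method!r}")
    return sorted(idx)


def save_idx_train(idx, out_dir="."):
    """Write indices to idx_train.pckl, whichever method produced them."""
    path = Path(out_dir) / IDX_TRAIN_FILE
    with open(path, "wb") as f:
        pickle.dump(idx, f)
    return path


def load_idx_train(in_dir="."):
    """Read idx_train.pckl (in place of downsampled_idx_train.pckl)."""
    with open(Path(in_dir) / IDX_TRAIN_FILE, "rb") as f:
        return pickle.load(f)


def subsampled_data_filename(dataset, vectorizer):
    """Assumed output name for concat-features.ipynb."""
    return f"{dataset}_{vectorizer}_subsampled_data.pckl"


if __name__ == "__main__":
    labels = ["pos", "neg", "neg", "neg"]
    with tempfile.TemporaryDirectory() as tmp:
        for method in ("up", "down"):
            save_idx_train(balance_indices(labels, method), tmp)
            print(method, load_idx_train(tmp))
    print(subsampled_data_filename("train", "count"))
```

Because both sampling paths write the same file name, downstream notebooks such as `vectorizer-count.ipynb` need only one load call. They no longer have to branch on how the training set was balanced.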