When my MBP with 16 GB of RAM hits this line
cv_pipeline = GridSearchCV(estimator=pipeline, param_grid=param_grid, n_jobs=-1, scoring='roc_auc')
it thrashes. The n_jobs parameter makes GridSearchCV run several jobs in parallel, and each job brings its own demand on RAM. My MBP has an i7 processor, which is hyper-threaded. With n_jobs=-1, it spins up as many tasks as the machine reports cores, and my machine reports 8 when it really has only 4 physical cores that show up as 8 logical ones. Hyper-threading takes advantage of idle slots in a core's pipeline to run a second, virtual thread on that core. This works fine for multitasking such as browsing and editing, but not for processor-hungry tasks like GridSearchCV, which most likely leave few idle slots to share. So 8 tasks were swamping my RAM, and there is probably no benefit on my system to spinning up more than 4. To fix this, I simply set n_jobs=4.
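Instead of hard-coding 4, the worker count could be derived from the logical CPU count. This is only a rough sketch: pick_n_jobs is a hypothetical helper name, and the default of 2 threads per core is an assumption that matches my hyper-threaded i7 but will not hold on every machine.

```python
import os


def pick_n_jobs(logical_cpus=None, threads_per_core=2):
    """Estimate a worker count close to the number of physical cores.

    threads_per_core=2 is an assumption (typical for hyper-threading);
    pass 1 on machines without it.
    """
    if logical_cpus is None:
        # os.cpu_count() reports logical CPUs and may return None
        logical_cpus = os.cpu_count() or 1
    # Never go below one worker
    return max(1, logical_cpus // threads_per_core)


n_jobs = pick_n_jobs()
print(f"using n_jobs={n_jobs}")

# The result would then replace n_jobs=-1 in the original call:
# cv_pipeline = GridSearchCV(estimator=pipeline, param_grid=param_grid,
#                            n_jobs=n_jobs, scoring='roc_auc')
```

On a machine that reports 8 logical CPUs this gives 4, the same value I set by hand. If RAM is still tight, a smaller number is a reasonable thing to try, since every extra job adds to the memory demand.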