cognoma / machine-learning

Machine learning for Project Cognoma

Thrashing in 2.TCGA-MLexample #86

Closed jruhym closed 7 years ago

jruhym commented 7 years ago

When my MBP with 16 GB of RAM hits the line cv_pipeline = GridSearchCV(estimator=pipeline, param_grid=param_grid, n_jobs=-1, scoring='roc_auc'), it thrashes. The n_jobs parameter controls how many jobs run in parallel, and each job adds its own demand on RAM. My MBP has an i7 processor, which is hyper-threaded. With n_jobs=-1, scikit-learn starts as many jobs as the machine reports cores, and my machine reports 8 when it really has only 4 physical cores, each exposing 2 logical cores. Hyper-threading fills otherwise idle slots in a core's execution pipeline with work from a second thread. This works fine for multitasking such as browsing and editing, but not for processor-hungry tasks like GridSearchCV, which most likely already keep the pipeline busy. So 8 jobs were swamping my RAM, and there is probably no benefit on my system to running more than 4. To fix this, I simply set n_jobs=4.
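Rather than hard-coding 4, the job count could be derived from the number of physical cores. The following is only a rough sketch, not something from the notebook: pick_n_jobs is a hypothetical helper name, psutil is an optional third-party package, and the fallback assumes two hardware threads per physical core, which holds for a hyper-threaded i7 but not for every CPU.

```python
import os


def pick_n_jobs(max_jobs=None):
    """Return a worker count based on physical cores, not logical ones.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    try:
        import psutil  # optional third-party dependency
        physical = psutil.cpu_count(logical=False)  # may return None
    except ImportError:
        physical = None

    if not physical:
        # Assumption: 2 hardware threads per physical core (hyper-threading).
        logical = os.cpu_count() or 1
        physical = max(1, logical // 2)

    if max_jobs is not None:
        # Optional cap, e.g. to keep total RAM use within bounds.
        physical = min(physical, max_jobs)

    return max(1, physical)


n_jobs = pick_n_jobs()
print(n_jobs)
```

The result would then be passed as n_jobs=pick_n_jobs() in the GridSearchCV call. If RAM rather than CPU is the limiting factor, the max_jobs cap can be set lower than the physical core count.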

jruhym commented 7 years ago

[Image: thrashing]

jruhym commented 7 years ago

Relates to https://github.com/cognoma/machine-learning/issues/70