Open mrahmadt opened 4 months ago
My experiments on Windows show that cross validation and random sampling in Test & Score do not run in parallel and therefore do not use all CPUs.
Random forest and other trivially parallelizable methods (e.g. XGBoost) could be parallelized with the n_jobs parameter even for a single training/prediction call (such as testing on test data) but are not.
Combining both could be problematic, since it may spawn more threads than there are processors. The only exception I found is Logistic Regression, which always uses all CPUs, though that probably happens at some lower level.
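To make the oversubscription concern concrete, here is a rough sketch of how a CPU budget could be split between fold-level workers and per-model n_jobs so that their product never exceeds the number of processors. The helper name split_cpu_budget is hypothetical and is not part of Orange or scikit-learn; it only illustrates the arithmetic.

```python
import os


def split_cpu_budget(n_folds, n_cpus=None):
    """Hypothetical helper: return (fold_workers, jobs_per_model) such that
    fold_workers * jobs_per_model never exceeds the number of CPUs."""
    # Fall back to the CPU count reported by the OS (or 1 if it is unknown).
    n_cpus = n_cpus or os.cpu_count() or 1
    # Never run more folds at once than there are folds or CPUs.
    fold_workers = max(1, min(n_folds, n_cpus))
    # Give each concurrently trained model an equal share of the remaining CPUs.
    jobs_per_model = max(1, n_cpus // fold_workers)
    return fold_workers, jobs_per_model


# 10-fold cross validation on a 32-CPU machine
fold_workers, jobs_per_model = split_cpu_budget(10, 32)
print(fold_workers, jobs_per_model)
```

With 10 folds on 32 CPUs this gives 10 fold workers with 3 jobs each, so at most 30 threads are busy at once. Whether such a split would actually be faster than parallelizing on only one level would need measuring.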
This needs further discussion.
Parallelization in Test & Score was intentionally removed in https://github.com/biolab/orange3/pull/2300/commits/1f8d008b84e9e7c3bd54e79a662779e914eb6443.
The easiest way of re-introducing parallelization would be on the level of individual models (e.g. random forest), where scikit-learn takes care of it (n_jobs=-1).
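For reference, this is roughly what model-level parallelization looks like in plain scikit-learn, outside of Orange. It is only a sketch: the synthetic data stands in for the real CSV, and it does not show how Orange's Random Forest widget passes parameters to scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption; the workload in this issue is a large CSV file).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# n_jobs=-1 asks scikit-learn to use all available cores when fitting
# the trees and when predicting.
forest = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)

# The folds are evaluated one after another (n_jobs=None), so the only
# parallelism is inside the forest and the two levels cannot multiply.
scores = cross_val_score(forest, X, y, cv=5, n_jobs=None)
print(scores.mean())
```

Keeping the folds sequential and parallelizing only inside the model avoids spawning more threads than there are processors.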
Hello Everyone
I'm not sure whether I'm doing something wrong or this is the default behavior. I have a server with the following specs:
32 x Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz (2 sockets), 125GB memory, SSD disks
I installed Proxmox (proxmox.com) and created a Windows 10 machine (32 processors and 64GB memory).
Everything is working fine in Orange, but "Test & Score" takes hours to process a 600M CSV file with "Random Forest", and the strange thing is that it is not utilizing the full CPU/memory of the machine!
Is there anything I can do to make Orange use the full CPU/memory resources?
Orange 3.37.0 (Orange3-3.37.0-Miniconda-x86_64.exe)