Closed samikrc closed 4 years ago
Merged pull request with this feature. Closing.
Although the level of parallelism can now be controlled by the user, it looks like the number of threads launched in, say, a CV experiment is the same as the number of variations for a single parameter. It appears that not all of the jobs are getting generated, and hence the available parallelism is not being fully used.
For example, with a parallelism of 6 and the following config for SVM CV:
"svm":
{
"plattScalingEnabled": true,
"regparam": [0, 0.001, 0.005, 0.01],
"maxiter": [1000],
"standardization": [true]
},
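The candidate count comes from the cross product of the parameter value lists, not from the parallelism setting. A minimal sketch of that grid expansion (illustrative helper, not the project's actual API):

```scala
// Sketch: the number of candidate models equals the size of the cross
// product of all parameter value lists. For the config above that is
// 4 (regParam) x 1 (maxIter) x 1 (standardization) = 4 parameter maps.
object GridSize {
  def paramGrid(params: Map[String, Seq[Any]]): Seq[Map[String, Any]] =
    params.foldLeft(Seq(Map.empty[String, Any])) {
      case (acc, (name, values)) =>
        for (m <- acc; v <- values) yield m + (name -> v)
    }

  def main(args: Array[String]): Unit = {
    val grid = paramGrid(Map(
      "regParam"        -> Seq(0.0, 0.001, 0.005, 0.01),
      "maxIter"         -> Seq(1000),
      "standardization" -> Seq(true)
    ))
    println(grid.size) // 4 combinations, so at most 4 concurrent fits
  }
}
```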
I am seeing only 4 threads getting launched:
20/08/17 11:12:57 INFO tuning.CrossValidatorCustom: Starting cross-validation runs.
20/08/17 11:12:58 INFO tuning.CrossValidatorCustom: Training CV set 1 of 5 with parameter map: maxIter=>1000/regParam=>0.01/standardization=>true
20/08/17 11:12:58 INFO tuning.CrossValidatorCustom: Training CV set 1 of 5 with parameter map: maxIter=>1000/regParam=>0.001/standardization=>true
20/08/17 11:12:58 INFO tuning.CrossValidatorCustom: Training CV set 1 of 5 with parameter map: maxIter=>1000/regParam=>0.0/standardization=>true
20/08/17 11:12:58 INFO tuning.CrossValidatorCustom: Training CV set 1 of 5 with parameter map: maxIter=>1000/regParam=>0.005/standardization=>true
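This would be consistent with the validator submitting one training task per parameter map for the current fold onto a pool of `parallelism` threads, so the number of concurrent fits is capped at min(parallelism, number of parameter maps). A sketch of that behavior, assuming this scheduling model (this is an inference from the log above, not the `CrossValidatorCustom` source):

```scala
import java.util.concurrent.Executors
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}

object CvParallelism {
  // At most one thread per parameter map can be busy for a given fold,
  // regardless of how large the pool is.
  def concurrentFits(parallelism: Int, numParamMaps: Int): Int =
    math.min(parallelism, numParamMaps)

  def main(args: Array[String]): Unit = {
    val parallelism = 6
    val paramMaps = Seq(0.0, 0.001, 0.005, 0.01).map(r => s"regParam=$r")

    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)

    // One task per parameter map for fold 1; with only 4 tasks,
    // at most 4 of the 6 threads are ever busy.
    val fits = paramMaps.map(p => Future(s"fold 1 trained with $p"))
    Await.result(Future.sequence(fits), Duration.Inf).foreach(println)
    pool.shutdown()

    println(concurrentFits(parallelism, paramMaps.size)) // prints 4
  }
}
```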
Added issue #22 for the above problem. Closing this issue, since it was concerned with the implementation of user-defined parallelism.
Currently the default parallelism is hard-coded to 3 and cannot be overridden; it is used in a number of places, thereby slowing down training. Introduce an experiment.parallelism setting, where a number can be specified that will be used in all of those places.
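The proposed override could be read from the experiment config with a fallback to the current default. A minimal sketch, assuming a flat string-keyed config map (the names here are illustrative, not the project's actual config API):

```scala
// Hypothetical sketch of the proposed experiment.parallelism override.
object ParallelismConfig {
  // Current hard-coded default, kept as the fallback.
  val DefaultParallelism = 3

  // Use the user-supplied value when present, else the default.
  def parallelism(config: Map[String, String]): Int =
    config.get("experiment.parallelism").map(_.toInt).getOrElse(DefaultParallelism)

  def main(args: Array[String]): Unit = {
    println(parallelism(Map.empty))                            // 3 (default)
    println(parallelism(Map("experiment.parallelism" -> "6"))) // 6 (override)
  }
}
```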