Open mendel5 opened 3 years ago
That is a very good suggestion! Would you like to do a PR?
It is mostly used in the MultiprocessingDistributor and in the calculate_relevance_table (and probably a bunch of docstrings).
> Would you like to do a PR?
I can try it. However, it might take some weeks because I'm quite busy right now.
> MultiprocessingDistributor
Do you mean this one: https://github.com/blue-yonder/tsfresh/blob/main/tsfresh/utilities/distribution.py#L401?
> calculate_relevance_table
Do you mean this one: https://github.com/blue-yonder/tsfresh/blob/main/tsfresh/feature_selection/relevance.py#L31?
> and probably a bunch of docstrings
A grep search over the full repo returns this:
$ grep -rni "n_jobs"
docs/text/tsfresh_on_a_cluster.rst:27:`n_jobs`. This field defaults to
docs/text/tsfresh_on_a_cluster.rst:46:`n_jobs` and `chunksize`. Both behave analogue to the parameters
docs/text/tsfresh_on_a_cluster.rst:50:setting the parameter `n_jobs` to 0.
docs/text/tsfresh_on_a_cluster.rst:133: n_jobs=4)
tsfresh/feature_selection/relevance.py:37: n_jobs=defaults.N_PROCESSES,
tsfresh/feature_selection/relevance.py:131: :param n_jobs: Number of processes to use during the p-value calculation
tsfresh/feature_selection/relevance.py:132: :type n_jobs: int
tsfresh/feature_selection/relevance.py:195: if n_jobs == 0:
tsfresh/feature_selection/relevance.py:199: processes=n_jobs,
tsfresh/feature_selection/relevance.py:230: if n_jobs != 0:
tsfresh/feature_selection/relevance.py:297: if n_jobs != 0:
tsfresh/feature_selection/selection.py:25: n_jobs=defaults.N_PROCESSES,
tsfresh/feature_selection/selection.py:110: :param n_jobs: Number of processes to use during the p-value calculation
tsfresh/feature_selection/selection.py:111: :type n_jobs: int
tsfresh/feature_selection/selection.py:170: n_jobs=n_jobs,
tsfresh/transformers/feature_selector.py:68: n_jobs=defaults.N_PROCESSES,
tsfresh/transformers/feature_selector.py:101: :param n_jobs: Number of processes to use during the p-value calculation
tsfresh/transformers/feature_selector.py:102: :type n_jobs: int
tsfresh/transformers/feature_selector.py:144: self.n_jobs = n_jobs
tsfresh/transformers/feature_selector.py:180: n_jobs=self.n_jobs,
tsfresh/transformers/feature_augmenter.py:67: n_jobs=tsfresh.defaults.N_PROCESSES, show_warnings=tsfresh.defaults.SHOW_WARNINGS,
tsfresh/transformers/feature_augmenter.py:96: :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/transformers/feature_augmenter.py:97: :type n_jobs: int
tsfresh/transformers/feature_augmenter.py:136: self.n_jobs = n_jobs
tsfresh/transformers/feature_augmenter.py:205: n_jobs=self.n_jobs, show_warnings=self.show_warnings,
tsfresh/transformers/relevant_feature_augmenter.py:96: n_jobs=defaults.N_PROCESSES,
tsfresh/transformers/relevant_feature_augmenter.py:150: :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/transformers/relevant_feature_augmenter.py:151: :type n_jobs: int
tsfresh/transformers/relevant_feature_augmenter.py:223: self.n_jobs = n_jobs
tsfresh/transformers/relevant_feature_augmenter.py:325: n_jobs=self.feature_extractor.n_jobs,
tsfresh/transformers/relevant_feature_augmenter.py:395: n_jobs=self.n_jobs,
tsfresh/transformers/relevant_feature_augmenter.py:410: n_jobs=self.n_jobs,
tsfresh/convenience/relevant_extraction.py:27: n_jobs=defaults.N_PROCESSES,
tsfresh/convenience/relevant_extraction.py:89: :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/convenience/relevant_extraction.py:90: :type n_jobs: int
tsfresh/convenience/relevant_extraction.py:168: n_jobs=n_jobs,
tsfresh/convenience/relevant_extraction.py:180: n_jobs=n_jobs,
tsfresh/scripts/measure_execution_time.py:46: n_jobs = luigi.IntParameter()
tsfresh/scripts/measure_execution_time.py:59: extract_features(df, column_id="id", column_sort="time", n_jobs=self.n_jobs,
tsfresh/scripts/measure_execution_time.py:70: "n_jobs": self.n_jobs,
tsfresh/scripts/measure_execution_time.py:84: n_jobs = luigi.IntParameter()
tsfresh/scripts/measure_execution_time.py:96: extract_features(df, column_id="id", column_sort="time", n_jobs=self.n_jobs,
tsfresh/scripts/measure_execution_time.py:103: "n_jobs": self.n_jobs,
tsfresh/scripts/measure_execution_time.py:121: n_jobs=job,
tsfresh/scripts/measure_execution_time.py:125: n_jobs=job,
tsfresh/scripts/measure_execution_time.py:133: n_jobs=job,
tsfresh/scripts/measure_execution_time.py:142: n_jobs=job,
tsfresh/feature_extraction/extraction.py:30: n_jobs=defaults.N_PROCESSES, show_warnings=defaults.SHOW_WARNINGS,
tsfresh/feature_extraction/extraction.py:91: :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/feature_extraction/extraction.py:92: :type n_jobs: int
tsfresh/feature_extraction/extraction.py:155: n_jobs=n_jobs, chunk_size=chunksize,
tsfresh/feature_extraction/extraction.py:177: n_jobs, chunk_size, disable_progressbar, show_warnings, distributor,
tsfresh/feature_extraction/extraction.py:214: :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/feature_extraction/extraction.py:215: :type n_jobs: int
tsfresh/feature_extraction/extraction.py:235: if n_jobs == 0:
tsfresh/feature_extraction/extraction.py:239: distributor = MultiprocessingDistributor(n_workers=n_jobs,
tsfresh/utilities/dataframe_functions.py:315: n_jobs=defaults.N_PROCESSES, show_warnings=defaults.SHOW_WARNINGS,
tsfresh/utilities/dataframe_functions.py:374: :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/utilities/dataframe_functions.py:375: :type n_jobs: int
tsfresh/utilities/dataframe_functions.py:416: n_jobs=n_jobs,
tsfresh/utilities/dataframe_functions.py:478: if n_jobs == 0:
tsfresh/utilities/dataframe_functions.py:482: distributor = MultiprocessingDistributor(n_workers=n_jobs,
notebooks/advanced/compare-runtimes-of-feature-calculators.ipynb:173: " n_jobs=0, \n",
tests/benchmark.py:28: benchmark(extract_features, df, column_id="id", column_sort="time", n_jobs=0,
tests/benchmark.py:35: benchmark(extract_features, df, column_id="id", column_sort="time", n_jobs=0,
tests/benchmark.py:43: benchmark(extract_relevant_features, df, y, column_id="id", column_sort="time", n_jobs=0,
tests/units/feature_selection/test_relevance.py:84: relevance_table = calculate_relevance_table(X, y_binary, n_jobs=0)
tests/units/feature_selection/test_relevance.py:103: relevance_table = calculate_relevance_table(X, y_real, n_jobs=0)
tests/units/feature_selection/test_relevance.py:138: X, y_real, n_jobs=0, ml_task="regression", show_warnings=True
tests/units/transformers/test_feature_augmenter.py:24: n_jobs=0,
tests/units/transformers/test_feature_augmenter.py:60: n_jobs=0,
tests/units/transformers/test_feature_augmenter.py:87: n_jobs=0,
tests/units/feature_extraction/test_extraction.py:22: self.n_jobs = 1
tests/units/feature_extraction/test_extraction.py:30: n_jobs=self.n_jobs)
tests/units/feature_extraction/test_extraction.py:44: n_jobs=self.n_jobs)
tests/units/feature_extraction/test_extraction.py:54: column_value="val", n_jobs=self.n_jobs,
tests/units/feature_extraction/test_extraction.py:121: n_jobs=self.n_jobs).sort_index()
tests/units/feature_extraction/test_extraction.py:126: n_jobs=self.n_jobs).sort_index()
tests/units/feature_extraction/test_extraction.py:140: X = extract_features(df, column_id="id", column_value="val", n_jobs=self.n_jobs,
tests/units/feature_extraction/test_extraction.py:152: extract_features(df, column_id="id", column_value="val", n_jobs=self.n_jobs,
tests/units/feature_extraction/test_extraction.py:164: n_jobs=self.n_jobs)
tests/units/feature_extraction/test_extraction.py:173: n_jobs=self.n_jobs)
tests/units/feature_extraction/test_extraction.py:177: n_jobs=0)
tests/units/feature_extraction/test_extraction.py:188: n_jobs=self.n_jobs)
tests/units/feature_extraction/test_extraction.py:210: self.n_jobs = 2
tests/units/feature_extraction/test_extraction.py:226: n_jobs=self.n_jobs)
tests/units/feature_extraction/test_settings.py:59: n_jobs=0)
tests/units/feature_extraction/test_settings.py:64: n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:23: rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:29: rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:34: rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:40: rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:45: rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:50: rolling_direction=0, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:55: rolling_direction=0, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:62: rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:68: rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:75: rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:114: column_kind=None, rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:122: max_timeshift=4, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:130: max_timeshift=2, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:154: max_timeshift=2, min_timeshift=2, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:208: column_kind=None, rolling_direction=-1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:216: max_timeshift=None, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:224: max_timeshift=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:247: max_timeshift=2, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:272: max_timeshift=4, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:298: min_timeshift=2, max_timeshift=3, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:348: column_kind=None, rolling_direction=2, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:368: column_kind=None, rolling_direction=-2, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:404: column_kind="kind", rolling_direction=-1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:427: rolling_direction=-1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:477: rolling_direction=-1, max_timeshift=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:571: column_kind=None, rolling_direction=1, n_jobs=0)
tests/units/utilities/test_dataframe_functions.py:623: column_kind=None, rolling_direction=1, n_jobs=0)
> I can try it. However it might take some weeks because I'm quite busy right now.
That would be awesome! If this is not fast enough for you, I can also try to have a look, but more contributors is always better :-)
> Do you mean this one:
Yes and yes. Sorry, I was on the smartphone. Thanks for providing the links. These two code parts are basically the only two where n_jobs is actually used (the rest just passes it on).
> A grep search over the full repo returns this:
Here are the docstrings that one would need to fix (the rest is not relevant, as only variables are passed).
tsfresh/convenience/relevant_extraction.py: :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/feature_extraction/extraction.py: :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/feature_extraction/extraction.py: :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/feature_selection/relevance.py: :param n_jobs: Number of processes to use during the p-value calculation
tsfresh/feature_selection/selection.py: :param n_jobs: Number of processes to use during the p-value calculation
tsfresh/transformers/feature_augmenter.py: :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/transformers/feature_selector.py: :param n_jobs: Number of processes to use during the p-value calculation
tsfresh/transformers/relevant_feature_augmenter.py: :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
tsfresh/utilities/dataframe_functions.py: :param n_jobs: The number of processes to use for parallelization. If zero, no parallelization is used.
You could have a look at https://github.com/blue-yonder/tsfresh/pull/852/files to get started :-)
When working with sklearn (scikit-learn) I am used to setting the parameter n_jobs=-2. As explained at https://scikit-learn.org/stable/glossary.html#term-n_jobs, this means that all CPUs but one are used.
When I set the parameter n_jobs=-2 in the extract_features() function, I get an error:
ValueError: Number of processes must be at least 1
If tsfresh were able to accept the parameter n_jobs=-2, it would be possible to write code for different kinds of CPUs and tell tsfresh "use all CPU cores except for one core". The code would then adapt to the CPU it is running on, which might be an older Intel 4-core CPU or a newer Ryzen 8-, 12- or 16-core CPU.
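To make the request concrete, here is a minimal sketch of how a negative n_jobs could be translated into a process count. It assumes the scikit-learn rule that values below zero resolve to n_cpus + 1 + n_jobs. The helper name resolve_n_jobs and its n_cpus argument are hypothetical and not part of tsfresh. Zero is passed through unchanged so that it keeps the meaning documented in the tsfresh docstrings (no parallelization).

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Translate a scikit-learn style n_jobs into a concrete process count.

    Hypothetical helper for illustration only; it is not part of tsfresh.
    """
    if n_cpus is None:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        n_cpus = os.cpu_count() or 1

    if n_jobs >= 0:
        # Non-negative values are returned as-is. In tsfresh, 0 means
        # "no parallelization", so it must not be remapped.
        return n_jobs

    # Assumed scikit-learn convention: -1 -> all CPUs, -2 -> all but one, ...
    # Clamp to 1 so very negative values never yield zero or fewer processes.
    return max(1, n_cpus + 1 + n_jobs)
```

Until tsfresh accepts negative values itself, the resolved number could be passed explicitly, for example n_jobs=resolve_n_jobs(-2) in a call to extract_features().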