blue-yonder / tsfresh

Automatic extraction of relevant features from time series:
http://tsfresh.readthedocs.io
MIT License
8.46k stars 1.21k forks source link

View and Set N Jobs #1029

Closed DhruvSrikanth closed 1 year ago

DhruvSrikanth commented 1 year ago

Reason for PR:

When profiling specific functions, it is often helpful to have more fine-grained control over the N_PROCESSES constant used in multiprocessing.

PR Description:

Getter and setter functions for the N_PROCESSES constant found with the other default constants.

DhruvSrikanth commented 1 year ago

I was not able to pass the n_jobs parameter to some of the feature extractors such as mean, median etc. I wanted to profile specific feature extractors, hence, I thought more control over the default number of processes would be useful to have. Happy to provide a more concrete example as well.

nils-braun commented 1 year ago

Yes, I might need some more concrete example. Because each specific feature extractor like mean etc. does not do multiprocessing in any case - multiprocessing is just used to process multiple timeseries in parallel. If there is only a single time series, changing the number of jobs will not help :)

DhruvSrikanth commented 1 year ago

I see. So each feature extractor uses a single process to perform the computation.

What about the case where multiple features need to be extracted on the same timeseries. Would a single process handle all of the feature extractors or would n_jobs allow you to distribute the feature extraction across multiple processes (even though it is only on a single timeseries)?

Irrespective of that, I believe it would be helpful to have a function to globally set the number of jobs to be used in any computation. I don't see it being more or less helpful than having n_jobs be a parameter (as is already present in the package) however I think it would be helpful to have that functionality for convenience and user preference. That being said I'm open to being convinced otherwise. :)

P.S - Apologies for the late response.

nils-braun commented 1 year ago

The parallelization is only on the timeseries. So a single timeseries will never do anything in parallel - so multiple features for a single time series are calculated on the same process.

Sure, we can still have the changes in - I just wanted to make sure you actually get what you expected :)

After my comment is in, we can merge!

DhruvSrikanth commented 1 year ago

I've incorporated the following changes:

  1. simplified imports
  2. reset n_jobs after running test

Are there any others I should incorporate? @nils-braun

codecov-commenter commented 1 year ago

Codecov Report

Patch and project coverage have no change.

Comparison is base (ac413b6) 93.43% compared to head (8cf3bf8) 93.43%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #1029 +/- ## ======================================= Coverage 93.43% 93.43% ======================================= Files 20 20 Lines 1888 1888 Branches 372 372 ======================================= Hits 1764 1764 Misses 85 85 Partials 39 39 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

DhruvSrikanth commented 1 year ago

Thanks!

How can I check if the PR has been merged into main?

nils-braun commented 1 year ago

I have just done so. You see it mentioned on this PR page.

DhruvSrikanth commented 1 year ago

Awesome! Thanks!!