blue-yonder / tsfresh

Automatic extraction of relevant features from time series:
http://tsfresh.readthedocs.io
MIT License

Feature Extraction Only Works in Jupyter Notebook #490

Open helloSeen opened 5 years ago

helloSeen commented 5 years ago

Hi, I am using Windows 10 and the latest version of tsfresh (installed using pip). When I run the code from the robot execution failures example as a standalone Python script, I cannot calculate the features. It reaches the feature extraction and stalls at "Feature Extraction: 0%| | 0/20 [00:00<?, ?it/s]" in the Python Shell. However, when I copy and paste the exact same script into a Jupyter notebook, it works fine. Here is the code that I used:

import matplotlib.pylab as plt
from tsfresh.examples.robot_execution_failures import download_robot_execution_failures, load_robot_execution_failures
from tsfresh import extract_features, extract_relevant_features, select_features
from tsfresh.utilities.dataframe_functions import impute
from tsfresh.feature_extraction import ComprehensiveFCParameters

download_robot_execution_failures()
df, y = load_robot_execution_failures()
df.head()

df[df.id == 3][['time', 'F_x', 'F_y', 'F_z', 'T_x', 'T_y', 'T_z']].plot(x='time', title='Success example (id 3)', figsize=(12, 6));
df[df.id == 20][['time', 'F_x', 'F_y', 'F_z', 'T_x', 'T_y', 'T_z']].plot(x='time', title='Failure example (id 20)', figsize=(12, 6));

X = extract_features(df, column_id='id', column_sort='time')

X.head()

I have tried turning off the progress bar but that does not fix the issue. I also turned on "show warnings" and no warnings appear.

MaxBenChrist commented 5 years ago

There seem to be some issues with the multiprocessing library under Windows, see https://github.com/blue-yonder/tsfresh/issues/185
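Some background on why Windows behaves differently: there, multiprocessing can only use the "spawn" start method, which launches fresh interpreter processes that re-import the main module. Any code that starts worker processes therefore has to sit behind an if __name__ == "__main__": guard, and the worker function must be importable at module level. The minimal sketch below uses only the standard library (no tsfresh; the square worker is purely illustrative) to show the pattern:

```python
import multiprocessing as mp


def square(x: int) -> int:
    """Illustrative worker; it lives at module level so spawned children can import it."""
    return x * x


def main() -> None:
    # "spawn" on Windows (and the default on macOS since Python 3.8);
    # a fork-based method is the default on Linux.
    print("start method:", mp.get_start_method())
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, range(5)))


if __name__ == "__main__":
    # Without this guard, each spawned child would execute main() again while
    # importing this module, which multiprocessing rejects with a RuntimeError.
    main()
```

Run as a script, it prints the start method followed by [0, 1, 4, 9, 16].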

helloSeen commented 5 years ago

> There seem to be some issues with multiprocessing library under windows, see #185

Adding the if __name__ == '__main__': guard doesn't resolve the issue.

MaxBenChrist commented 5 years ago

Can you turn off multiprocessing and see if the problem persists? To do that, set the number of jobs to 0: extract_features(n_jobs=0, ...)
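In tsfresh, n_jobs=0 means no parallelization: the features are calculated in the calling process, so no worker processes are ever started and the start-method problems above cannot occur. The sketch below is not tsfresh's implementation; parallel_map is a hypothetical helper that mimics the same convention with the standard library, to show that the serial and pooled paths compute the same result and differ only in how the work is executed:

```python
import multiprocessing as mp
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallel_map(func: Callable[[T], R], items: Iterable[T], n_jobs: int) -> List[R]:
    """Hypothetical helper: n_jobs == 0 runs serially in this process,
    n_jobs > 0 distributes the work over that many worker processes."""
    if n_jobs < 0:
        raise ValueError("n_jobs must be >= 0")
    if n_jobs == 0:
        # No child processes: nothing to spawn, pickle, or wait on.
        return [func(item) for item in items]
    # Pool path: func must be picklable (defined at module level).
    with mp.Pool(processes=n_jobs) as pool:
        return pool.map(func, items)


def value_range(values: List[float]) -> float:
    """Illustrative per-series 'feature': max minus min."""
    return max(values) - min(values)


if __name__ == "__main__":
    series = [[1.0, 4.0, 2.0], [0.5, 0.5], [-3.0, 3.0]]
    serial = parallel_map(value_range, series, n_jobs=0)
    pooled = parallel_map(value_range, series, n_jobs=2)
    # Same inputs, same results; only the execution strategy differs.
    print(serial, pooled, serial == pooled)
```

The trade-off is runtime only: the serial path uses one CPU core, which is acceptable for small datasets like the robot execution failures example.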

helloSeen commented 5 years ago

Setting n_jobs=0 fixed the issue, thanks. I don't know if it's worth noting, but I got the following warning when I ran the program:

Warning (from warnings module):
  File ".\Python37\lib\site-packages\tsfresh\feature_extraction\feature_calculators.py", line 1400
    return - np.sum(p * np.math.log(p) for p in probs if p != 0)
DeprecationWarning: Calling np.sum(generator) is deprecated, and in the future will give a different result. Use np.sum(np.fromiter(generator)) or the python sum builtin instead.
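The warning is triggered by passing a generator expression to np.sum. As the message itself suggests, Python's builtin sum accepts a generator directly. The sketch below rewrites the quoted entropy expression that way; it only illustrates the warning's suggestion and is not necessarily the change that was made in tsfresh (the name entropy_from_probs is hypothetical):

```python
import math
from typing import Iterable


def entropy_from_probs(probs: Iterable[float]) -> float:
    """Shannon entropy (natural log) of a probability vector.

    Uses the builtin sum on a generator, as the DeprecationWarning suggests,
    instead of np.sum(generator). Zero probabilities are skipped because
    p * log(p) tends to 0 as p tends to 0.
    """
    return -sum(p * math.log(p) for p in probs if p != 0)


if __name__ == "__main__":
    print(entropy_from_probs([0.5, 0.5]))  # approximately math.log(2)
    print(entropy_from_probs([0.25] * 4))  # approximately math.log(4)
```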

MaxBenChrist commented 5 years ago

So it is an issue with multiprocessing on Windows.

MaxBenChrist commented 5 years ago

@helloSeen The deprecation warning will be gone in the near future; I already fixed it in https://github.com/blue-yonder/tsfresh/pull/496

nils-braun commented 4 years ago

@helloSeen Is this still an issue? Is multiprocessing still not working properly? If yes, please reopen this, thanks!

e5k commented 4 years ago

Hi all, just wanted to mention one thing in case it is of interest. I'm running the example notebooks in VS Code on macOS 10.15 with tsfresh 0.16. extract_features works fine in "01 Feature Extraction and Selection.ipynb", but in "04 Multiclass Selection Example.ipynb" it does not start at all. Adding n_jobs=0 seems to fix the problem.

nils-braun commented 4 years ago

Thanks for the update, @e5k! @MaxBenChrist, as you probably have access to a Mac (which I do not have): did you face the same problems? How did you solve them?

e5k commented 4 years ago

@nils-braun I only wish I could help solve it :)

heib6xinyu commented 3 months ago

> @helloSeen Is this still an issue? Is multiprocessing still not working properly? If yes, please reopen this, thanks!

Yes, multiprocessing on Windows is still an issue; I have run into it multiple times. At first, reinstalling the virtual environment worked, but after some time (for example, one day later), extract_relevant_features would get stuck at 0 again. After switching to n_jobs=0, it works now.

nils-braun commented 3 months ago

Hi @heib6xinyu!

> But after some time (for example, 1 day later)

What changed in between? Did you keep the Jupyter notebook running for that whole time? Did you install other packages in the meantime? Or did you change the data?

To me, it sounds highly unlikely that the exact same setup works on day one but not on day two :)

tsfresh just uses "normal" Python multiprocessing. Do you also see this issue with other packages that use multiprocessing (or with your own code)?
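One way to answer that question is a tsfresh-free check: start a small pool, run a trivial task, and give up after a timeout instead of waiting forever. If this also stalls when run the same way (as a script, in VS Code's interactive window, or in Jupyter), the problem lies with multiprocessing in that environment rather than with tsfresh. The function names and the 30-second timeout in this sketch are arbitrary choices:

```python
import multiprocessing as mp
import os
from typing import List, Tuple


def tagged(x: int) -> Tuple[int, int]:
    """Trivial task that returns its input together with the worker's PID."""
    return x, os.getpid()


def pool_works(n_workers: int = 2, timeout: float = 30.0) -> bool:
    """Return True if a small process pool finishes a trivial job in time."""
    with mp.Pool(processes=n_workers) as pool:
        result = pool.map_async(tagged, range(8))
        try:
            pairs: List[Tuple[int, int]] = result.get(timeout=timeout)
        except mp.TimeoutError:
            # Workers never delivered results: the same symptom as a
            # progress bar frozen at 0%. Leaving the with-block terminates the pool.
            return False
    # Results arrived; confirm they are complete and came from child processes.
    return [x for x, _ in pairs] == list(range(8)) and all(
        pid != os.getpid() for _, pid in pairs
    )


if __name__ == "__main__":
    print("start method:", mp.get_start_method())
    print("multiprocessing OK" if pool_works() else "pool timed out")
```

One caveat: under the spawn start method, a worker function defined inside a notebook cell cannot be imported by the child processes, so this check can time out in a notebook even when a pool using importable functions (such as tsfresh's own) would work. Saving the check to a .py file and running it as a script avoids that.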

heib6xinyu commented 3 months ago

Hi Nils, I first started on my Windows computer with Python 3.11 and tsfresh 0.20. When I ran extract_relevant_features with n_jobs=4, it did not move at all. Then I reinstalled the environment with Python 3.10 and ran the same files again, and it worked. But if I do something else in the file (building models and such) and then come back to extract features again, it gets stuck at 0 until I delete the virtual environment and reinstall. I am guessing it has something to do with not using if __name__ == "__main__":, since I am using Visual Studio Code's Jupyter notebook extension to run the code interactively. Eventually I got tired of reinstalling my environment, searched online, and saw suggestions to use n_jobs=0 (or 1? My work laptop is not with me). That works.

nils-braun commented 3 months ago

This is indeed strange. I would think (?) that restarting the kernel also helps, so you would not need to reinstall the full environment. It might be a combination of VS Code + Jupyter + multiprocessing.

Would you be able to check if you see the same also in a normal jupyter notebook session?

If it works with n_jobs=0 and the runtime is still reasonable, you can of course also keep it like this. There is no difference in the result.

heib6xinyu commented 3 months ago

An actual Jupyter notebook always works. Restarting the kernel doesn't help. n_jobs=0 works fine for me.