I am trying to extract features for a large number of time series, but I keep running into the error below. For example, when I run the loop over 12,000 time series, the first 9,000 work fine, but it breaks and kills the kernel around the 9,000th. I tried running just the last 3,000, but the error still appears. Every time series in that last 3,000 works if I groupby and apply my method one series at a time; the issue only appears when I put it in a for loop, which processes a few time series and then fails. I have also tried different cluster setups with varying sizes and numbers of workers, and the error still occurs. Any help would be greatly appreciated. Thanks!
unique_ids = issues_df['time_series_idx'].unique()
features = []
for ts_id in unique_ids:
    reduced_df = issues_df[issues_df['time_series_idx'] == ts_id]
    features_df = reduced_df.groupby(['run_id']).apply(catch_24)  # works by itself when I do one time series at a time
    features.append(features_df)
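For reference, the per-series filtering above can also be expressed as a single pass by grouping on both keys at once, which avoids re-scanning the full frame on every iteration. This is only a minimal sketch: `catch_24` stands in for my real feature-extraction function, and the tiny DataFrame is made-up sample data.

```python
import pandas as pd

def catch_24(group):
    # Placeholder for the real feature extractor; here it just
    # counts the rows in each group so the sketch is runnable.
    return pd.Series({"n_rows": len(group)})

# Made-up sample data with the same column names as my real frame.
issues_df = pd.DataFrame({
    "time_series_idx": [1, 1, 2, 2, 2],
    "run_id": ["a", "a", "b", "b", "c"],
    "value": [0.1, 0.2, 0.3, 0.4, 0.5],
})

# One groupby over both keys instead of filtering per time series.
features_df = (
    issues_df
    .groupby(["time_series_idx", "run_id"])
    .apply(catch_24)
)
print(features_df)
```

This produces one row per (time_series_idx, run_id) pair in a single frame, so there is no `features` list to accumulate and concatenate afterwards. It does not change what `catch_24` sees per group, so if the crash is inside that function it would still occur.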
Fatal error: The Python kernel is unresponsive.
---------------------------------------------------------------------------
The Python process exited with exit code 139 (SIGSEGV: Segmentation fault).
The last 10 KB of the process's stderr and stdout can be found below. See driver logs for full logs.
---------------------------------------------------------------------------
Last messages on stderr:
y", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap
Thread 0x00007fb746ffe640 (most recent call first):
File "/usr/lib/python3.10/multiprocessing/pool.py", line 114 in worker
File "/usr/lib/python3.10/threading.py", line 953 in run
File "/usr/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
File "/usr/lib/python3.10/threading.py", line 973 in _bootstrap