DynamicsAndNeuralSystems / catch22

catch22: CAnonical Time-series CHaracteristics
https://time-series-features.gitbook.io/catch22
GNU General Public License v3.0

Segmentation fault error when running python version #4

Open neezi opened 5 years ago

neezi commented 5 years ago

I've been getting segmentation faults whenever I try to run the Python version. For smaller datasets it works fine: a dataset of size 100x10000 runs without problems, but as soon as I go to 200x10000 it gets hung up. The iteration it breaks on changes from attempt to attempt, sometimes 150, sometimes 174. Seems like a memory problem?
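Roughly, the loop I'm running looks like this (a simplified sketch; random data stands in for my actual array):

import numpy as np
from catch22 import catch22_all

data = np.random.randn(200, 10000)  # random stand-in for my 200x10000 dataset

for i, row in enumerate(data):
    print(i)  # the crash happens at a different iteration each run
    features = catch22_all(row)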

chlubba commented 5 years ago

Thanks for opening this issue. Which Python version are you running, and on which system? I'll look into it in the next few days.

neezi commented 5 years ago

Python 3.6 on Ubuntu 18.04.3 LTS. I added an extra line that takes the absolute value of each vector, and that seems to have removed the error.
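In other words, something like this (sketch):

import numpy as np
from catch22 import catch22_all

data = np.random.randn(200, 10000)  # random stand-in for my dataset

for row in data:
    # taking the absolute value of each vector first seems to avoid the crash
    features = catch22_all(np.abs(row))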

GerardBCN commented 4 years ago

What worked for me was to deactivate the following feature:

CO_Embed2_Dist_tau_d_expfit_meandiff

I would expect that, depending on the dataset, a specific feature might be causing the segmentation fault.
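For illustration, one way to skip a single feature in the Python wrapper (a rough sketch; it assumes each feature is also exposed as a module-level function, e.g. catch22.CO_f1ecac, and that catch22_all returns a dict with a 'names' key, which may not match every version of the wrapper):

import numpy as np
import catch22
from catch22 import catch22_all

SKIP = 'CO_Embed2_Dist_tau_d_expfit_meandiff'

# fetch the canonical feature names from a short, harmless series
feature_names = [n for n in catch22_all(list(np.random.randn(100)))['names'] if n != SKIP]

def catch22_all_but_embed2(data):
    # compute each remaining feature individually, skipping the problematic one
    return {name: getattr(catch22, name)(list(data)) for name in feature_names}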

benfulcher commented 4 years ago

@GerardBCN Do you have an example of a specific time series that causes a segmentation fault when running CO_Embed2_Dist_tau_d_expfit_meandiff? If so, we can build in better handling of these cases. We had no issues with the >100k time series we tested this on, right, @chlubba? So it would be interesting to see what sort of time-series structures are causing the issue.

chlubba commented 4 years ago

Yes, on our time series we did not get segmentation faults. @GerardBCN, as @benfulcher said, it would be very helpful for us if you had an example of a time series where the error occurs. Thanks in advance!

chanshing commented 3 years ago

@chlubba @benfulcher There seems to be some memory leakage, which may explain the segmentation fault. Here's a simple example:

for _ in range(1000000):
    catch22_all(np.random.randn(1000))

Run the above and watch memory usage steadily increase. I ran this in IPython, and even after I hit Ctrl+C to stop the loop, the accumulated memory was still there. It wasn't until I exited IPython that memory usage went back down to normal.
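To make the growth easy to see, you can log the process's peak memory as the loop runs, for example with the standard-library resource module (Linux/macOS only; just a sketch):

import resource
import numpy as np
from catch22 import catch22_all

for i in range(1000000):
    catch22_all(np.random.randn(1000))
    if i % 1000 == 0:
        # ru_maxrss is reported in kilobytes on Linux (bytes on macOS)
        print(i, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)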

chanshing commented 3 years ago

The above is not a far-fetched example: it is common to break a long time series into small windows and extract features from each window -- in other words, rolling-window feature extraction. This is what I am doing, and that is how I encountered this issue.
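For context, the rolling-window extraction looks roughly like this (a simplified sketch with random data and an arbitrary window length):

import numpy as np
from catch22 import catch22_all

long_series = np.random.randn(1000000)  # stand-in for a long recording
window = 1000

# non-overlapping windows; memory keeps growing across these calls,
# which is where the leak shows up
features_per_window = [catch22_all(long_series[i:i + window])
                       for i in range(0, len(long_series) - window + 1, window)]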

chlubba commented 3 years ago

Hi @chanshing, thanks for pointing out the memory-consumption issue. Which wrapper are you using (Python, Matlab, R)?

The issue opened here by @neezi is about a segmentation fault, so it is a slightly different topic, but I agree that both are related to memory management.

chanshing commented 3 years ago

Hi @chlubba, I'm using the Python wrapper. The full version of the example above would be:

import numpy as np
from catch22 import catch22_all

for _ in range(1000000):
    catch22_all(np.random.randn(1000))

When I run the above code for long enough, I eventually get the segmentation fault mentioned in this issue, which is why I suspect it is related to the memory leak.
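Until the leak is fixed, one possible workaround (just a sketch, not something from catch22 itself) is to run the extraction in worker processes that are recycled periodically, so whatever the C extension leaks is returned to the OS when a worker exits:

import multiprocessing as mp
import numpy as np
from catch22 import catch22_all

def extract(window):
    return catch22_all(window)

if __name__ == '__main__':
    windows = [np.random.randn(1000) for _ in range(10000)]
    # maxtasksperchild=100 replaces each worker after 100 calls, which caps
    # how much leaked memory any single process can accumulate
    with mp.Pool(processes=4, maxtasksperchild=100) as pool:
        results = pool.map(extract, windows)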

quesadagranja commented 3 years ago

Hi @chlubba. I can confirm there's also a memory leak in the R wrapper. It can be checked with the following code:

library(catch22)

for(ii in 1:1E6) {
  catch22::catch22_all(rnorm(1000))
}