Open neezi opened 5 years ago
Thanks for opening this issue. What Python version are you running, and on what system? I'll look into it in the next few days.
Python 3.6 on Ubuntu 18.04.3 LTS. I added a line that takes the absolute value of each vector, and that seems to have removed the error.
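For reference, a minimal sketch of that kind of workaround in the Python wrapper might look like the lines below; np.abs here stands in for whatever absolute-value step was actually added, so it is illustrative rather than a recommended preprocessing step.
import numpy as np
from catch22 import catch22_all

x = np.random.randn(1000)          # stand-in for one of the input vectors
features = catch22_all(np.abs(x))  # absolute value taken before feature extraction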
What worked for me was to deactivate the following feature:
CO_Embed2_Dist_tau_d_expfit_meandiff
I would expect that, depending on the dataset, some specific feature might be causing the segmentation fault.
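One way to deactivate a single feature is to compute the features one at a time and skip the problematic one. The sketch below rests on assumptions about the wrapper's layout: it assumes the compiled per-feature functions are exposed in a catch22_C module, and the feature list is truncated for brevity (in practice it would hold all 22 names).
import numpy as np
import catch22_C  # assumed name of the compiled module exposing the per-feature functions

FEATURE_NAMES = [
    'DN_HistogramMode_5',
    'DN_HistogramMode_10',
    # ... remaining catch22 feature names ...
    'CO_Embed2_Dist_tau_d_expfit_meandiff',
]

def catch22_without(data, skip=('CO_Embed2_Dist_tau_d_expfit_meandiff',)):
    data = list(data)  # convert to a plain Python list of floats before calling the C functions
    return {name: getattr(catch22_C, name)(data)
            for name in FEATURE_NAMES if name not in skip}

features = catch22_without(np.random.randn(1000))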
@GerardBCN Do you have an example of a specific time series that causes a segmentation fault when running CO_Embed2_Dist_tau_d_expfit_meandiff? If so, we can build in better handling of these cases. We had no issues with the >100k time series we tested this on, right, @chlubba? So it would be interesting to see what sort of time-series structures cause the issue.
Yes, on our time series we did not get segmentation faults. @GerardBCN, as @benfulcher said it would be very helpful for us if you had an example of a time series where the error occurs. Thanks in advance!
@chlubba @benfulcher There seems to be some memory leakage, which may explain the segmentation fault. Here's a simple example:
for _ in range(1000000):
catch22_all(np.random.randn(1000))
Run the above and see how your memory steadily increases. I ran this in IPython, and even after I hit Ctrl+C to stop the loop, the accumulated memory was still there. It wasn't until I exited IPython that my memory went back down to normal.
The above is not a far-fetched example: it is common to break a long time series into small windows and extract features in each window, i.e. rolling-window feature extraction. This is what I am doing, and that's how I encountered this issue.
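To make the rolling-window pattern concrete, here is a small sketch of that kind of loop (window size and step are arbitrary placeholders, and it assumes catch22_all returns a dict with a 'values' entry):
import numpy as np
from catch22 import catch22_all

long_series = np.random.randn(100000)  # stand-in for one long recording
window, step = 1000, 500
all_values = []
for start in range(0, len(long_series) - window + 1, step):
    res = catch22_all(long_series[start:start + window])
    all_values.append(res['values'])  # one feature vector per window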
Hi @chanshing, thanks for pointing to the issue of memory consumption. Which wrapper are you using (Python, Matlab, R)?
The issue opened here by @neezi is about a segmentation fault, so it's a slightly different topic, but I agree that both are related to memory management.
Hi @chlubba I'm using the Python wrapper. The full example above would be:
import numpy as np
from catch22 import catch22_all
for _ in range(1000000):
catch22_all(np.random.randn(1000))
When I run the above code for long enough, I get the segmentation fault mentioned in this issue. That's why I suspect it is related to the memory leak.
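If it helps to pin the leak down, one way to quantify the growth on Linux is to print the process's peak resident set size every so often inside the loop (ru_maxrss is reported in kilobytes on Linux); this is only a diagnostic sketch, not part of catch22:
import resource
import numpy as np
from catch22 import catch22_all

for i in range(100000):
    catch22_all(np.random.randn(1000))
    if i % 1000 == 0:
        rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        print(f'iteration {i}: peak RSS {rss_kb} kB')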
Hi @chlubba. I confirm there's a memory leak using the R wrapper. It can be checked with the following code:
library(catch22)
for(ii in 1:1E6) {
catch22::catch22_all(rnorm(1000))
}
I've been getting segmentation faults whenever I try to run the Python version. For smaller datasets it works fine. Here I'm using a dataset of size 100x10000; as soon as I go to 200x10000 it gets hung up. The iteration it breaks on changes from attempt to attempt: sometimes it breaks on iteration 150, sometimes on 174. Seems like a memory problem?
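For reference, a minimal sketch of the kind of loop described, iterating over the rows of an n_series x 10000 array (random data as a placeholder, so it will not reproduce the exact crash):
import numpy as np
from catch22 import catch22_all

n_series, length = 200, 10000
data = np.random.randn(n_series, length)
for i, row in enumerate(data):
    print('series', i)  # to see which iteration it breaks on
    catch22_all(row)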