carmenalab / emgdecomp

Package for decomposing EMG signals into motor unit firings, as used in Formento et al 2021.
MIT License

Server restart error #3

Open shihan-ma opened 2 years ago

shihan-ma commented 2 years ago

Hi, thanks for your repository!

I used the scripts in the readme and tried to decompose a 10-s simulated signal (64 channels * 20480 samples). It works most of the time, producing around 10 MUs against 18 real ones. However, sometimes our server restarts after running the scripts three or four times. We found that the program got stuck at https://github.com/carmenalab/emgdecomp/blob/master/emgdecomp/decomposition.py#L405. After converting 'whitening_matrix' and 'normalized_data' to np.float32, the error occurs less often but still happens sometimes. Could you please give me some advice on what might be causing the server to restart? Memory seems okay and we did not use CUDA at this point.
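For reference, the cast we applied looks roughly like the sketch below (the array shapes are illustrative, matching the 64-channel, 20480-sample recording described above; the variable names mirror those in decomposition.py):

```python
import numpy as np

# Illustrative stand-ins for the arrays reaching decomposition.py#L405:
# 64 channels x 20480 samples, plus a square whitening matrix.
normalized_data = np.random.randn(64, 20480)
whitening_matrix = np.random.randn(64, 64)

# Casting to float32 halves the memory footprint of the downstream
# matrix products; in our case this made the hang less frequent.
whitening_matrix = whitening_matrix.astype(np.float32)
normalized_data = normalized_data.astype(np.float32)

whitened = whitening_matrix @ normalized_data
print(whitened.dtype, whitened.shape)
```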

Another question: could you please provide an interface like '_assert_decomp_successful' at https://github.com/carmenalab/emgdecomp/blob/master/emgdecomp/tests/test_decomposition.py#L140 for validation?

Thanks!

pbotros commented 2 years ago

@shihan-ma thanks for the interest and question!

Re: the server restarting: can you provide any more information on the restart? Is there an exception thrown, is it killed by the OS with a particular exit code, etc.? Finally, how much RAM do you have?

Can you run the tests in TestDecomposition, in particular the first one test_simulated_data_contrast_functions (https://github.com/carmenalab/emgdecomp/blob/master/emgdecomp/tests/test_decomposition.py#L78)? Something like the following can help you run the test:

```shell
pip3 show emgdecomp
# Look at the "Location" line to see where it's installed, and then go there:
cd <location>/emgdecomp
pytest -s tests/test_decomposition.py -k 'test_simulated_data_contrast_functions'
```

Memory would typically be the culprit... but a 10-s signal / 20k samples is fairly small even after whitening, and any modern computer should be able to handle that, so I'm curious if there's something else happening here.
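To put numbers on that: a rough back-of-the-envelope estimate (assuming a hypothetical extension factor of 16, not a value from your run) puts the extended data matrix well under a gigabyte:

```python
# Rough size of the extended data matrix prior to whitening.
# channels and samples are from the recording described in this thread;
# the extension factor of 16 is a hypothetical, illustrative value.
channels, extension, samples = 64, 16, 20480
extended_bytes = channels * extension * samples * 8  # float64
print(f"extended data: {extended_bytes / 1e9:.2f} GB")  # ~0.17 GB
```

Even accounting for intermediate copies during whitening, that's nowhere near 64 GB of RAM.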

Re: your second point: I'll open up a second issue to track that. That's a good idea!

shihan-ma commented 2 years ago

Hi Paul,

Our server has 64GB of RAM and no other processes were running at the same time. No exception was thrown, nor any warnings. The server just restarted suddenly :(.

I ran the tests in TestDecomposition: 11 were collected, 10 passed, and one was skipped. I'm not that familiar with pytest and was wondering whether the environment is the same when running under pytest versus plain Python. It also printed a note saying that to reproduce an error I should unpickle & set the numpy random state to xxx, but I guess that's OK.

We tried numpy-1.18.1 and converted 'whitening_matrix' and 'normalized_data' to np.float32. With that setup, everything works well! However, when we used a conda env with numpy-1.19, the server crashed. I wonder whether this error is related to the library versions. Could you please provide the versions of the required libraries, e.g. 'matplotlib', 'scipy', 'numpy', 'numba', etc.?

Thank you in advance!

Best regards, Shihan

pbotros commented 2 years ago

Thanks for running the tests - that's the expected output.

Interesting that the versions matter. It's worked for us on a few different versions, and our current machine has 1.19.5 and seems to work fine. That machine is Linux, 32GB RAM, with a relatively new Intel CPU.

Taking a step back, I'd like more information on your "server crashing". Do you mean the machine itself reboots / kernel panics? Or are you running this code as part of a long-running server process, and that process crashes? Are you running on Linux, and, if so, is there anything in the kernel log (sudo dmesg) or system messages (e.g. in /var/log/messages, as described in https://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer)? Finally, can you reproduce the crash when running from a Python interpreter?

It's super strange for a process to simply quit without an error message, at least in Python; even a package version mismatch would normally raise an exception. Another thought is that it's a problem with NumPy or its dependent libraries. For instance, check out https://github.com/numpy/numpy/issues/11517#issuecomment-551904310 to see if any of those test cases / solutions work for you. It'd be unfortunate to always be stuck on an old version of numpy!
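As a quick way to isolate that: the linked numpy issue involves crashes inside the BLAS library on large matrix products, and something along the lines of the following sketch (sizes are arbitrary, just big enough to exercise multithreaded BLAS) can test whether a bare matrix multiply alone brings the machine down, independent of emgdecomp:

```python
import numpy as np

# Print the numpy version and which BLAS/LAPACK it was built against;
# OpenBLAS vs MKL is often the relevant difference between conda envs.
print(np.__version__)
np.show_config()

# A large dgemm, similar in spirit to the whitening-step matrix products.
# If this alone crashes/reboots the machine, the problem is in the
# BLAS build or hardware, not in emgdecomp itself.
rng = np.random.default_rng(0)
a = rng.standard_normal((2000, 2000))
b = a @ a.T
print(b.shape)
```

If this crashes under numpy 1.19 but not 1.18.1, that would point strongly at the BLAS shipped with the newer build.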