BlueBrain / BluePyOpt

Blue Brain Python Optimisation Library
https://bluepyopt.readthedocs.io/en/latest/

A strange crash with no specific cause #478

Open SteMasoli opened 10 months ago

SteMasoli commented 10 months ago

Hi.

After 3 years I have started using BluePyOpt again and I'm observing strange behavior. (Distro: Mint 21.2, latest BluePyOpt, NEURON 8.2.2)

I start a local cluster and then run the script:

ipcluster start -n 16 &
sleep 40
python3 script.py

The startup does not show anomalies.

2023-10-25 15:31:29.530 [IPController] Notifying hub of 16 new hearts
2023-10-25 15:31:29.532 [IPController] Registering 16 new hearts
2023-10-25 15:31:29.532 [IPController] registration::finished registering engine 3:e14ed5e7-309f9a47442b57d503090f87 in 4880ms
2023-10-25 15:31:29.532 [IPController] engine::Engine Connected: 3
cut
2023-10-25 15:32:03.138 [IPController] task::task 'bbfe1033-f97cc6d7cc612a542f69d58f_34140_1' arrived on 10
2023-10-25 15:32:03.138 [IPController] task::task 'bbfe1033-f97cc6d7cc612a542f69d58f_34140_2' arrived on 9
cut
2023-10-25 15:32:03.140 [IPEngine] Handling apply_request: bbfe1033-f97cc6d7cc612a542f69d58f_34140_5
2023-10-25 15:32:03.142 [IPEngine] Handling apply_request: bbfe1033-f97cc6d7cc612a542f69d58f_34140_8

Then it starts to collapse.

2023-10-25 15:32:04.949 [IPController] client::client b'\x00k\x8bEl' requested 'unregistration_request'
2023-10-25 15:32:04.949 [IPController] registration::unregister_engine(1)
cut

2023-10-25 15:32:05.159 [IPClusterStart] WARNING | engine set stopped 1698240683: {'engines': {'1': {'exit_code': <Negsignal.SIGABRT: -6>, 'pid': 33688, 'identifier': '1'}, '4': {'exit_code': <Negsignal.SIGABRT: -6>, 'pid': 33701, 'identifier': '4'}, '5': {'exit_code': <Negsignal.SIGABRT: -6>, 'pid': 33708, 'identifier': '5'}, '10': {'exit_code': <Negsignal.SIGABRT: -6>, 'pid': 33779, 'identifier': '10'}, '14': {'exit_code': <Negsignal.SIGABRT: -6>, 'pid': 33840, 'identifier': '14'}, '7': {'exit_code': <Negsignal.SIGABRT: -6>, 'pid': 33731, 'identifier': '7'}, '13': {'exit_code': <Negsignal.SIGABRT: -6>, 'pid': 33824, 'identifier': '13'}, '3': {'exit_code': <Negsignal.SIGABRT: -6>, 'pid': 33694, 'identifier': '3'}, '12': {'exit_code': <Negsignal.SIGABRT: -6>, 'pid': 33811, 'identifier': '12'}, '6': {'exit_code': <Negsignal.SIGABRT: -6>, 'pid': 33715, 'identifier': '6'}, '0': {'exit_code': <Negsignal.SIGABRT: -6>, 'pid': 33685, 'identifier': '0'}, '8': {'exit_code': <Negsignal.SIGABRT: -6>, 'pid': 33747, 'identifier': '8'}, '2': {'exit_code': <Negsignal.SIGABRT: -6>, 'pid': 33691, 'identifier': '2'}, '11': {'exit_code': <Negsignal.SIGABRT: -6>, 'pid': 33792, 'identifier': '11'}, '15': {'exit_code': <Negsignal.SIGABRT: -6>, 'pid': 33859, 'identifier': '15'}, '9': {'exit_code': <Negsignal.SIGABRT: -6>, 'pid': 33763, 'identifier': '9'}}, 'exit_code': <Negsignal.SIGABRT: -6>}

At the same time, the CPU cores stay at 100%.
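
For reference, the exit code -6 in the log above is how Python reports a child process killed by SIGABRT. Below is a minimal sketch, using only the standard library, of how one might run a single command outside ipcluster and decode how it ended. The helper names `describe_exit` and `run_and_describe` are hypothetical and are not part of BluePyOpt or ipyparallel.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a negative code -N means the child was killed by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return f"killed by {name} ({-returncode})"
    return f"exited with status {returncode}"


def run_and_describe(argv):
    """Run a command in a child process and describe how it ended."""
    completed = subprocess.run(argv, check=False)
    return describe_exit(completed.returncode)
```

Assuming the script can also run one evaluation serially, something like `run_and_describe([sys.executable, "-X", "faulthandler", "script.py"])` might additionally print a Python traceback at the point of the abort, since `-X faulthandler` dumps the traceback on fatal signals such as SIGABRT and SIGSEGV.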

I already have a NEURON compiled from source, in addition to the one installed by pip install of BluePyOpt. Could this cause a "segmentation fault" if the two are not the same version?
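
One way to check which copy Python would actually pick up is sketched below. It uses only the standard library and does not import the module, so it should be safe to run even if the import itself crashes. The helper `locate_module` is hypothetical, and the pip distribution name passed for NEURON is an assumption that may need adjusting.

```python
import importlib.util
from importlib import metadata


def locate_module(module_name, dist_name=None):
    """Report where a module would be imported from, without importing it."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None
    try:
        # Version recorded by pip; dist_name defaults to the module name.
        version = metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        # No pip metadata found, e.g. a source build exposed via PYTHONPATH.
        version = None
    return {
        "module": module_name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
        "pip_version": version,
    }
```

For example, `print(locate_module("neuron", "NEURON"))` and `print(locate_module("bluepyopt"))` would show the file each module resolves to. If the `origin` for `neuron` points into the source build while `pip_version` reports a different release, two installations are probably being mixed.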