clupascu commented 6 months ago

Attached log: simout-122.log

Hi @pramodk, @jorblancoa,

I installed bluepyopt and all its dependencies with pip on a small server. When I run the optimization, the job fails with the attached log.

Can you please suggest what is going wrong? The same job works perfectly on other systems.


pramodk commented 6 months ago

@clupascu : thanks! Could you provide the following info?

  1. Linux distro / OS version
  2. All installation commands (e.g. to install all necessary packages)
  3. stdout when you execute the above installation script (e.g. bash -x > install.log 2>&1)

This will tell us which packages are being installed and in which versions, which should make the problem easy to reproduce locally.

(@anilbey or other bluepyopt devs might have other questions or suggestions!)

anilbey commented 6 months ago

Hi @clupascu, thanks for reporting this issue.

The error says the following is missing.

FileNotFoundError: [Errno 2] No such file or directory: '/proc/355447/stat'

In addition to the info requested by @pramodk, could you also inform us on these 2 points?

  1. Does that server have the /proc directory?
  2. Are there enough processors in that server to enable multiprocessing?
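
Both points can be checked from Python on the node where the job runs. This is a minimal sketch, assuming a Linux node; the helper name node_report is made up for illustration and is not part of bluepyopt or psutil.

```python
import os
from pathlib import Path


def node_report(proc_root="/proc"):
    """Collect the two facts asked about above for the current node."""
    proc = Path(proc_root)
    return {
        # True when the procfs mount point exists as a directory
        "proc_mounted": proc.is_dir(),
        # /proc/self/stat is only present when procfs is really mounted
        "proc_self_stat": (proc / "self" / "stat").exists(),
        # Number of logical CPUs reported by the OS (may be None)
        "cpu_count": os.cpu_count(),
    }


if __name__ == "__main__":
    for key, value in node_report().items():
        print(f"{key}: {value}")
```

Running it inside the same batch job as the optimization (rather than on the login node) shows what the compute node itself sees.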

clupascu commented 6 months ago

Hi @pramodk and @anilbey,

the system says Welcome to Ubuntu 22.04.4 LTS (GNU/Linux 6.2.0-39-generic x86_64)

I don't understand requests 2 and 3. Can you explain them in more detail? Which packages are you referring to?

The server has the /proc directory, and the 3 compute nodes each have 2 multicore CPUs.

pramodk commented 5 months ago

Sorry, I missed the message.

For 2): I meant the exact commands you used to install neuron, bluepyopt, etc., for example something like:

python3 -m venv vv
. vv/bin/activate
pip install neuron bluepyopt

And for 3), the on-screen output of those commands:

$ python3 -m venv vv
$ . vv/bin/activate
$ pip install neuron bluepyopt

This just tells us exactly which packages are installed.
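
If the pip output is no longer available, the installed versions can also be read back from the environment afterwards. A minimal sketch using the standard library's importlib.metadata; the helper name installed_versions and the package list (the packages named in this thread) are assumptions for illustration.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed list: the packages mentioned in this thread.
    wanted = ["neuron", "bluepyopt", "ipyparallel", "psutil"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the same interpreter (and the same activated venv) that the optimization job uses, otherwise it reports a different environment.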

clupascu commented 5 months ago

(vv) clupascu@a1n3login:~$ pip install neuron bluepyopt
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.0/13.0 MB 3.0 MB/s eta 0:00:00
Collecting ipyparallel
  Downloading ipyparallel-8.7.0-py3-none-any.whl (292 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.4/292.4 KB 3.2 MB/s eta 0:00:00
Collecting efel>=2.13
  Downloading efel-5.6.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.5/3.5 MB 3.7 MB/s eta 0:00:00
Collecting Pebble>=4.6.0
  Using cached Pebble-5.0.6-py3-none-any.whl (30 kB)
Collecting pickleshare>=0.7.3
  Using cached pickleshare-0.7.5-py2.py3-none-any.whl (6.9 kB)
Collecting Jinja2>=2.8
  Using cached Jinja2-3.1.3-py3-none-any.whl (133 kB)
Collecting deap>=1.3.3
  Using cached deap-1.4.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (135 kB)
Collecting typing-extensions>=4.8.0
  Downloading typing_extensions-4.10.0-py3-none-any.whl (33 kB)
Collecting neo>=0.5.2
  Using cached neo-0.13.0-py3-none-any.whl (620 kB)
Collecting scipy<2.0.0,>=1.12.0
  Downloading scipy-1.12.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (38.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.4/38.4 MB 7.0 MB/s eta 0:00:00
Collecting MarkupSafe>=2.0
  Downloading MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Collecting tzdata>=2022.7
  Using cached tzdata-2024.1-py2.py3-none-any.whl (345 kB)
Collecting python-dateutil>=2.8.2
  Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 229.9/229.9 KB 8.7 MB/s eta 0:00:00
Collecting pytz>=2020.1
  Downloading pytz-2024.1-py2.py3-none-any.whl (505 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 505.5/505.5 KB 5.6 MB/s eta 0:00:00
Collecting ipykernel>=4.4
  Downloading ipykernel-6.29.3-py3-none-any.whl (117 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 117.1/117.1 KB 5.5 MB/s eta 0:00:00
Collecting tqdm
  Using cached tqdm-4.66.2-py3-none-any.whl (78 kB)
Collecting pyzmq>=18
  Using cached pyzmq-25.1.2-cp310-cp310-manylinux_2_28_x86_64.whl (1.1 MB)
Collecting psutil
  Using cached psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (288 kB)
Collecting jupyter-client
  Downloading jupyter_client-8.6.1-py3-none-any.whl (105 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 105.9/105.9 KB 6.0 MB/s eta 0:00:00
Collecting tornado>=5.1
  Using cached tornado-6.4-cp38-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (435 kB)
Collecting traitlets>=4.3
  Downloading traitlets-5.14.2-py3-none-any.whl (85 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 85.4/85.4 KB 5.8 MB/s eta 0:00:00
Collecting entrypoints
  Using cached entrypoints-0.4-py3-none-any.whl (5.3 kB)
Collecting decorator
  Using cached decorator-5.1.1-py3-none-any.whl (9.1 kB)
Collecting ipython>=4
  Downloading ipython-8.22.2-py3-none-any.whl (811 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 812.0/812.0 KB 5.8 MB/s eta 0:00:00
Collecting jupyter-core!=5.0.*,>=4.12
  Downloading jupyter_core-5.7.2-py3-none-any.whl (28 kB)
Collecting debugpy>=1.6.5
  Using cached debugpy-1.8.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
Collecting matplotlib-inline>=0.1
  Using cached matplotlib_inline-0.1.6-py3-none-any.whl (9.4 kB)
Collecting nest-asyncio
  Using cached nest_asyncio-1.6.0-py3-none-any.whl (5.2 kB)
Collecting comm>=0.1.1
  Downloading comm-0.2.2-py3-none-any.whl (7.2 kB)
Collecting prompt-toolkit<3.1.0,>=3.0.41
  Using cached prompt_toolkit-3.0.43-py3-none-any.whl (386 kB)
Collecting stack-data
  Using cached stack_data-0.6.3-py3-none-any.whl (24 kB)
Collecting jedi>=0.16
  Using cached jedi-0.19.1-py2.py3-none-any.whl (1.6 MB)
Collecting pygments>=2.4.0
  Using cached pygments-2.17.2-py3-none-any.whl (1.2 MB)
Collecting exceptiongroup
  Using cached exceptiongroup-1.2.0-py3-none-any.whl (16 kB)
Collecting pexpect>4.3
  Downloading pexpect-4.9.0-py2.py3-none-any.whl (63 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.8/63.8 KB 4.2 MB/s eta 0:00:00
Collecting quantities>=0.14.1
  Using cached quantities-0.15.0-py3-none-any.whl (101 kB)
Collecting six>=1.5
  Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting parso<0.9.0,>=0.8.3
  Using cached parso-0.8.3-py2.py3-none-any.whl (100 kB)
Collecting platformdirs>=2.5
  Using cached platformdirs-4.2.0-py3-none-any.whl (17 kB)
Collecting ptyprocess>=0.5
  Downloading ptyprocess-0.7.0-py2.py3-none-any.whl (13 kB)
Collecting wcwidth
  Using cached wcwidth-0.2.13-py2.py3-none-any.whl (34 kB)
Collecting executing>=1.2.0
  Using cached executing-2.0.1-py2.py3-none-any.whl (24 kB)
Collecting asttokens>=2.1.0
  Using cached asttokens-2.4.1-py2.py3-none-any.whl (27 kB)
Collecting pure-eval
  Using cached pure_eval-0.2.2-py3-none-any.whl (11 kB)
Installing collected packages: wcwidth, pytz, pure-eval, ptyprocess, pickleshare, find-libpython, tzdata, typing-extensions, traitlets, tqdm, tornado, six, pyzmq, pygments, psutil, prompt-toolkit, platformdirs, pexpect, Pebble, parso, packaging, numpy, nest-asyncio, MarkupSafe, executing, exceptiongroup, entrypoints, decorator, debugpy, scipy, quantities, python-dateutil, neuron, matplotlib-inline, jupyter-core, Jinja2, jedi, deap, comm, asttokens, stack-data, pandas, neo, jupyter-client, ipython, efel, ipykernel, ipyparallel, bluepyopt

Successfully installed Jinja2-3.1.3 MarkupSafe-2.1.5 Pebble-5.0.6 asttokens-2.4.1 bluepyopt-1.14.10 comm-0.2.2 deap-1.4.1 debugpy-1.8.1 decorator-5.1.1 efel-5.6.3 entrypoints-0.4 exceptiongroup-1.2.0 executing-2.0.1 find-libpython-0.3.1 ipykernel-6.29.3 ipyparallel-8.7.0 ipython-8.22.2 jedi-0.19.1 jupyter-client-8.6.1 jupyter-core-5.7.2 matplotlib-inline-0.1.6 neo-0.13.0 nest-asyncio-1.6.0 neuron-8.2.4 numpy-1.26.4 packaging-24.0 pandas-2.2.1 parso-0.8.3 pexpect-4.9.0 pickleshare-0.7.5 platformdirs-4.2.0 prompt-toolkit-3.0.43 psutil-5.9.8 ptyprocess-0.7.0 pure-eval-0.2.2 pygments-2.17.2 python-dateutil-2.9.0.post0 pytz-2024.1 pyzmq-25.1.2 quantities-0.15.0 scipy-1.12.0 six-1.16.0 stack-data-0.6.3 tornado-6.4 tqdm-4.66.2 traitlets-5.14.2 typing-extensions-4.10.0 tzdata-2024.1 wcwidth-0.2.13

clupascu commented 5 months ago

In the log I still have:

can't open DISPLAY
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd-a2n1cn: error: STEP 141.0 ON a2n1cn FAILED (non-zero exit code or other failure mode)
2024-03-13 10:23:55.931 [IPEngine] CRITICAL | received signal 15, stopping

anilbey commented 5 months ago

As far as I know, ipyparallel is an optional dependency in bluepyopt, and its functionality can be substituted with Python's built-in multiprocessing module. If the issue you're encountering stems from communication challenges between ipyparallel workers, switching to Python's multiprocessing could offer a solution by simplifying the process communication mechanism.

There must be a configuration to disable ipyparallel.

+ export USEIPYP=1

Could you try setting this to 0 instead of 1 @clupascu ?
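
To illustrate the idea of swapping ipyparallel for multiprocessing, here is a minimal sketch. How the actual launch script interprets USEIPYP is not shown in this thread, so the convention below ("1" means use ipyparallel, anything else means a local process pool) and the helper names use_ipyparallel and run_tasks are assumptions for illustration only.

```python
import multiprocessing
import os


def use_ipyparallel(env=None):
    """Return True only when USEIPYP is '1' (assumed convention)."""
    env = os.environ if env is None else env
    return env.get("USEIPYP", "0") == "1"


def run_tasks(func, items, env=None):
    """Evaluate func over items with ipyparallel or a local process pool."""
    if use_ipyparallel(env):
        # Imported lazily so the fallback works without ipyparallel installed.
        import ipyparallel

        # Connects to an already running cluster with the default profile.
        client = ipyparallel.Client()
        return client.load_balanced_view().map_sync(func, items)

    # Fallback: worker processes on the local node only.
    with multiprocessing.Pool() as pool:
        return pool.map(func, items)


if __name__ == "__main__":
    print(run_tasks(abs, [-3, 2, -1], env={"USEIPYP": "0"}))
```

The multiprocessing branch only uses the cores of a single node, so it is a way to narrow down the problem rather than a drop-in replacement for a multi-node run.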

cc @AurelienJaquier

clupascu commented 5 months ago

Hi @anilbey,

With export USEIPYP=0 I still see the "can't open DISPLAY" message, and I also get one additional error:

Traceback (most recent call last):
  File "/nisusers/clupascu/.local/lib/python3.10/site-packages/psutil/", line 1714, in wrapper
    return fun(self, *args, **kwargs)
  File "/nisusers/clupascu/.local/lib/python3.10/site-packages/psutil/", line 497, in wrapper
    raise raise_from(err, None)
  File "", line 3, in raise_from
  File "/nisusers/clupascu/.local/lib/python3.10/site-packages/psutil/", line 495, in wrapper
    return fun(self)
  File "/nisusers/clupascu/.local/lib/python3.10/site-packages/psutil/", line 1777, in _parse_stat_file
    data = bcat("%s/%s/stat" % (self._procfs_path,
  File "/nisusers/clupascu/.local/lib/python3.10/site-packages/psutil/", line 840, in bcat
    return cat(fname, fallback=fallback, _open=open_binary)
  File "/nisusers/clupascu/.local/lib/python3.10/site-packages/psutil/", line 828, in cat
    with _open(fname) as f:
  File "/nisusers/clupascu/.local/lib/python3.10/site-packages/psutil/", line 788, in open_binary
    return open(fname, "rb", buffering=FILE_READ_BUFFER_SIZE)
FileNotFoundError: [Errno 2] No such file or directory: '/proc/959011/stat'
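
The traceback ends with psutil opening /proc/959011/stat for a PID that has no entry under /proc. One common way to hit this, and only a guess for this particular job, is that the process had already exited when it was inspected. The sketch below reproduces that situation with the standard library alone; read_proc_stat is a hypothetical helper, not psutil code.

```python
import subprocess
import sys


def read_proc_stat(pid, procfs_path="/proc"):
    """Return the raw bytes of <procfs_path>/<pid>/stat, or None if it is gone."""
    try:
        with open(f"{procfs_path}/{pid}/stat", "rb") as f:
            return f.read()
    except (FileNotFoundError, ProcessLookupError):
        # The process exited (or procfs is not mounted at this path).
        return None


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()  # once reaped, the child's /proc entry disappears
    print(f"stat of finished child {child.pid}: {read_proc_stat(child.pid)}")
```

This fits the later finding in the thread that the job was being terminated early, so the FileNotFoundError may be a side effect of workers dying rather than a separate problem.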

anilbey commented 5 months ago

Hello @clupascu,

Have you by any chance run any other code that uses multiple processes on that server? If not, could you run the following script, which starts multiple processes without using external packages?

import multiprocessing
import os
import random
import time


def worker(task_id):
    # Generate a random sleep time between 1 and 5 seconds
    sleep_time = random.randint(1, 5)
    print(f"Task {task_id} running on PID {os.getpid()}, sleeping for {sleep_time} seconds")
    time.sleep(sleep_time)
    print(f"Task {task_id} complete")


if __name__ == '__main__':
    # Number of CPU cores available on this node
    num_cores = os.cpu_count()
    print(f"Number of CPU cores available: {num_cores}")

    # Create a pool with one worker per CPU core
    with multiprocessing.Pool(processes=num_cores) as pool:
        # Map a list of tasks to the worker function
        pool.map(worker, range(num_cores))

clupascu commented 5 months ago


When I run the code you provided I get:

Number of CPU cores available: 192
Task 7 running on PID 845792, sleeping for 1 seconds
Task 7 complete

and a lot of other tasks completed as well.

anilbey commented 5 months ago

Ok I see. Thanks for providing the output. @clupascu could you also share the script you used in running bluepyopt? I.e. the script/code that led to this issue.

anilbey commented 5 months ago

To address the DISPLAY issue, could you try setting the following before importing bluepyopt/NEURON?



anilbey commented 5 months ago

I.e. the script/code that led to this issue.

If possible, the minimal possible script that can reproduce this issue would be even more useful. Thanks

pramodk commented 4 months ago

Just to update: at the in-person EBRAINS meeting, I looked at this issue with Carmen. The DISPLAY variable is set to localhost:10, and when NEURON initializes its GUI this somehow causes a failure. Either unset DISPLAY or export NEURON_MODULE_OPTIONS='-nogui' solves the issue.

I have heard about -gui interfering in some cases but in the case of this particular cluster, the program terminates right away. Running the same example on BB5, I don't see the same behavior.
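
For scripts that cannot rely on the shell environment, the same two settings can be applied from Python before NEURON is imported. A minimal sketch: the helper name make_neuron_headless is hypothetical, and it applies both workarounds at once even though either one was reported to be enough.

```python
import os


def make_neuron_headless(env=None):
    """Apply the two workarounds from this thread to an environment mapping."""
    env = os.environ if env is None else env
    # Equivalent of `unset DISPLAY`
    env.pop("DISPLAY", None)
    # Equivalent of `export NEURON_MODULE_OPTIONS='-nogui'`
    # (this overwrites any options that were already set)
    env["NEURON_MODULE_OPTIONS"] = "-nogui"
    return env


if __name__ == "__main__":
    make_neuron_headless()
    try:
        import neuron  # must be imported only after the environment is adjusted
    except ImportError:
        print("NEURON is not installed in this environment")
```

The call has to happen before the first import of neuron or bluepyopt anywhere in the process, since the options are read when NEURON starts up.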

anilbey commented 4 months ago

Thanks for the update @pramodk. Good to know the source of the issue. I guess this ticket can be closed now.