Closed: burtonrj closed this issue 1 year ago.
I also tried reinstalling haystack using the instructions on GitHub, i.e. cloning the repo and installing from source, but I'm still getting the same error when I run `list(tqdm(pool.imap(func, list(range(10000))), total=10000))` after importing anything from haystack.
Hi @burtonrj, I tried your snippet with the latest Haystack 1.11 in a fresh virtualenv, on my MacBook Pro with M1 inside a jupyter server and on two different Ubuntu boxes, but I couldn't reproduce the error. Given that Haystack doesn't use ZeroMQ directly, I think this might depend on something in your environment that I don't have, but it's very hard to guess what.
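Something you could try is to check which process is using the same ZMQ port as the `shell_port` of your jupyter server when you import Haystack:
1. Get the runtime directory: `jupyter --runtime-dir`
2. Look up the `shell_port` currently in use by your server by running `cat` on the kernel's JSON connection file in that directory
3. From the same kernel, import Haystack: `from haystack.schema import Document`
4. Check which process holds the `shell_port` from before (say it was 59252): `lsof -i :59252`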
In my case I don't see anything suspicious: the jupyter server is the process holding the port. But maybe we can spot something different in your env.
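Steps 1 and 2 can also be scripted. Below is a minimal sketch using only the standard library, assuming connection files follow the `kernel-*.json` naming seen in the outputs in this thread; `latest_connection_file` and `kernel_ports` are hypothetical helper names, not part of Jupyter's API.

```python
import glob
import json
import os


def latest_connection_file(runtime_dir: str) -> str:
    """Return the most recently modified kernel-*.json file in runtime_dir."""
    candidates = glob.glob(os.path.join(runtime_dir, "kernel-*.json"))
    if not candidates:
        raise FileNotFoundError(f"no kernel-*.json files in {runtime_dir}")
    # The newest file usually belongs to the kernel that was started last.
    return max(candidates, key=os.path.getmtime)


def kernel_ports(path: str) -> dict[str, int]:
    """Read a kernel connection file and return every *_port entry."""
    with open(path) as fh:
        info = json.load(fh)
    return {key: value for key, value in info.items() if key.endswith("_port")}
```

Usage: pass the directory printed by `jupyter --runtime-dir` to `latest_connection_file`, then feed `kernel_ports(...)["shell_port"]` to `lsof -i :<port>`.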
EDIT: sorry, I hit the wrong button; I didn't mean to close the issue :)
Hi @masci, thanks for picking this up. I've run the following as specified:
cat /home/ross/.local/share/jupyter/runtime/kernel-e52e49b1-53cc-4484-8617-532435b4963c.json
{
"shell_port": 57219,
"iopub_port": 40149,
"stdin_port": 59363,
"control_port": 53635,
"hb_port": 47257,
"ip": "127.0.0.1",
"key": "1fb3c200-5e835d3b90afd86b8de1157b",
"transport": "tcp",
"signature_scheme": "hmac-sha256",
"kernel_name": ""
}
lsof -i :57219
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
jupyter-l 74128 ross 30u IPv4 369706 0t0 TCP localhost:58662->localhost:57219 (ESTABLISHED)
python 74422 ross 10u IPv4 356981 0t0 TCP localhost:57219 (LISTEN)
python 74422 ross 17u IPv4 338800 0t0 TCP localhost:58658->localhost:57219 (ESTABLISHED)
python 74422 ross 19u IPv4 338801 0t0 TCP localhost:57219->localhost:58658 (ESTABLISHED)
python 74422 ross 52u IPv4 338806 0t0 TCP localhost:57219->localhost:58662 (ESTABLISHED)
I seem to have 4 other python processes using that port as well. If I restart the kernel, it's now using port 39023:
cat /home/ross/.local/share/jupyter/runtime/kernel-d0461869-1f7a-469d-9063-27c679aff82a.json
{
"shell_port": 39023,
"iopub_port": 60573,
"stdin_port": 58031,
"control_port": 60385,
"hb_port": 55749,
"ip": "127.0.0.1",
"key": "3d718c0c-bf18003f7a8cb08aa4d223f0",
"transport": "tcp",
"signature_scheme": "hmac-sha256",
"kernel_name": ""
}
Even without importing haystack (I literally just start the notebook up and run no code), I still see the same 4 processes running:
lsof -i :39023
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
jupyter-l 74128 ross 25u IPv4 359767 0t0 TCP localhost:57336->localhost:39023 (ESTABLISHED)
python 75575 ross 10u IPv4 368843 0t0 TCP localhost:39023 (LISTEN)
python 75575 ross 17u IPv4 354805 0t0 TCP localhost:57332->localhost:39023 (ESTABLISHED)
python 75575 ross 18u IPv4 354806 0t0 TCP localhost:39023->localhost:57332 (ESTABLISHED)
python 75575 ross 51u IPv4 354810 0t0 TCP localhost:39023->localhost:57336 (ESTABLISHED)
Weird. Any thoughts? Sounds like it is just something funny happening on my machine. I'm happy for you to close this issue and I'll keep investigating and reopen it if necessary.
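To tell how many distinct processes actually hold the port, the `lsof` rows can be grouped by PID. This is a minimal sketch assuming lsof's default column layout (COMMAND, PID, USER, ...) with one header row; `pids_by_command` is a hypothetical helper. On the first output in this thread, all four `python` rows share PID 74422 (and on the second, PID 75575), which suggests a single kernel process holding four sockets rather than four separate processes.

```python
from collections import defaultdict

# First lsof output from this thread (header plus two of its rows).
SAMPLE = """COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
jupyter-l 74128 ross 30u IPv4 369706 0t0 TCP localhost:58662->localhost:57219 (ESTABLISHED)
python 74422 ross 10u IPv4 356981 0t0 TCP localhost:57219 (LISTEN)"""


def pids_by_command(lsof_output: str) -> dict[str, set[int]]:
    """Group the PIDs in `lsof -i` output by COMMAND name."""
    groups: dict[str, set[int]] = defaultdict(set)
    for line in lsof_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or truncated rows
        command, pid = fields[0], int(fields[1])
        groups[command].add(pid)
    return dict(groups)


if __name__ == "__main__":
    print(pids_by_command(SAMPLE))  # {'jupyter-l': {74128}, 'python': {74422}}
```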
@burtonrj first of all, I just learned about the `%connect_info` magic, which shows the same JSON data; wish I'd known about it before, eheh.
The ports are assigned randomly at startup by default, so it's OK that they change. I have 6 processes myself; I think the jupyter server starts its own worker pool, so that should be fine too. I'm curious to see whether just importing Haystack changes anything, e.g. whether you see an error or everything stays the same.
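One common way to get "random" free ports is to bind to port 0 and let the operating system pick an unused one. The sketch below shows that general technique with the standard `socket` module; it is an illustration only, not ipykernel's or jupyter_client's actual code, and `pick_free_ports` is a hypothetical name.

```python
import socket


def pick_free_ports(count: int, ip: str = "127.0.0.1") -> list[int]:
    """Ask the OS for `count` distinct free TCP ports by binding to port 0."""
    sockets = []
    try:
        for _ in range(count):
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sockets.append(sock)
            sock.bind((ip, 0))  # port 0: the OS chooses an unused port
        # All sockets are still open here, so the ports are distinct.
        return [sock.getsockname()[1] for sock in sockets]
    finally:
        # Closing the sockets frees the ports again, so another process
        # could grab one of them before it is bound for real.
        for sock in sockets:
            sock.close()
```

Because each run asks the OS afresh, the chosen ports differ between kernel restarts, matching the changing `shell_port` values above.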
I'll keep trying to reproduce...
So I've been experimenting. When I published the original issue, I was launching jupyter lab from a base conda environment and pointing ipykernel at multiple other envs, which, I will admit, is a bit messy. I wondered if this was the problem, so I've tried to recreate the issue within a single environment where both haystack and jupyter are installed, running the jupyter server from within that env. But it just gets weirder.
I can use multiprocessing absolutely fine under two conditions:
But as soon as I run it inside Jupyter BUT import haystack first, the kernel dies. WEIRD. Ultimately, I think this might be a Jupyter issue within my Ubuntu build and not a haystack issue.
P.S. I've tried this with venv, hatch, and poetry for env management, and I get the same issue every time.
Hi @burtonrj, can we close this issue, or are you still facing that error?
Happy holidays @masci! This issue has resolved itself; I'm still not sure what caused it, but it must be machine-specific. Feel free to close.
Thanks @burtonrj, happy holidays to you too!
Describe the bug
There seems to be an issue when using Python's multiprocessing together with haystack.
If I import the multiprocessing library and don't import any haystack modules, I can run a `Pool.imap` loop (`list(tqdm(pool.imap(func, list(range(10000))), total=10000))`) without any error.
If, however, I import anything from haystack at any point in my notebook, say, the `Document` class, the kernel dies and I get the error message "ZMQError: Address already in use".
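A minimal, self-contained sketch of the kind of `Pool.imap` workload described above, assuming a trivial hypothetical `square` function in place of the issue's `func` and leaving out the third-party `tqdm` progress bar. It pins the `fork` start method, which is POSIX-only and is also the Linux default on Python 3.10 (the version in the traceback). Running it once as-is and once after adding a haystack import is one way to compare the two cases.

```python
import multiprocessing as mp


def square(x: int) -> int:
    # Hypothetical stand-in for the `func` used in the issue.
    return x * x


def run_pool(n: int = 10_000) -> list[int]:
    # "fork" is POSIX-only; it is the default start method on Linux
    # for Python 3.10.
    ctx = mp.get_context("fork")
    with ctx.Pool() as pool:
        # imap yields results lazily and in input order; list() drains it,
        # just as list(tqdm(...)) does in the original call.
        return list(pool.imap(square, range(n)))


if __name__ == "__main__":
    results = run_pool()
    print(len(results))  # 10000
```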
Error message
Traceback (most recent call last):
File "/home/ross/anaconda3/envs/haystack/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/ross/anaconda3/envs/haystack/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/ross/anaconda3/envs/haystack/lib/python3.10/site-packages/ipykernel_launcher.py", line 17, in <module>
app.launch_new_instance()
File "/home/ross/anaconda3/envs/haystack/lib/python3.10/site-packages/traitlets/config/application.py", line 981, in launch_instance
app.initialize(argv)
File "/home/ross/anaconda3/envs/haystack/lib/python3.10/site-packages/traitlets/config/application.py", line 110, in inner
return method(app, *args, **kwargs)
File "/home/ross/anaconda3/envs/haystack/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 666, in initialize
self.init_sockets()
File "/home/ross/anaconda3/envs/haystack/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 307, in init_sockets
self.shell_port = self._bind_socket(self.shell_socket, self.shell_port)
File "/home/ross/anaconda3/envs/haystack/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 244, in _bind_socket
return self._try_bind_socket(s, port)
File "/home/ross/anaconda3/envs/haystack/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 220, in _try_bind_socket
s.bind("tcp://%s:%i" % (self.ip, port))
File "/home/ross/anaconda3/envs/haystack/lib/python3.10/site-packages/zmq/sugar/socket.py", line 232, in bind
super().bind(addr)
File "zmq/backend/cython/socket.pyx", line 568, in zmq.backend.cython.socket.Socket.bind
File "zmq/backend/cython/checkrc.pxd", line 28, in zmq.backend.cython.checkrc._check_rc
zmq.error.ZMQError: Address already in use
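"Address already in use" is the operating system's message for `EADDRINUSE`: a bind to an address that another socket already holds. As the traceback shows, the kernel hits it while binding its shell socket. The sketch below reproduces the same OS-level error with two plain TCP sockets from the standard library (no ZeroMQ involved); `bind_conflict_errno` is a hypothetical helper for illustration.

```python
import errno
import socket


def bind_conflict_errno(ip: str = "127.0.0.1") -> int:
    """Bind two TCP sockets to the same port and return the errno of the failure."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((ip, 0))  # let the OS pick a free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind((ip, port))  # same address, still held by `first`
        except OSError as exc:
            return exc.errno
        return 0  # no conflict detected
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    code = bind_conflict_errno()
    print(code == errno.EADDRINUSE, errno.errorcode.get(code))
```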
Expected behavior
I want to be able to use multiprocessing for other tasks outside the context of haystack, so I expect the multiprocessing libraries to keep working after haystack has been imported.
Additional context
This was performed in JupyterLab v3.5.0.
I created a separate virtual environment for my jupyter kernel called `haystack` and installed haystack. I had some issues with the install around FAISS and PyTorch; `faiss-cpu` is at version 1.7.2 as per issue #3600.
Pip freeze of my environment:
aiohttp==3.8.3 aiorwlock==1.3.0 aiosignal==1.3.1 alembic==1.8.1 appdirs==1.4.4 astroid==2.12.13 asttokens==2.1.0 async-generator==1.10 async-timeout==4.0.2 attrs==22.1.0 audioread==3.0.0 azure-ai-formrecognizer==3.2.0 azure-common==1.1.28 azure-core==1.26.1 backcall==0.2.0 backoff==1.11.1 beautifulsoup4==4.11.1 beir==1.0.1 black==22.6.0 bleach==5.0.1 cattrs==22.2.0 certifi @ file:///croot/certifi_1665076670883/work/certifi cffi==1.15.1 cfgv==3.3.1 charset-normalizer==2.1.1 ci-sdr==0.0.2 click==8.0.4 cloudpickle==2.2.0 coloredlogs==15.0.1 ConfigArgParse==1.5.3 contourpy==1.0.6 coverage==6.5.0 ctc-segmentation==1.7.4 cycler==0.11.0 Cython==0.29.32 databind==1.5.3 databind.core==1.5.3 databind.json==1.5.3 databricks-cli==0.17.3 datasets==2.7.0 debugpy==1.6.3 decorator==5.1.1 defusedxml==0.7.1 Deprecated==1.2.13 dill==0.3.6 Distance==0.1.3 distlib==0.3.6 dnspython==2.2.1 docker==6.0.1 docopt==0.6.2 docspec==2.0.2 docspec-python==2.0.2 docstring-parser==0.11 einops==0.6.0 elasticsearch==7.9.1 entrypoints==0.4 espnet==202209 espnet-model-zoo==0.1.7 espnet-tts-frontend==0.0.3 exceptiongroup==1.0.4 executing==1.2.0 faiss-cpu==1.7.2 faiss-gpu==1.7.2 farm-haystack==1.11.0 fast-bss-eval==0.1.3 fastjsonschema==2.16.2 filelock==3.8.0 Flask==2.2.2 flatbuffers==22.10.26 fonttools==4.38.0 frozenlist==1.3.3 fsspec==2022.11.0 g2p-en==2.1.0 ghp-import==2.1.0 gitdb==4.0.9 GitPython==3.1.29 greenlet==2.0.1 grpcio==1.37.1 grpcio-tools==1.37.1 gunicorn==20.1.0 h11==0.14.0 h5py==3.7.0 huggingface-hub==0.11.0 humanfriendly==10.0 identify==2.5.9 idna==3.4 importlib-metadata==4.13.0 inflect==6.0.2 iniconfig==1.1.1 ipykernel==6.17.1 ipython==8.6.0 ipywidgets==8.0.2 isodate==0.6.1 isort==5.10.1 itsdangerous==2.1.2 jaconv==0.3 jamo==0.4.1 jarowinkler==1.2.3 jedi==0.18.2 Jinja2==3.1.2 joblib==1.2.0 jsonschema==4.17.0 jupyter_client==7.4.7 jupyter_core==5.0.0 jupytercontrib==0.0.7 jupyterlab-pygments==0.2.2 jupyterlab-widgets==3.0.3 kaldiio==2.17.2 kiwisolver==1.4.4 langdetect==1.0.9 
lazy-object-proxy==1.8.0 librosa==0.9.2 llvmlite==0.39.1 loguru==0.6.0 lxml==4.9.1 Mako==1.2.4 Markdown==3.3.7 MarkupSafe==2.1.1 matplotlib==3.6.2 matplotlib-inline==0.1.6 mccabe==0.7.0 mergedeep==1.3.4 mistune==2.0.4 mkdocs==1.4.2 mlflow==2.0.1 mmh3==3.0.0 monotonic==1.6 more-itertools==9.0.0 mpmath==1.2.1 msgpack==1.0.4 msrest==0.7.1 multidict==6.0.2 multiprocess==0.70.14 mypy==0.991 mypy-extensions==0.4.3 nbclient==0.7.0 nbconvert==7.2.5 nbformat==5.7.0 nest-asyncio==1.5.6 networkx==2.8.8 nltk==3.7 nodeenv==1.7.0 nr.util==0.8.12 num2words==0.5.12 numba==0.56.4 numpy==1.23.5 oauthlib==3.2.2 onnx==1.12.0 onnxruntime-gpu==1.13.1 onnxruntime-tools==1.7.0 opensearch-py==2.0.0 outcome==1.2.0 packaging==21.3 pandas==1.5.1 pandocfilters==1.5.0 parso==0.8.3 pathspec==0.10.2 pdf2image==1.16.0 pexpect==4.8.0 pickleshare==0.7.5 Pillow==9.3.0 pinecone-client==2.0.13 platformdirs==2.5.4 pluggy==1.0.0 pooch==1.6.0 posthog==2.2.0 pre-commit==2.20.0 prompt-toolkit==3.0.33 protobuf==3.20.1 psutil==5.9.4 psycopg2-binary==2.9.5 ptyprocess==0.7.0 pure-eval==0.2.2 py==1.11.0 py-cpuinfo==9.0.0 py3nvml==0.2.7 pyarrow==10.0.0 pycparser==2.21 pydantic==1.10.2 pydoc-markdown==4.6.4 pydub==0.25.1 Pygments==2.13.0 PyJWT==2.6.0 pylint==2.15.6 pymilvus==2.0.2 pyparsing==3.0.9 pypinyin==0.44.0 pyrsistent==0.19.2 PySocks==1.7.1 pytesseract==0.3.10 pytest==7.2.0 pytest-custom-exit-code==0.3.0 python-dateutil==2.8.2 python-docx==0.8.11 python-dotenv==0.21.0 python-magic==0.4.27 python-multipart==0.0.5 pytorch-wpe==0.0.1 pytrec-eval==0.5 pytz==2022.6 pyworld==0.3.2 PyYAML==5.4.1 pyyaml_env_tag==0.1 pyzmq==24.0.1 quantulum3==0.7.11 querystring-parser==1.2.4 rapidfuzz==2.7.0 ray==1.13.0 rdflib==6.2.0 regex==2022.10.31 requests==2.28.1 requests-cache==0.9.7 requests-oauthlib==1.3.1 resampy==0.4.2 responses==0.18.0 s3cmd==2.3.0 scikit-learn==1.1.3 scipy==1.9.3 selenium==4.6.0 sentence-transformers==2.2.2 sentencepiece==0.1.97 seqeval==1.2.2 shap==0.41.0 six==1.16.0 slicer==0.0.7 smmap==5.0.0 
sniffio==1.3.0 sortedcontainers==2.4.0 soundfile==0.11.0 soupsieve==2.3.2.post1 SPARQLWrapper==2.0.0 SQLAlchemy==1.4.44 SQLAlchemy-Utils==0.38.3 sqlparse==0.4.3 stack-data==0.6.1 sympy==1.11.1 tabulate==0.9.0 threadpoolctl==3.1.0 tika==1.24 tinycss2==1.2.1 tokenize-rt==5.0.0 tokenizers==0.12.1 toml==0.10.2 tomli==2.0.1 tomli_w==1.0.0 tomlkit==0.11.6 torch==1.13.0+cu116 torch-complex==0.4.3 torchaudio==0.13.0+cu116 torchvision==0.14.0+cu116 tornado==6.2 tox==3.27.1 tqdm==4.64.1 traitlets==5.5.0 transformers==4.21.2 trio==0.22.0 trio-websocket==0.9.2 typeguard==2.13.3 typing_extensions==4.4.0 ujson==5.1.0 Unidecode==1.3.6 url-normalize==1.4.3 urllib3==1.26.12 validators==0.18.2 virtualenv==20.16.7 watchdog==2.1.9 wcwidth==0.2.5 weaviate-client==3.9.0 webdriver-manager==3.8.5 webencodings==0.5.1 websocket-client==1.4.2 Werkzeug==2.2.2 widgetsnbextension==4.0.3 wrapt==1.14.1 wsproto==1.2.0 xmltodict==0.13.0 xxhash==3.1.0 yapf==0.32.0 yarl==1.8.1 zipp==3.10.0