deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Not possible to install with poetry #2606

Closed kubami closed 2 years ago

kubami commented 2 years ago

Hi, I wanted to install haystack with poetry:

poetry add ./vendors/haystack -E ocr -E preprocessing -E docstores

however I get an error:

KeyError

  Package('weaviate-client', '2.5.0')

  at ~/.poetry/lib/poetry/puzzle/solver.py:270 in _solve
[...]

The latest haystack is checked out in /vendors/haystack/. When installing with pip, as stated in the docs, everything works.

kubami commented 2 years ago

A different problem happens when installing with Pipenv. It seems to go into a loop trying to resolve dependencies, and as a byproduct it uses an infinite amount of disk space :). This is probably a bug in Pipenv.

masci commented 2 years ago

Hi @kubami I'm trying to reproduce the issue but I get a slightly different error.

Can you tell me your:

kubami commented 2 years ago

Poetry: 1.1.13
Python: 3.8.12 (I am using pyenv in conjunction with poetry)

Haystack checkout: 0395533a786cc63bf2f5180ee7d3dc3eefebdd59

Pipenv: 2022.5.2

But this is not limited to those versions. I have been trying to install haystack with poetry for a couple of months. I always gave up and reverted to pip, which is a pain because all our tooling and workflows are built on poetry.

Thank you for taking a look at this. What error are you getting? Did you specify the extras with poetry? The errors change with different extras.

kmcleste commented 2 years ago

Hi @kubami, I was able to install Haystack successfully with Poetry using:

poetry add git+https://github.com/deepset-ai/haystack.git#master

This seems to have installed the full feature set that Haystack offers. My environment config is as follows:

Poetry: 1.1.13
Pyenv: 2.2.5
Python: 3.9.1
Haystack: 1.5.1rc0

Poetry config:
virtualenvs.create = true
virtualenvs.in-project = true
virtualenvs.path = ".venv"

To check that it was working as intended, I launched each of the provided document stores.

kubami commented 2 years ago

@kmcleste thanks for checking this out.

I can confirm I was able to install haystack with poetry without specifying any extras:

poetry add ./vendors/haystack

(where the latest haystack is checked out). This worked for both haystack 1.4.0 and 1.5.0rc0.

It seems the problem is with specifying extras.
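Since the failure only shows up when extras are requested, it can help to confirm which extras an installed distribution actually declares. The following is a minimal sketch using the standard library's importlib.metadata. The helper name declared_extras is hypothetical, and the snippet assumes farm-haystack (the PyPI name used later in this thread) is already installed in the active environment; otherwise it returns an empty list.

```python
from importlib.metadata import PackageNotFoundError, metadata


def declared_extras(dist_name: str) -> list:
    """Return the extras an installed distribution declares.

    Hypothetical helper: returns [] when the distribution is not installed
    or declares no extras.
    """
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        return []
    # "Provides-Extra" appears once per declared extra in the core metadata.
    return sorted(meta.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    # Prints the extras declared by the installed package, if any.
    print(declared_extras("farm-haystack"))
```

Comparing this list with the names passed to -E or --extras is a quick way to rule out a misspelled extra before suspecting the resolver.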

masci commented 2 years ago

Thanks all for contributing to the issue. I'll keep it open, as I want it to work with extras too, but I'm glad you have a workaround for now.

FHardow commented 2 years ago

I've just tried to install the latest haystack version through poetry. The installation went through without any issue, but when importing anything from haystack I get a huggingface-hub error. My guess is that the latest release from Hugging Face is broken, but the package is pulled in anyway because it is the latest version.

$ python test.py 
INFO - haystack.document_stores.base -  Numba not found, replacing njit() with no-op implementation. Enable it with 'pip install numba'.
/home/florian/.cache/pypoetry/virtualenvs/test-haystack-and-huggingface-3I8GGk3v-py3.9/lib/python3.9/site-packages/huggingface_hub/snapshot_download.py:6: FutureWarning: snapshot_download.py has been made private and will no longer be available from version 0.11. Please use `from huggingface_hub import snapshot_download` to import the only public function in this module. Other members of the file may be changed without a deprecation notice.
  warnings.warn(
Traceback (most recent call last):
  File "/home/florian/test/test-haystack-and-huggingface/test.py", line 1, in <module>
    from haystack import Pipeline
  File "/home/florian/.cache/pypoetry/virtualenvs/test-haystack-and-huggingface-3I8GGk3v-py3.9/lib/python3.9/site-packages/haystack/__init__.py", line 26, in <module>
    from haystack.nodes.base import BaseComponent
  File "/home/florian/.cache/pypoetry/virtualenvs/test-haystack-and-huggingface-3I8GGk3v-py3.9/lib/python3.9/site-packages/haystack/nodes/__init__.py", line 5, in <module>
    from haystack.nodes.answer_generator import BaseGenerator, RAGenerator, Seq2SeqGenerator
  File "/home/florian/.cache/pypoetry/virtualenvs/test-haystack-and-huggingface-3I8GGk3v-py3.9/lib/python3.9/site-packages/haystack/nodes/answer_generator/__init__.py", line 2, in <module>
    from haystack.nodes.answer_generator.transformers import RAGenerator, Seq2SeqGenerator
  File "/home/florian/.cache/pypoetry/virtualenvs/test-haystack-and-huggingface-3I8GGk3v-py3.9/lib/python3.9/site-packages/haystack/nodes/answer_generator/transformers.py", line 18, in <module>
    from haystack.nodes.retriever.dense import DensePassageRetriever
  File "/home/florian/.cache/pypoetry/virtualenvs/test-haystack-and-huggingface-3I8GGk3v-py3.9/lib/python3.9/site-packages/haystack/nodes/retriever/__init__.py", line 2, in <module>
    from haystack.nodes.retriever.dense import DensePassageRetriever, EmbeddingRetriever, TableTextRetriever
  File "/home/florian/.cache/pypoetry/virtualenvs/test-haystack-and-huggingface-3I8GGk3v-py3.9/lib/python3.9/site-packages/haystack/nodes/retriever/dense.py", line 22, in <module>
    from haystack.nodes.retriever._embedding_encoder import _EMBEDDING_ENCODERS
  File "/home/florian/.cache/pypoetry/virtualenvs/test-haystack-and-huggingface-3I8GGk3v-py3.9/lib/python3.9/site-packages/haystack/nodes/retriever/_embedding_encoder.py", line 8, in <module>
    from sentence_transformers import InputExample, losses
  File "/home/florian/.cache/pypoetry/virtualenvs/test-haystack-and-huggingface-3I8GGk3v-py3.9/lib/python3.9/site-packages/sentence_transformers/__init__.py", line 3, in <module>
    from .datasets import SentencesDataset, ParallelSentencesDataset
  File "/home/florian/.cache/pypoetry/virtualenvs/test-haystack-and-huggingface-3I8GGk3v-py3.9/lib/python3.9/site-packages/sentence_transformers/datasets/__init__.py", line 3, in <module>
    from .ParallelSentencesDataset import ParallelSentencesDataset
  File "/home/florian/.cache/pypoetry/virtualenvs/test-haystack-and-huggingface-3I8GGk3v-py3.9/lib/python3.9/site-packages/sentence_transformers/datasets/ParallelSentencesDataset.py", line 4, in <module>
    from .. import SentenceTransformer
  File "/home/florian/.cache/pypoetry/virtualenvs/test-haystack-and-huggingface-3I8GGk3v-py3.9/lib/python3.9/site-packages/sentence_transformers/SentenceTransformer.py", line 25, in <module>
    from .evaluation import SentenceEvaluator
  File "/home/florian/.cache/pypoetry/virtualenvs/test-haystack-and-huggingface-3I8GGk3v-py3.9/lib/python3.9/site-packages/sentence_transformers/evaluation/__init__.py", line 5, in <module>
    from .InformationRetrievalEvaluator import InformationRetrievalEvaluator
  File "/home/florian/.cache/pypoetry/virtualenvs/test-haystack-and-huggingface-3I8GGk3v-py3.9/lib/python3.9/site-packages/sentence_transformers/evaluation/InformationRetrievalEvaluator.py", line 6, in <module>
    from ..util import cos_sim, dot_score
  File "/home/florian/.cache/pypoetry/virtualenvs/test-haystack-and-huggingface-3I8GGk3v-py3.9/lib/python3.9/site-packages/sentence_transformers/util.py", line 407, in <module>
    from huggingface_hub.snapshot_download import REPO_ID_SEPARATOR
ImportError: cannot import name 'REPO_ID_SEPARATOR' from 'huggingface_hub.snapshot_download' (/home/florian/.cache/pypoetry/virtualenvs/test-haystack-and-huggingface-3I8GGk3v-py3.9/lib/python3.9/site-packages/huggingface_hub/snapshot_download.py)

Poetry installation:

$ poetry add farm-haystack
Creating virtualenv test-haystack-and-huggingface-3I8GGk3v-py3.9 in /home/florian/.cache/pypoetry/virtualenvs
Using version ^1.5.0 for farm-haystack

Updating dependencies
Resolving dependencies... (21.5s)

Writing lock file

Package operations: 91 installs, 0 updates, 0 removals

  • Installing certifi (2022.6.15)
  • Installing charset-normalizer (2.0.12)
  • Installing idna (3.3)
  • Installing markupsafe (2.1.1)
  • Installing pyparsing (3.0.9)
  • Installing urllib3 (1.26.9)
  • Installing zipp (3.8.0)
  • Installing click (8.1.3)
  • Installing filelock (3.7.1)
  • Installing greenlet (1.1.2)
  • Installing importlib-metadata (4.11.4)
  • Installing itsdangerous (2.1.2)
  • Installing jinja2 (3.1.2)
  • Installing numpy (1.22.4)
  • Installing oauthlib (3.2.0)
  • Installing packaging (21.3)
  • Installing pyyaml (6.0)
  • Installing requests (2.28.0)
  • Installing six (1.16.0)
  • Installing smmap (5.0.0)
  • Installing tqdm (4.64.0)
  • Installing typing-extensions (4.2.0)
  • Installing werkzeug (2.1.2)
  • Installing docopt (0.6.2)
  • Installing flask (2.1.2)
  • Installing gitdb (4.0.9)
  • Installing huggingface-hub (0.8.0)
  • Installing isodate (0.6.1)
  • Installing joblib (1.1.0)
  • Installing mako (1.2.0)
  • Installing pillow (9.1.1)
  • Installing prometheus-client (0.14.1)
  • Installing pyjwt (2.4.0)
  • Installing python-dateutil (2.8.2)
  • Installing pytz (2022.1)
  • Installing regex (2022.6.2)
  • Installing requests-oauthlib (1.3.1)
  • Installing scipy (1.6.1)
  • Installing sqlalchemy (1.4.37)
  • Installing tabulate (0.8.9)
  • Installing threadpoolctl (3.1.0)
  • Installing tokenizers (0.12.1)
  • Installing torch (1.11.0)
  • Installing websocket-client (1.3.2)
  • Installing alembic (1.8.0)
  • Installing attrs (21.4.0)
  • Installing azure-common (1.1.28)
  • Installing azure-core (1.22.1)
  • Installing backoff (1.11.1)
  • Installing cloudpickle (2.1.0)
  • Installing databricks-cli (0.16.8)
  • Installing docker (5.0.3)
  • Installing entrypoints (0.4)
  • Installing gitpython (3.1.27)
  • Installing gunicorn (20.1.0)
  • Installing inflect (5.6.0)
  • Installing jarowinkler (1.0.2)
  • Installing lxml (4.9.0)
  • Installing monotonic (1.6)
  • Installing msrest (0.6.21)
  • Installing nltk (3.7)
  • Installing num2words (0.5.10)
  • Installing pandas (1.4.2)
  • Installing prometheus-flask-exporter (0.20.2)
  • Installing protobuf (4.21.1)
  • Installing scikit-learn (1.1.1)
  • Installing querystring-parser (1.2.4)
  • Installing pyrsistent (0.18.1)
  • Installing sentencepiece (0.1.96)
  • Installing sqlparse (0.4.2)
  • Installing torchvision (0.12.0)
  • Installing transformers (4.19.2)
  • Installing azure-ai-formrecognizer (3.2.0b2)
  • Installing dill (0.3.5.1)
  • Installing elastic-apm (6.9.1)
  • Installing elasticsearch (7.10.0)
  • Installing jsonschema (4.6.0)
  • Installing langdetect (1.0.9)
  • Installing mlflow (1.26.1)
  • Installing mmh3 (3.0.0)
  • Installing more-itertools (8.13.0)
  • Installing networkx (2.8.4)
  • Installing posthog (1.4.9)
  • Installing pydantic (1.9.1)
  • Installing python-docx (0.8.11)
  • Installing quantulum3 (0.7.10)
  • Installing rapidfuzz (2.0.11)
  • Installing sentence-transformers (2.2.0)
  • Installing seqeval (1.2.2)
  • Installing tika (1.24)
  • Installing farm-haystack (1.5.0)

Downgrading huggingface-hub manually to 0.7.0 fixed that issue.
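To spot this situation before the import fails, one could check the installed huggingface-hub version up front. This is only a sketch: the helper names are hypothetical, and the threshold assumes that 0.8.0 and anything newer is affected, which this thread confirms only for 0.8.0 itself (0.7.0 worked).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple:
    """Turn the leading numeric part of a version string into a 3-tuple.

    '0.8.0' -> (0, 8, 0); '0.8' -> (0, 8, 0). Pre-release suffixes are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", v)
    nums = [int(p) for p in match.group(0).split(".")] if match else []
    nums += [0] * (3 - len(nums))  # pad short versions such as '0.8'
    return tuple(nums)


def hub_is_affected(installed: str, first_bad: tuple = (0, 8, 0)) -> bool:
    """Assumption: every release at or above first_bad triggers the ImportError."""
    return parse_release(installed) >= first_bad


if __name__ == "__main__":
    try:
        installed = version("huggingface-hub")
    except PackageNotFoundError:
        print("huggingface-hub is not installed")
    else:
        if hub_is_affected(installed):
            print(f"huggingface-hub {installed} may break the import; 0.7.0 worked in this thread")
        else:
            print(f"huggingface-hub {installed} is below the assumed threshold")
```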

nickchomey commented 2 years ago

It seems to be going in a loop trying to resolve dependencies, the by product is it uses infinite amount of disk space :). This is probably a bug in Pipenv.

I like using PDM and it does this as well when installing extras ([all-gpu] in my case) - it takes HOURS to resolve all of the dependencies. In fact, I'm not sure I've ever successfully completed it.

If you use verbose mode, you can see that it is seemingly looping through and checking each version of each dependency against each other. This happens, at least in part, because many of the haystack dependencies do not have any minimum version listed.

So I removed any extraneous optional dependencies and then started modifying the dependency versions to limit them to versions released in the past 2 years. That sped up the resolution considerably, but it was a major nuisance and I never fully succeeded.

So it's hard to say whether the issue is the unbounded dependencies or something inherent to these dependency managers. They cross-check for good reason, but maybe it's not practical for certain packages?
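A toy model illustrates why missing minimum versions can hurt. The sketch below is not how PDM, Poetry, or Pipenv actually resolve (real resolvers backtrack and prune); it only counts the worst-case number of version combinations a naive search would have to consider. The package names and version lists are made up for illustration.

```python
from math import prod


def candidate_count(versions, minimum=None):
    """Number of versions still allowed once an optional lower bound is applied."""
    if minimum is None:
        return len(versions)
    return sum(1 for v in versions if v >= minimum)


def worst_case_combinations(available, minimums=None):
    """Upper bound on combinations for a naive search: product of candidates per package."""
    minimums = minimums or {}
    return prod(
        candidate_count(versions, minimums.get(name))
        for name, versions in available.items()
    )


# Hypothetical packages and release histories, as (major, minor) tuples.
available = {
    "dep_a": [(1, 0), (1, 1), (2, 0), (2, 1)],
    "dep_b": [(0, 1), (0, 2), (0, 3)],
    "dep_c": [(3, 0), (3, 1)],
}

unbounded = worst_case_combinations(available)
bounded = worst_case_combinations(available, {"dep_a": (2, 0), "dep_b": (0, 3)})
print(unbounded, bounded)  # 24 4
```

With dozens of dependencies and long release histories, the unbounded product grows very quickly, which is consistent with the speedup seen after adding lower bounds.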

masci commented 2 years ago

Closing as this works now:

poetry add --editable --extras "ocr preprocessing docstores" /path/to/local/haystack

with

kmcleste commented 1 year ago

Sorry to post to a closed issue but this also works with:

poetry add --editable --extras "ocr preprocessing docstores" git+"https://github.com/deepset-ai/haystack.git#main"

with