DerwenAI / kglab

Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.
https://derwen.ai/docs/kgl/
MIT License

Gotcha: numpy version conflict if installing in existing environment with tensorflow 2.4.1 #160

Closed CatChenal closed 3 years ago

CatChenal commented 3 years ago

Problem: When installing kglab using pip in an existing (activated) environment, the latest version of numpy is installed (because requirements.txt includes 'numpy >= 1.19.4'). This may create conflicts with other packages.

Specific case: a conflict between the latest numpy version and tensorflow 2.4.1, which my activated env contains.
Near the end of the installation process from pip install kglab, I got this error message (abbreviated):

[...]
Installing collected packages: 
[...], kglab
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.2
    Uninstalling numpy-1.19.2:
      Successfully uninstalled numpy-1.19.2
**ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
  tensorflow 2.4.1 requires numpy~=1.19.2, but you have numpy 1.20.2 which is incompatible.**
Successfully installed [all needed]
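Note on the error above: `~=` is the "compatible release" operator, so `numpy~=1.19.2` accepts 1.19.2 and later 1.19.x releases but rejects 1.20.x. The sketch below illustrates that rule for plain numeric versions only. The helper names are hypothetical, and it does not cover the full specifier grammar (pre-releases, epochs, local versions).

```python
def parse_version(text):
    """Turn a simple dotted release such as '1.19.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def compatible_release(installed, spec):
    """Simplified check for a '~=' specifier on plain numeric releases.

    '~=1.19.2' means '>=1.19.2' and '==1.19.*': the last component may
    grow, but every component before it must match exactly.
    """
    have = parse_version(installed)
    want = parse_version(spec)
    prefix = want[:-1]
    return have >= want and have[:len(prefix)] == prefix


# Compare a few numpy releases against tensorflow 2.4.1's requirement.
for candidate in ("1.19.2", "1.19.4", "1.20.2"):
    print(candidate, compatible_release(candidate, "1.19.2"))
```

Under this rule numpy 1.19.4 satisfies the tensorflow requirement while 1.20.2 does not, which matches the message pip printed.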

My fix:

  1. pip uninstall numpy
  2. pip install numpy==1.19.4

My (minimal) tests:

Suggestion/Question: Perhaps changing the numpy requirement from 'numpy >= 1.19.4' to 'numpy == 1.19.4' would force pip to install this first compatible version instead of the latest?

ceteri commented 3 years ago

Thank you @CatChenal !

Yes, I've seen a related problem in my Ray tutorials where TF is causing issues with the later versions of NumPy (1.20.x)

That's the best workaround that I could see, too.

For dependencies, we prefer to pin the versions using ranges.
Would it help if we pinned to >= 1.19, < 1.20 for now?

In general I'm reluctant to place an upper bound, since some people don't use TF and they need the latest NumPy for other integration purposes. Plus, I suspect that TF will catch up, eventually. Pandas and Arrow have some similar issues w.r.t. RAPIDS, although the latter is planning to catch up in the next release.
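To compare the pinning options discussed here, each constraint can be treated as a half-open version interval and intersected with the tensorflow 2.4.1 requirement. This is a rough sketch with hypothetical names. It assumes plain numeric release versions and models `~=1.19.2` as the interval from 1.19.2 up to, but not including, 1.20.

```python
def v(text):
    """Parse a dotted release string into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Each constraint is a half-open interval [low, high); None means unbounded.
KGLAB_CURRENT = (v("1.19.4"), None)        # numpy >= 1.19.4
KGLAB_PROPOSED = (v("1.19"), v("1.20"))    # numpy >= 1.19, < 1.20
TENSORFLOW_241 = (v("1.19.2"), v("1.20"))  # numpy ~= 1.19.2 (approximation)


def intersect(a, b):
    """Return the overlap of two intervals, or None if they are disjoint."""
    low = max(a[0], b[0])
    highs = [h for h in (a[1], b[1]) if h is not None]
    high = min(highs) if highs else None
    if high is not None and low >= high:
        return None
    return (low, high)


print("current pin vs TF: ", intersect(KGLAB_CURRENT, TENSORFLOW_241))
print("proposed pin vs TF:", intersect(KGLAB_PROPOSED, TENSORFLOW_241))
```

Both intersections are non-empty, so the two requirements are not contradictory in themselves. The conflict appears when the installer picks the newest release allowed by the open-ended `>=` pin without weighing tensorflow's requirement.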

ceteri commented 3 years ago

Also, I'll add a note in the (upcoming) FAQ

CatChenal commented 3 years ago

Thanks @ceteri.

> Would it help if we pinned to >= 1.19, < 1.20 for now?

I would hold off for the moment:

  1. My 'suggestion' should have been just a question (my bad!).
  2. I was a bit too hasty in installing a brand new package in an existing environment (end user problem).
  3. For my specific TensorFlow/Numpy conflict, I now know what the requirements are in the current TF 'REQUIRED_PACKAGES'.
  4. Until I test kglab with numpy==1.19.4 exhaustively my "fix" is just a plausible hypothesis (not even a workaround)!
  5. I assume that fixing the upper bound of the Numpy version would require an audit of all the packages in requirements.txt to get their own Numpy dependency, then use the highest [a]. Is there another way?

  [a] It seems that pip is installing the highest release (i.e. Numpy 1.20.2, which is 23 days old as of this post): the audit would tell which package needs it (if any).
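One possible way to run the audit mentioned in point 5 is to read the dependency metadata of the packages already installed in the environment, using the standard library's `importlib.metadata`. This is only a sketch: the function names are hypothetical, and it reports what each installed distribution declares, which may differ from what requirements.txt lists.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Return the normalized project name at the start of a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if not match:
        return ""
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def find_requirements_on(target, requires_by_package):
    """Map each package to the requirement strings it declares on `target`."""
    found = {}
    for package, requirements in requires_by_package.items():
        hits = [r for r in requirements if requirement_name(r) == target]
        if hits:
            found[package] = hits
    return found


def installed_requirements():
    """Collect declared requirements for every distribution in this environment."""
    collected = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            # dist.requires is None when a distribution declares no dependencies.
            collected[name] = dist.requires or []
    return collected
```

Calling `find_requirements_on("numpy", installed_requirements())` in the activated environment lists every installed package that declares a numpy requirement, together with its specifier. That shows which package, if any, actually needs a newer release.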

Perhaps a warning box in README would suffice, e.g.:

WARNING on Installing kglab in an existing environment: Installing a new package in an existing environment may reveal or create version conflicts. See the requirements of kglab in requirements.txt before you do. A known version conflict is that of Numpy in kglab (>= 1.19.4) and TensorFlow 2+ (~=1.19.2).

ceteri commented 3 years ago

Just did a rollback of the NumPy requirement, so this should work fine with >= 1.19.2 now. Also added your language above as a warning, along with notes about the associated PEP 517 errors that may come up.

Many thanks @CatChenal !