ckan / ideas

[DEPRECATED] Use the main CKAN repo Discussions instead:
https://github.com/ckan/ckan/discussions
40 stars 2 forks source link

CKAN Coding Standards -- Dependencies #270

Open EricSoroos opened 3 years ago

EricSoroos commented 3 years ago

CKAN version

All currently shipping

Describe the bug

The current CKAN coding standards for extensions leads to inconsistent installs and works against the zen of the python packaging system.

Specifically: (https://docs.ckan.org/en/2.9/extensions/best-practices.html#add-third-party-libraries-to-requirements-txt)

If your extension requires third party libraries, rather than adding them to setup.py, they should be added to requirements.txt, which can be installed with: pip install -r requirements.txt

Separating the dependencies from setup.py breaks the pip resolver. So not only are we likely to get conflicts, we can't even use the tools to figure out what they are.

Steps to reproduce

A small example:

I'm sure that there are other examples, e.g requests, six.

Expected behavior

Ideally, the ckan, extensions, and all of the python dependencies could be installed with: pip install --use-feature=2020-resolver ckan ckanext_foo ckanext_bar ckanext_scheming ckanext_pages...

This would be a one-shot install, and the pip resolver could build the entire dependency tree, and inform us of conflicts, which could then be resolved.

Then, we could see the results of our dependencies with pipdeptree:

$ pipdeptree
Warning!! Cyclic dependencies found:
* SPARQLWrapper => rdflib => SPARQLWrapper
* rdflib => SPARQLWrapper => rdflib
------------------------------------------------------------------------
bleach==2.1.3
  - html5lib [required: >=0.99999999pre,!=1.0b8,!=1.0b7,!=1.0b6,!=1.0b5,!=1.0b4,!=1.0b3,!=1.0b2,!=1.0b1, installed: 1.0.1]
    - six [required: >=1.9, installed: 1.15.0]
    - webencodings [required: Any, installed: 0.5.1]
  - six [required: Any, installed: 1.15.0]
ckan==2.8.7
ckanext-dcat==1.1.1
ckanext-harvest==1.3.2
  - ckantoolkit [required: ==0.0.3, installed: 0.0.3]
  - pika [required: >=1.1.0, installed: 1.2.0]
  - redis [required: <=3.0, installed: 2.10.6]
  - requests [required: >=2.11.1, installed: 2.25.1]
    - certifi [required: >=2017.4.17, installed: 2020.12.5]
    - chardet [required: >=3.0.2,<5, installed: 4.0.0]
    - idna [required: >=2.5,<3, installed: 2.10]
    - urllib3 [required: >=1.21.1,<1.27, installed: 1.26.3]
  - six [required: >=1.12.0, installed: 1.15.0]
...

Where now, CKAN has no dependencies. (note, ckanext-harvest has had it's requirements hoisted (and tweaked) into setup.py in my install.)

Additional details

To reduce the number of conflicts, extensions should be liberal in the versions what dependencies they pin, e.g. pinning a specific version of requests is unnecessary, especially given how stable the requests interface is.

requests>=2,<3

Obviously, there are cases where a specific pinned dependency is required. That's to be expected if you are relying on the specific behavior of a dependency Transitive dependencies should not be pinned. (e.g., certifi and pytz have stable interfaces, and should generally always be on the latest version). Any extension dependency that's also a dependency of CKAN shouldn't be pinned, because then the CKAN pins for different major versions will conflict.

CKAN should be the most specific pin, and extensions compatible with a specific version of CKAN should be compatible with that pin. However, CKAN should also be generally forgiving of the exact version of dependencies that have relatively stable interfaces where it's not using 'deep' features.

Ideally, CKAN would also be only pinning direct dependencies to make it easier to have a clear view of what needs to be updated on point releases. But that's a further bridge to cross.

wardi commented 3 years ago

Thank you for this Eric.

We've arrived at our current imperfect state of dependency management after struggling with some of the ideas you present above.

Semantic versioning is a nice idea, but many dependencies or sub-dependencies don't maintain compatibility across versions the way you would expect. Without pinning versions in requirements.txt we have frequent build errors unrelated to our own changes. Reproducible builds are a really good thing.

install_requires is inflexible. Once you pip install a package with install_requires it will refuse to run as soon as any dependency listed doesn't match the required version(s). This makes it very hard to even test dependency version upgrades. Even harder is combining multiple extensions with different versions of requirements, when for most sites all they need is a known order to install their requirements.txt to get up and running.

The extra work of determining the safe range of requirements for every dependency or sub-dependency is significant, potentially taking away time from developing features our users need. Right now that extra work falls to the end users or system integrators building and supporting CKAN sites with many extensions and customizations. Those users only need to make their extensions and customizations work for a single set of versions they care about and are able to test themselves.

There's room for interested parties to package CKAN+extensions with sets of known working package versions that are easy for end users to install. In fact they already do as docker images, amazon images etc. I think we should support their efforts and maybe even link to well-maintained examples from our install instructions. AFAICT pip isn't the right tool for this problem.

EricSoroos commented 3 years ago

Python packaging is complicated. At least we're not trying to ship C-extensions. But, it is getting better. The 2020 resolver in pip is way better at finding a solution that will match a large number of libraries, as long as it can get the data and they're not too picky about the exact versioning of the requirements. The previous resolver had a lot more issues finding a workable solution.

Right now, we have the worst of both worlds. We have a lot of potentially conflicting requirements, we're not notified of them (pip happily uninstalls one and installs another, potentially older version), and we can't even use the existing tools to make some sort of resolution. So to get to the point that we have a installation that qualifies for a reproducible build, we've got to go through, find, and manually resolve conflicts. (and then pin them)

Most other python packages (including the requirements) are installable from pypi, and specify a set of requirements in install_requires. From one of my ckan installations, I've got Pylons:

Pylons==0.9.7
  - Beaker [required: >=1.2.2, installed: 1.9.0]
    - funcsigs [required: Any, installed: 1.0.2]
  - decorator [required: >=2.3.2, installed: 4.2.1]
  - FormEncode [required: >=1.2.1, installed: 1.3.1]
  - Mako [required: >=0.2.4, installed: 1.0.7]
    - MarkupSafe [required: >=0.9.2, installed: 1.0]
  - nose [required: >=0.10.4, installed: 1.3.7]
  - Paste [required: >=1.7.2, installed: 1.7.5.1]
  - PasteDeploy [required: >=1.3.3, installed: 1.5.2]
  - PasteScript [required: >=1.7.3, installed: 2.0.2]
    - Paste [required: >=1.3, installed: 1.7.5.1]
    - PasteDeploy [required: Any, installed: 1.5.2]
    - six [required: Any, installed: 1.15.0]
  - Routes [required: >=1.10.3, installed: 1.13]
    - repoze.lru [required: >=0.3, installed: 0.7]
  - simplejson [required: >=2.0.8, installed: 3.10.0]
  - Tempita [required: >=0.2, installed: 0.5.2]
  - WebError [required: >=0.10.1, installed: 0.13.1]
    - Paste [required: >=1.7.1, installed: 1.7.5.1]
    - Pygments [required: Any, installed: 2.2.0]
    - Tempita [required: Any, installed: 0.5.2]
    - WebOb [required: Any, installed: 1.0.8]
  - WebHelpers [required: >=0.6.4, installed: 1.3]
    - MarkupSafe [required: >=0.9.2, installed: 1.0]
  - WebOb [required: >=0.9.6.1, installed: 1.0.8]
  - WebTest [required: >=1.1, installed: 1.4.3]
    - WebOb [required: Any, installed: 1.0.8]

And The google api client:

google-api-python-client==1.12.8
  - google-api-core [required: >=1.21.0,<2dev, installed: 1.26.1]
    - futures [required: >=3.2.0, installed: 3.3.0]
    - google-auth [required: >=1.21.1,<2.0dev, installed: 1.27.1]
      - cachetools [required: >=2.0.0,<5.0, installed: 3.1.1]
      - pyasn1-modules [required: >=0.2.1, installed: 0.2.8]
        - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
      - rsa [required: <4.6, installed: 4.5]
        - pyasn1 [required: >=0.1.3, installed: 0.4.8]
      - setuptools [required: >=40.3.0, installed: 41.0.0]
      - six [required: >=1.9.0, installed: 1.15.0]
    - googleapis-common-protos [required: >=1.6.0,<2.0dev, installed: 1.52.0]
      - protobuf [required: >=3.6.0, installed: 3.15.5]
        - six [required: >=1.9, installed: 1.15.0]
    - packaging [required: >=14.3, installed: 20.9]
      - pyparsing [required: >=2.0.2, installed: 2.4.7]
    - protobuf [required: >=3.12.0, installed: 3.15.5]
      - six [required: >=1.9, installed: 1.15.0]
    - pytz [required: Any, installed: 2016.7]
    - requests [required: >=2.18.0,<3.0.0dev, installed: 2.25.1]
      - certifi [required: >=2017.4.17, installed: 2020.12.5]
      - chardet [required: >=3.0.2,<5, installed: 4.0.0]
      - idna [required: >=2.5,<3, installed: 2.10]
      - urllib3 [required: >=1.21.1,<1.27, installed: 1.26.3]
    - setuptools [required: >=40.3.0, installed: 41.0.0]
    - six [required: >=1.13.0, installed: 1.15.0]
  - google-auth [required: >=1.16.0, installed: 1.27.1]
    - cachetools [required: >=2.0.0,<5.0, installed: 3.1.1]
    - pyasn1-modules [required: >=0.2.1, installed: 0.2.8]
      - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
    - rsa [required: <4.6, installed: 4.5]
      - pyasn1 [required: >=0.1.3, installed: 0.4.8]
    - setuptools [required: >=40.3.0, installed: 41.0.0]
    - six [required: >=1.9.0, installed: 1.15.0]
  - google-auth-httplib2 [required: >=0.0.3, installed: 0.1.0]
    - google-auth [required: Any, installed: 1.27.1]
      - cachetools [required: >=2.0.0,<5.0, installed: 3.1.1]
      - pyasn1-modules [required: >=0.2.1, installed: 0.2.8]
        - pyasn1 [required: >=0.4.6,<0.5.0, installed: 0.4.8]
      - rsa [required: <4.6, installed: 4.5]
        - pyasn1 [required: >=0.1.3, installed: 0.4.8]
      - setuptools [required: >=40.3.0, installed: 41.0.0]
      - six [required: >=1.9.0, installed: 1.15.0]
    - httplib2 [required: >=0.15.0, installed: 0.17.4]
    - six [required: Any, installed: 1.15.0]
  - httplib2 [required: >=0.15.0,<1dev, installed: 0.17.4]
  - six [required: >=1.13.0,<2dev, installed: 1.15.0]
  - uritemplate [required: >=3.0.0,<4dev, installed: 3.0.1]

And the Office rest client:

Office365-REST-Python-Client==2.1.7.post1
  - adal [required: Any, installed: 1.2.6]
    - cryptography [required: >=1.1.0, installed: 3.3.2]
      - cffi [required: >=1.12, installed: 1.14.5]
        - pycparser [required: Any, installed: 2.20]
      - enum34 [required: Any, installed: 1.1.10]
      - ipaddress [required: Any, installed: 1.0.23]
      - six [required: >=1.4.1, installed: 1.15.0]
    - PyJWT [required: >=1.0.0,<3, installed: 1.7.1]
    - python-dateutil [required: >=2.1.0,<3, installed: 2.8.1]
      - six [required: >=1.5, installed: 1.15.0]
    - requests [required: >=2.0.0,<3, installed: 2.25.1]
      - certifi [required: >=2017.4.17, installed: 2020.12.5]
      - chardet [required: >=3.0.2,<5, installed: 4.0.0]
      - idna [required: >=2.5,<3, installed: 2.10]
      - urllib3 [required: >=1.21.1,<1.27, installed: 1.26.3]
  - requests [required: Any, installed: 2.25.1]
    - certifi [required: >=2017.4.17, installed: 2020.12.5]
    - chardet [required: >=3.0.2,<5, installed: 4.0.0]
    - idna [required: >=2.5,<3, installed: 2.10]
    - urllib3 [required: >=1.21.1,<1.27, installed: 1.26.3]

In contrast, there are the ckan ones: (of these, scheming is the only one that specifies dependencies upstream, the other two I've patched)

ckan==2.8.7
ckanext-datarequests==0.2.13
ckanext-dcat==1.1.1
  - ckantoolkit [required: ==0.0.3, installed: 0.0.3]
  - future [required: >=0.18.2, installed: 0.18.2]
  - geomet [required: >=0.2.0, installed: 0.3.0]
    - click [required: Any, installed: 6.7]
    - six [required: Any, installed: 1.15.0]
  - rdflib [required: ==4.2.1, installed: 4.2.1]
    - html5lib [required: Any, installed: 1.0.1]
      - six [required: >=1.9, installed: 1.15.0]
      - webencodings [required: Any, installed: 0.5.1]
    - isodate [required: Any, installed: 0.6.0]
      - six [required: Any, installed: 1.15.0]
    - pyparsing [required: Any, installed: 2.4.7]
    - SPARQLWrapper [required: Any, installed: 1.8.5]
  - rdflib-jsonld [required: ==0.4.0, installed: 0.4.0]
    - rdflib [required: >=4.2, installed: 4.2.1]
      - html5lib [required: Any, installed: 1.0.1]
        - six [required: >=1.9, installed: 1.15.0]
        - webencodings [required: Any, installed: 0.5.1]
      - isodate [required: Any, installed: 0.6.0]
        - six [required: Any, installed: 1.15.0]
      - pyparsing [required: Any, installed: 2.4.7]
      - SPARQLWrapper [required: Any, installed: 1.8.5]
  - six [required: Any, installed: 1.15.0]
ckanext-dietstars==0.0.1
ckanext-geoview==0.2.0
ckanext-harvest==1.3.2
  - ckantoolkit [required: ==0.0.3, installed: 0.0.3]
  - factory-boy [required: Any, installed: 2.12.0]
    - Faker [required: >=0.7.0, installed: 3.0.1]
      - ipaddress [required: Any, installed: 1.0.23]
      - python-dateutil [required: >=2.4, installed: 2.8.1]
        - six [required: >=1.5, installed: 1.15.0]
      - six [required: >=1.10, installed: 1.15.0]
      - text-unidecode [required: ==1.3, installed: 1.3]
  - mock [required: Any, installed: 3.0.5]
    - funcsigs [required: >=1, installed: 1.0.2]
    - six [required: Any, installed: 1.15.0]
  - pika [required: >=1.1.0, installed: 1.2.0]
  - pyopenssl [required: Any, installed: 20.0.1]
    - cryptography [required: >=3.2, installed: 3.3.2]
      - cffi [required: >=1.12, installed: 1.14.5]
        - pycparser [required: Any, installed: 2.20]
      - enum34 [required: Any, installed: 1.1.10]
      - ipaddress [required: Any, installed: 1.0.23]
      - six [required: >=1.4.1, installed: 1.15.0]
    - six [required: >=1.5.2, installed: 1.15.0]
  - redis [required: <=3.0, installed: 2.10.6]
  - requests [required: >=2.11.1, installed: 2.25.1]
    - certifi [required: >=2017.4.17, installed: 2020.12.5]
    - chardet [required: >=3.0.2,<5, installed: 4.0.0]
    - idna [required: >=2.5,<3, installed: 2.10]
    - urllib3 [required: >=1.21.1,<1.27, installed: 1.26.3]
  - six [required: >=1.12.0, installed: 1.15.0]
ckanext-impactstories==0.0.1
ckanext-likes==0.0.1
ckanext-officedocs==0.0.1
ckanext-pages==0.1.0
ckanext-pdfview==0.0.5
ckanext-scheming==2.1.0
  - ckanapi [required: Any, installed: 4.5]
    - docopt [required: Any, installed: 0.6.2]
    - python-slugify [required: >=1.0, installed: 4.0.1]
      - text-unidecode [required: >=1.3, installed: 1.3]
    - requests [required: Any, installed: 2.25.1]
      - certifi [required: >=2017.4.17, installed: 2020.12.5]
      - chardet [required: >=3.0.2,<5, installed: 4.0.0]
      - idna [required: >=2.5,<3, installed: 2.10]
      - urllib3 [required: >=1.21.1,<1.27, installed: 1.26.3]
    - setuptools [required: Any, installed: 41.0.0]
    - simplejson [required: Any, installed: 3.10.0]
    - six [required: >=1.9,<2.0, installed: 1.15.0]
  - ckantoolkit [required: >=0.0.2, installed: 0.0.3]
  - pytz [required: Any, installed: 2016.7]
  - pyyaml [required: Any, installed: 5.4.1]
  - six [required: Any, installed: 1.15.0]
ckanext-showcase==1.0.3
ckanext-spatial==0.2
ckanext-ytp-comments==0.0

Note that the ckan ones don't help me by specifying dependencies.

Looking at this install, I've got this many transitive dependencies of six, not including ckan and extensions:

six [required: >=1.10, installed: 1.15.0]
six [required: >=1.12.0, installed: 1.15.0]
six [required: >=1.13.0,<2dev, installed: 1.15.0]
six [required: >=1.13.0, installed: 1.15.0]
six [required: >=1.4.1, installed: 1.15.0]
six [required: >=1.4.1, installed: 1.15.0]
six [required: >=1.5.2, installed: 1.15.0]
six [required: >=1.5, installed: 1.15.0]
six [required: >=1.5, installed: 1.15.0]
six [required: >=1.5, installed: 1.15.0]
six [required: >=1.5, installed: 1.15.0]
six [required: >=1.5, installed: 1.15.0]
six [required: >=1.6.1, installed: 1.15.0]
six [required: >=1.7.0, installed: 1.15.0]
six [required: >=1.9.0, installed: 1.15.0]
six [required: >=1.9.0, installed: 1.15.0]
six [required: >=1.9.0, installed: 1.15.0]
six [required: >=1.9,<2.0, installed: 1.15.0]
six [required: >=1.9, installed: 1.15.0]
six [required: >=1.9, installed: 1.15.0]
six [required: >=1.9, installed: 1.15.0]
six [required: >=1.9, installed: 1.15.0]
six [required: >=1.9, installed: 1.15.0]
six [required: Any, installed: 1.15.0]
six [required: Any, installed: 1.15.0]
six [required: Any, installed: 1.15.0]
six [required: Any, installed: 1.15.0]
six [required: Any, installed: 1.15.0]
six [required: Any, installed: 1.15.0]
six [required: Any, installed: 1.15.0]
six [required: Any, installed: 1.15.0]
six [required: Any, installed: 1.15.0]
six [required: Any, installed: 1.15.0]
six [required: Any, installed: 1.15.0]
six [required: Any, installed: 1.15.0]

Without the install_requires, beautiful_soup is a top level dependency:

beautifulsoup4==4.9.3
  - soupsieve [required: >1.2,<2.0, installed: 1.9.6]
    - backports.functools-lru-cache [required: Any, installed: 1.6.1]

But unfortunately, grepping through the requirements.txt files, this is what I have for beautiful soup:

beautifulsoup4
beautifulsoup4==4.3.2
beautifulsoup4==4.3.2
beautifulsoup4==4.8.2

This happily installs, and depending on the order, I get something. If these were in the install_requires, I'd at least get notified. Thankfully, looking at all of these, there are only a couple of silent conflicts, but they'd be better if they were loud.

While semantic versioning isn't going to save us, it is a tool. It's better than the requirements.txt, where the only guarantee is that a package matching that name will be installed. Tests can save us, if they have enough coverage and get run.

(sorry for the length, the copy/paste of pipdeptree was longer than I'd thought)