VisionSystemsInc / vsi_common

A set of common scripts used by VSI on numerous projects. See https://visionsystemsinc.github.io/vsi_common
http://visionsystemsinc.com/
MIT License

Add `piptools` as an alternative to `pipenv` #437

Open andyneff opened 2 years ago

andyneff commented 2 years ago
andyneff commented 1 year ago

Default requirements.in:

pip-tools

Default requirements.txt:

#
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
#    pip-compile --allow-unsafe --generate-hashes --resolver=backtracking
#
build==0.10.0 \
    --hash=sha256:af266720050a66c893a6096a2f410989eeac74ff9a68ba194b3f6473e8e26171 \
    --hash=sha256:d5b71264afdb5951d6704482aac78de887c80691c52b88a9ad195983ca2c9269
    # via pip-tools
click==8.1.3 \
    --hash=sha256:7682dc8afb30297001674575ea00d1814d808d6a36af415a82bd481d37ba7b8e \
    --hash=sha256:bb4d8133cb15a609f44e8213d9b391b0809795062913b383c62be0ee95b1db48
    # via pip-tools
packaging==23.0 \
    --hash=sha256:714ac14496c3e68c99c29b00845f7a2b85f3bb6f1078fd9f72fd20f0570002b2 \
    --hash=sha256:b6ad297f8907de0fa2fe1ccbd26fdaf387f5f47c7275fedf8cce89f99446cf97
    # via build
pip-tools==6.12.3 \
    --hash=sha256:480d44fae6e09fad3f9bd3d0a7e8423088715d10477e8ef0663440db25e3114f \
    --hash=sha256:8510420f46572b2e26c357541390593d9365eb6edd2d1e7505267910ecaec080
    # via -r requirements.in
pyproject-hooks==1.0.0 \
    --hash=sha256:283c11acd6b928d2f6a7c73fa0d01cb2bdc5f07c57a2eeb6e83d5e56b97976f8 \
    --hash=sha256:f271b298b97f5955d53fb12b72c1fb1948c22c1a6b70b315c54cedaca0264ef5
    # via build
tomli==2.0.1 \
    --hash=sha256:939de3e7a6161af0c887ef91b7d41a53e7c5a1ca976325f429cb46ea9bc30ecc \
    --hash=sha256:de526c12914f0c550d15924c62d72abc48d6fe7364aa87328337a31007fe8a4f
    # via
    #   build
    #   pyproject-hooks
wheel==0.40.0 \
    --hash=sha256:cd1196f3faee2b31968d626e1731c94f99cbdb67cf5a46e4f5656cbee7738873 \
    --hash=sha256:d236b20e7cb522daf2390fa84c55eea81c5c30190f90f29ae2ca1ad8355bf247
    # via pip-tools

# The following packages are considered to be unsafe in a requirements file:
pip==23.0.1 \
    --hash=sha256:236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f \
    --hash=sha256:cd015ea1bfb0fcef59d8a286c1f8bebcb983f6317719d415dc5351efb7cd7024
    # via pip-tools
setuptools==67.6.1 \
    --hash=sha256:257de92a9d50a60b8e22abfcbb771571fde0dbf3ec234463212027a4eeecbe9a \
    --hash=sha256:e728ca814a823bf7bf60162daf9db95b93d532948c4c0bea762ce62f60189078
    # via pip-tools
andyneff commented 1 year ago

Using pip-tools

BOYV

python -m venv /foo

Install pip-tools:

Because there are hashes in the file, pip's hash checking is all or nothing: either every requirement is hashed or none is. So we have to do the initial pip-tools install without hashes.

/foo/bin/pip install -c requirements.txt --require-hashes pip-tools $(unwrap requirements.txt | grep ^pip-tools) #noqa
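The `unwrap` helper used in the command above is not defined anywhere in this thread. A minimal sketch, assuming it only needs to join backslash-continued lines so that `grep ^pip-tools` returns the pin together with its `--hash` options (the function body is a guess at the behavior, not the actual helper):

```python
def unwrap(text: str) -> str:
    """Join backslash-continued lines into single logical lines."""
    logical, parts = [], []
    for line in text.splitlines():
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            # Continuation: drop the trailing backslash and keep collecting.
            parts.append(stripped[:-1].strip())
        else:
            parts.append(stripped.strip())
            logical.append(" ".join(p for p in parts if p))
            parts = []
    if parts:  # file ended in the middle of a continuation
        logical.append(" ".join(p for p in parts if p))
    return "\n".join(logical)


# Placeholder hashes, for illustration only.
sample = (
    "pip-tools==6.12.3 \\\n"
    "    --hash=sha256:aaa \\\n"
    "    --hash=sha256:bbb\n"
    "    # via -r requirements.in\n"
)
print(unwrap(sample))
```

The trailing `# via` comment stays on its own line, so a `grep ^pip-tools` over the result picks up only the pinned requirement and its hashes.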

There is now a pip-tools helper function that lets us get the list of dependencies for pip-tools, which we could then use to create a temporary requirements file for just pip-tools, but I don't like that at all.

The other possibility is to have multiple requirements files broken up by install stage. E.g.

  1. stage1.txt - just pip-tools, and can be installed by pure pip
  2. stage2.txt - maybe numpy/torch/other install dependencies that are not properly captured by pyproject.toml files
  3. stage3.txt - everything else

Requirements files can be daisy-chained this way, so that is supported. I do not know how resolution works across the chain. Hoping for the best.

Precompile difficult wheels:

/foo/bin/pip wheel "git+https://www.example.com/foo/bar@f32498723146984365987326589743#egg=packagename" \
               --no-deps --no-build-isolation -w "/venv/wheels"

This might have to be done by having the stage that compiles the wheel upload it to a central location, and never updating it for that version. Otherwise, every user who builds it will get a different sha for the exact same wheel (different timestamps, owners, and other build artifacts that don't matter).

Typically, when we do this, we do not put them in the virtual env tracker. pip-tools does not appear to give us any option here either, unless we can use multiple requirements.txt files (with --no-deps) or grep on the fly.

Installing from git/online tarballs

While this creates a proper cached wheel, and works in pip-tools, as soon as you enable shas, pip will break. pip-sync might still function, but pip bootstrapping will be broken.

git+https://github.com/pallets/markupsafe@e7930ee96a3c09480af43da74888713b3d2c9c10

The following alternatives all fail to save a wheel in the cache, meaning it has to be recompiled on every pip-sync:

git+https://github.com/pallets/markupsafe@2.0.0#egg=markupsafe
-e git+https://github.com/pallets/markupsafe@2.0.0#egg=markupsafe
-e git+https://github.com/pallets/markupsafe@e7930ee96a3c09480af43da74888713b3d2c9c10#egg=markupsafe

The recommended way (based on the warnings) to get around the sha not being available is to use an archive file.

https://github.com/pallets/markupsafe/archive/e7930ee96a3c09480af43da74888713b3d2c9c10.zip
./e7930ee96a3c09480af43da74888713b3d2c9c10.zip

However, neither of these options allows pip to use the wheel cache.

Note: The sha in the requirements.txt file is the sha of the archive, not the wheel, so the wheel can vary.
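To make the archive-versus-wheel distinction concrete, here is a minimal sketch of how a `--hash=sha256:...` entry can be computed for a downloaded archive. It hashes the raw bytes of the file, which is why a wheel built later from that archive is not covered by it. The function names are illustrative only; `pip hash <file>` produces the same kind of line.

```python
import hashlib


def sha256_option(data: bytes) -> str:
    """Format the sha256 of raw bytes as a requirements-file hash option."""
    return f"--hash=sha256:{hashlib.sha256(data).hexdigest()}"


def sha256_option_for_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Same as sha256_option, but streams the file so large archives fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return f"--hash=sha256:{digest.hexdigest()}"


# Hashing the archive bytes: any change to the archive changes the hash,
# while the wheel pip builds from it is never hashed at all.
print(sha256_option(b""))
```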

andyneff commented 1 year ago

More notes

The proposed workaround will be:

requirements.in - This will contain the normal Python dependencies. It will always include some version of `pip-tools`. Editable directories would probably go in here; it won't really matter.

```bash
pip-tools
packagea
packageb
-r shaless.in
-r early.in
```

early.in - Any wheels that are compile-time dependencies of other compiled libraries go in here, so we can make sure they are installed first.

```bash
numpy
torch # or some url to a whl for torch
torchvision
```

shaless.in - Any wheels we compile ourselves, or git repos using the `git+https` syntax, have to go in here, as they either do not have SHAs or need special treatment.

```bash
/venv/wheelhouse/long_compile.whl # Put helper string here to help awk/sed for more complicated scenarios
git+https://github.com/example.com/foo.bar.git@f98654f875f87652873ea356 # Must be by SHA syntax with git servers or else cache will not work properly!!
```

From this, we will have one requirements.txt. From that, we will generate (either in real time or after pip-compile):

  1. pip-tools.txt - Just pip-tools and its dependencies.
    • This will be 100% checksummed.
    • This will require a recursive dependency parse of the requirements.txt file to determine what all is needed for pip-tools. (The dependency parsing step has been completed already.)
    • This file can only be used by pip as it is used to install pip-tools, and you can't very well use pip-tools to install pip-tools.
  2. early.txt - This will contain the packages listed in early.in and their dependencies. This is only for "dependencies that are needed in order to build another package" later on.
    • This will be 100% checksummed.
    • This will require a recursive dependency parse of the requirements.txt file to determine what all is needed for these packages.
    • pip-tools will also be included, so this is a super set.
    • The resulting file can be used by both pip and pip-sync, preferably pip-sync.
  3. shaed.txt - Everything except the files mentioned in shaless.in.
    • This will be 100% checksummed.
    • This should be simple awk and sed.
    • pip-tools and early will also be included, so this is a super set.
    • The resulting file can be used by both pip and pip-sync, preferably pip-sync.
  4. shaless.txt - Only packages mentioned in shaless.in will go here.
    • This will be 0% checksummed.
    • This will not include "editable directories" (i.e. /src and /src/external/*). Any other editables (not directories) will still be included here. The dependencies of these packages will not be in this file, but in shaed.txt instead (unless they too are sha-less or editable).
    • This should be simple awk and sed.
    • The resulting file can only be used by pip, and must include the --no-deps flag.
  5. editable.txt - Only editable directories will go in here.
    • This will be 0% checksummed.
    • This file will be used at entrypoint time.
    • This should be simple awk and sed.
    • The resulting file can only be used by pip, and must include the --no-deps flag.
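The "recursive dependency parse" needed for pip-tools.txt and early.txt could be sketched as follows. This is not the parser mentioned above, only an illustration under stated assumptions: it relies on the `# via` annotations that pip-compile writes, it only recognizes pinned `name==version` entries (URL and editable entries are skipped), and all function names and the placeholder hashes are hypothetical.

```python
import re
from collections import defaultdict

# Only pinned entries ("name==version" or "name[extra]==version") start a block.
ENTRY = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(\[[^\]]*\])?==")


def parse_compiled(text):
    """Return ({name: block_text}, {parent: {children}}) from pip-compile output."""
    blocks, requires = {}, defaultdict(set)
    current = None
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            current = match.group(1).lower()
            blocks[current] = [line]
        elif current and line.startswith(" "):
            blocks[current].append(line)
            note = line.strip()
            if note.startswith("#"):
                # "# via parent", a bare "# via", or "#   parent" in a via list
                parent = re.sub(r"^via(\s+|$)", "", note.lstrip("#").strip())
                if parent and not parent.startswith("-"):  # skip "-r file"
                    requires[parent.lower()].add(current)
        else:
            current = None  # blank line or top-level comment ends the block
    return {k: "\n".join(v) for k, v in blocks.items()}, requires


def closure(root, requires):
    """All packages reachable from root, including root itself."""
    seen, stack = set(), [root]
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(requires.get(name, ()))
    return seen


def subset(text, root):
    """Keep only the blocks needed to install root, in file order."""
    blocks, requires = parse_compiled(text)
    wanted = closure(root, requires)
    return "\n".join(blocks[n] for n in blocks if n in wanted) + "\n"


SAMPLE = r"""build==0.10.0 \
    --hash=sha256:aaa
    # via pip-tools
click==8.1.3 \
    --hash=sha256:bbb
    # via pip-tools
numpy==1.24.2 \
    --hash=sha256:ccc
    # via -r requirements.in
pip-tools==6.12.3 \
    --hash=sha256:ddd
    # via -r requirements.in
pyproject-hooks==1.0.0 \
    --hash=sha256:eee
    # via build
tomli==2.0.1 \
    --hash=sha256:fff
    # via
    #   build
    #   pyproject-hooks
"""

print(subset(SAMPLE, "pip-tools"))
```

With this sample, `subset(SAMPLE, "pip-tools")` keeps build, click, pip-tools, pyproject-hooks, and tomli, and drops numpy. The same closure run from each package in early.in would give the contents of early.txt.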

The requirements.txt will never actually get used.

Is this all worth it?

Probably. While we are piecemealing our use of pip-sync and pip install, pip-compile is still working exactly as intended. The only difference here is we are addressing the lessons learned from using pipenv, mainly:

andyneff commented 1 year ago
/foo/bin/pip install -c requirements.txt --require-hashes pip-tools $(unwrap requirements.txt | grep ^pip-tools) #noqa

Note: pip install -c errors on editables:

DEPRECATION: Constraints are only allowed to take the form of a package name and a version specifier. Other forms were originally permitted as an accident of the implementation, but were undocumented. The new implementation of the resolver no longer supports these forms. A possible replacement is replacing the constraint with a requirement. Discussion can be found at https://github.com/pypa/pip/issues/8210
ERROR: Unnamed requirements are not allowed as constraints

andyneff commented 6 months ago

Extras (the `[...]` syntax) in requirements.txt also break when using -c.

I think the method moving forward will be to remove extras from requirements.txt after pip-compile when they are used, so that we only have to worry about editables when using pip install -c.

# Remove [] extras from the requirements.txt file
# - While these are important in setup.py/pyproject.toml/requirements.in
#   I currently do not see a reason why we need this in requirements.txt
#   after the resolver has run. This will alleviate half of the 8210 issue
# - Leave editables alone (any line that starts with -)
# - Adds the original line on the next line, commented out
# Note: -E and -i must be separate; sed reads -iE as "-i with backup suffix E"
sed -E -i 's|(^[^-[][^[]*)(\[.*\])(==.*)|\1\3\n    # &|' requirements.txt
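For checking the regular expression without touching a real file, the same substitution can be sketched in Python. This is only an equivalent of the sed expression for illustration, applied line by line as sed does; the function name and the sample package names are made up, and the sample deliberately uses entries without hash continuations.

```python
import re

# Same pattern as the sed expression: a line not starting with "-" or "[",
# then a "[...]" extras group, then "==version".
EXTRAS = re.compile(r"^([^-\[][^\[]*)(\[.*\])(==.*)")


def strip_extras(text: str) -> str:
    """Drop [extras] from pinned lines and keep the original as a comment."""
    out = []
    for line in text.splitlines():
        match = EXTRAS.match(line)
        if match:
            out.append(match.group(1) + match.group(3))
            out.append("    # " + match.group(0))
        else:
            out.append(line)  # editables and plain pins are left alone
    return "\n".join(out) + "\n"


sample = "requests[security]==2.28.0\n-e ./src[dev]\nclick==8.1.3\n"
print(strip_extras(sample))
```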