NASA-PDS / devops

Parent repo for PDS DevOps activities
Apache License 2.0

Analyze if 2-step dependency management is needed #59

Open tloubrieu-jpl opened 1 year ago

tloubrieu-jpl commented 1 year ago

💡 Description

For Python with:

  1. requirements.txt
  2. the result of pip freeze > requirements-freeze.txt

For JavaScript/npm projects:

  1. package.json
  2. package-lock.json

Revise the continuous integration procedures accordingly.

anilnatha commented 1 year ago

@tloubrieu-jpl

Python

On my past projects we incorporated the use of two files, listed below, to create deterministic builds (we employed the use of virtual environments or Docker containers to ensure we had a clean environment):

| File | Purpose |
| --- | --- |
| requirements-to-freeze.txt | Contains a list of Python packages the application requires. This file is managed by the developers manually. |
| requirements.txt | Auto-generated by way of running pip freeze > requirements.txt |

In a nutshell, during the normal course of development, and as changes are made to requirements-to-freeze.txt, devs ran pip install -r requirements-to-freeze.txt to install the packages. We then immediately followed the installation by "freezing" the configuration by running pip freeze > requirements.txt. Both of these files should be committed to the repository.

When creating builds and/or deploying the application, we ran pip install -r requirements.txt to ensure we created deterministic builds. You can see how a change to requirements-to-freeze.txt can result in numerous changes, some out of our control, in a requirements.txt file in this example commit of the JPL Space project where this strategy was employed.
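As a rough illustration of why the frozen file is the one to build from, here is a minimal sketch of a check that flags any requirement not pinned to an exact version. The function name `unpinned_requirements` and the sample package lines are hypothetical, and the pattern is a simplification of pip's requirement syntax, not a full parser.

```python
import re

# Assumption: "pinned" means the simple `name==version` form that pip freeze emits.
PINNED = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^\s*=,]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return the requirement lines that are not pinned with an exact '=='."""
    loose = []
    for raw in text.splitlines():
        # Drop comments (a '#' at line start or after whitespace) and blanks.
        line = re.sub(r"(^|\s)#.*$", "", raw).strip()
        if not line:
            continue
        # Ignore environment markers such as '; python_version < "3.9"'.
        spec = line.split(";", 1)[0].strip()
        if not PINNED.fullmatch(spec):
            loose.append(line)
    return loose


# Hypothetical file contents, for illustration only.
frozen = "requests==2.31.0\nurllib3==2.0.7  # transitive\n"
hand_written = "requests>=2.0\nflask\n"

print(unpinned_requirements(frozen))        # []
print(unpinned_requirements(hand_written))  # ['requests>=2.0', 'flask']
```

A check like this could run in CI against requirements.txt, while requirements-to-freeze.txt would be expected to fail it by design.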

Node Packages using NPM

The philosophy described above can be applied to npm packages; however, to do so we need to learn more about the two relevant npm CLI commands, npm install and npm clean-install. Their short forms are npm i and npm ci, respectively.

npm install

In the form we are relying on, which installs the set of packages specified in package.json, this command automatically generates a package-lock.json, or updates it if one already exists. This happens because npm install not only installs the packages specified in package.json but also resolves the dependencies that those packages have. Resolving these dependencies can change the versions selected for them, which is why package-lock.json is updated after running npm install.
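To make the effect of re-resolution visible, the following is a minimal sketch that compares two versions of a package-lock.json and reports which locked versions changed. It assumes the `packages` map used by lockfileVersion 2 and 3; the helper names and the sample lock contents are hypothetical.

```python
import json


def locked_versions(lock_text):
    """Map each install path (e.g. 'node_modules/ms') to its locked version."""
    lock = json.loads(lock_text)
    return {
        path: meta.get("version")
        for path, meta in lock.get("packages", {}).items()
        if path  # the "" key describes the project itself, so skip it
    }


def lock_changes(old_text, new_text):
    """Return {path: (old_version, new_version)} for every entry that differs."""
    old, new = locked_versions(old_text), locked_versions(new_text)
    return {
        path: (old.get(path), new.get(path))
        for path in sorted(old.keys() | new.keys())
        if old.get(path) != new.get(path)
    }


# Hypothetical lock files before and after running `npm install`.
BEFORE = json.dumps({"lockfileVersion": 3, "packages": {
    "": {"name": "demo", "version": "1.0.0"},
    "node_modules/debug": {"version": "4.3.4"},
    "node_modules/ms": {"version": "2.1.2"},
}})
AFTER = json.dumps({"lockfileVersion": 3, "packages": {
    "": {"name": "demo", "version": "1.0.0"},
    "node_modules/debug": {"version": "4.3.4"},
    "node_modules/ms": {"version": "2.1.3"},
}})

print(lock_changes(BEFORE, AFTER))  # {'node_modules/ms': ('2.1.2', '2.1.3')}
```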

npm clean-install

To facilitate deterministic builds, especially in CI, npm clean-install should be used. To use npm clean-install, a project must have one of two files, package-lock.json or npm-shrinkwrap.json, which npm uses to install the specific packages and their recorded versions without doing any package resolution. The difference between these two files is that, unlike package-lock.json, npm-shrinkwrap.json can be included when publishing the package (I'm assuming they refer to publishing to NPMJS). The npm-shrinkwrap.json file is created by running npm shrinkwrap.
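npm ci exits with an error when package.json and the lock file disagree, instead of updating the lock. The sketch below imitates that idea in a simplified way: it only compares the ranges declared in package.json with the copy recorded in the lock file's root entry (lockfileVersion 2 or 3 assumed), which is not how npm itself performs the check. The function name and sample data are hypothetical.

```python
import json

SECTIONS = ("dependencies", "devDependencies")


def lock_mismatches(package_json_text, package_lock_text):
    """List declared dependencies whose range differs from the lock's root entry."""
    manifest = json.loads(package_json_text)
    # In lockfileVersion 2/3 the "" entry mirrors the project's own manifest.
    root = json.loads(package_lock_text).get("packages", {}).get("", {})
    problems = []
    for section in SECTIONS:
        declared = manifest.get(section, {})
        recorded = root.get(section, {})
        for name in sorted(declared.keys() | recorded.keys()):
            if declared.get(name) != recorded.get(name):
                problems.append(
                    f"{section}.{name}: package.json={declared.get(name)!r}, "
                    f"lock={recorded.get(name)!r}"
                )
    return problems


# Hypothetical manifest and lock files.
PACKAGE_JSON = json.dumps({"name": "demo", "dependencies": {"express": "^4.18.0"}})
IN_SYNC = json.dumps({"lockfileVersion": 3, "packages": {
    "": {"name": "demo", "dependencies": {"express": "^4.18.0"}}}})
STALE = json.dumps({"lockfileVersion": 3, "packages": {
    "": {"name": "demo", "dependencies": {"express": "^4.17.0"}}}})

print(lock_mismatches(PACKAGE_JSON, IN_SYNC))  # []
print(lock_mismatches(PACKAGE_JSON, STALE))
```

A non-empty result would suggest running npm install locally and committing the updated lock file before CI runs npm ci.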

References:

https://docs.npmjs.com/cli/v9/commands/npm-install
https://docs.npmjs.com/cli/v9/commands/npm-ci?v=true
https://docs.npmjs.com/cli/v8/configuring-npm/package-lock-json
https://docs.npmjs.com/cli/v8/configuring-npm/npm-shrinkwrap-json
https://www.ariank.dev/be-aware-of-the-package-lock-json-and-npm-install/
https://levelup.gitconnected.com/npm-i-vs-npm-ci-install-node-modules-in-your-app-faster-and-wisely-e5b1bef0f93d
https://support.deploybot.com/article/131-why-developers-should-use-npm-ci-instead-of-npm-install-and-its-benefits
https://nexusinno.com/en/what-the-heck-are-package-locks-and-why-are-they-your-friends/

nutjob4life commented 1 year ago

Thanks @anilnatha.

FYI, my preference (and what the template repository uses) is to list dependencies in the package's setup.cfg file instead of maintaining a separate requirements.txt file. It helps ensure that the package's dependencies are installed when you do pip install my.package.
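For readers who have not seen dependencies declared this way, here is a small sketch that reads the `install_requires` list out of a setup.cfg using only the standard library. The setup.cfg content (including the `requests` and `lxml` entries) is a made-up example, not the template repository's actual file, and `declared_dependencies` is a hypothetical helper.

```python
import configparser

# Hypothetical setup.cfg; dependencies live under [options] install_requires.
SETUP_CFG = """\
[metadata]
name = my.package
version = 1.0.0

[options]
install_requires =
    requests>=2.28
    lxml~=4.9
"""


def declared_dependencies(setup_cfg_text):
    """Return the install_requires entries of a setup.cfg as a list of strings."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(setup_cfg_text)
    raw = parser.get("options", "install_requires", fallback="")
    # The value is a multi-line string; keep the non-empty lines.
    return [line.strip() for line in raw.splitlines() if line.strip()]


print(declared_dependencies(SETUP_CFG))  # ['requests>=2.28', 'lxml~=4.9']
```

Because pip reads these entries at install time, pip install my.package pulls them in without a separate requirements.txt.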

anilnatha commented 1 year ago

@nutjob4life that's a good point. For Python packages that are deployed to PyPI, that would be best. I should have clarified that what I described is what I've used when deploying Python applications. Do you know if setup.cfg can also be used when deploying Python apps?

nutjob4life commented 1 year ago

@anilnatha it sure can. When I'm developing a new app I do pip install --editable . to deploy the in-progress app to a virtual environment. Then once it's released, pip install package.name to pull it from PyPI and install it in a production environment.

anilnatha commented 1 year ago

This is good to know. My prior Python apps weren't deployed to PyPI because they were used internally only; we managed them through GHE and some form of CI process, and we always relied on pip installs against some form of requirements file. I'll look into this more when I have some time and let you know if I have any questions.

nutjob4life commented 1 year ago

@anilnatha I think it breaks down into two camps:

  1. Are you deploying some application that happens to be written in Python, uses existing dependencies, and maybe has some code that orchestrates the dependencies or otherwise serves as your main entrypoint? If so, using requirements.txt and pip install --requirement requirements.txt (or the "freeze" variations) is fine. You probably run .venv/bin/python main.py to get your code going.
  2. Are you making a reusable package—perhaps with a main entrypoint—or want people to be able to conda install or pip install your code? If so, using setup.cfg (or, since Python packaging is a rapidly-moving target, pyproject.toml) to declare dependencies is the right way to go. You probably run .venv/bin/some-console-script to get the code going.

№1 is convenient, but №2 requires more boilerplate.

(There's a third camp: those "big" systems that want you to execute a command to set up your environment. These are things like Django or Wagtail, where you might pip install a package but then run django-admin startproject or wagtail start to create the appropriate directory layout to run a server.)

anilnatha commented 1 year ago

@nutjob4life That third camp you stated is where I have mostly worked. In those situations, we created the folder layout as you mentioned during development which was tracked through GHE. The rest was managed through docker image builds, CI/CD, etc to deploy to our dev/staging/prod venues.

tloubrieu-jpl commented 1 year ago

We should commit the package-lock.json file.

During development, we run npm install. When a dependabot PR proposes an update to package-lock.json, the developer needs to run npm install themselves and update package.json to achieve the same result as the dependabot PR (the PR is advisory).

During the build process and Continuous Integration, we should run npm clean-install, which uses the package-lock.json.
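One way to encode this policy in a build helper is sketched below. It assumes the CI system sets a `CI` environment variable (GitHub Actions does); the function name `npm_install_command` is hypothetical, and the function only chooses the command without running it.

```python
import os
from pathlib import Path

LOCK_FILES = ("package-lock.json", "npm-shrinkwrap.json")


def npm_install_command(project_dir, environ=None):
    """Pick `npm install` for development and `npm ci` for CI builds."""
    environ = os.environ if environ is None else environ
    in_ci = environ.get("CI", "").lower() in ("1", "true")
    if not in_ci:
        return ["npm", "install"]
    project = Path(project_dir)
    # npm ci refuses to run without a lock file, so fail early with a clear message.
    if not any((project / name).exists() for name in LOCK_FILES):
        raise FileNotFoundError(
            "npm ci needs a committed package-lock.json or npm-shrinkwrap.json"
        )
    return ["npm", "ci"]


# Outside CI (empty environment), the development command is chosen.
print(npm_install_command(".", environ={}))  # ['npm', 'install']
```

The returned list could be handed to subprocess.run by the calling script.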

anilnatha commented 1 year ago

Wanted to add to @tloubrieu-jpl's last comment. I've also been using npm clean-install when pulling down branches created by dependabot to test the changes it suggests.

nutjob4life commented 1 month ago

Hi folks, I'm in between boards so I'm looking for things to work on. After re-reading our discussion thus far, I wonder if we've reached a "status quo"; i.e., the practices we already have for NPM and Python are sufficient. Thoughts @anilnatha @tloubrieu-jpl @jordanpadams?

tloubrieu-jpl commented 1 month ago

Hi @nutjob4life , I am not the expert on the topic, but what might be needed to close this ticket is:

Does that make any sense ?