datamade / scrapers-us-municipal

Scrapers for US municipal governments.
MIT License

Specify --upgrade in Dockerfile so Git requirements are updated #60

Closed hancush closed 3 years ago

hancush commented 3 years ago

Description

When installing from a repository, specifically a branch, pip won't retrieve updates to VCS requirements "if a satisfactory version of the package is already installed". References: https://github.com/pypa/pip/issues/2837, https://pip.pypa.io/en/latest/reference/pip_install/#vcs-support. The effect is that the latest versions of the dependencies we install from branches are captured in the first but not subsequent image builds. This PR adds the --upgrade flag to dependency installation to rectify this.
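The change described is roughly of this shape (a sketch only; the actual paths and requirements filename in this repo's Dockerfile may differ):

```dockerfile
COPY requirements.txt /app/requirements.txt
# --upgrade makes pip re-resolve requirements rather than skipping them
# because "a satisfactory version of the package is already installed",
# which is what causes stale VCS dependencies in rebuilt images
RUN pip install --upgrade -r /app/requirements.txt
```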

Notes

We haven't released python-legistar-scraper on PyPI, so we can't install a pinned release. I, for one, would like to avoid having to update scraper requirements with each upstream change, i.e., would prefer not to pin to a particular commit.

On the other hand, some of our dependencies (lxml, sh, requests) are not pinned to a particular version. @fgregg As our package management expert, do you see risks to adding the --upgrade flag in this way? Perhaps I should run a second pip command specifically for the Legistar lib? Happy for your thoughts!
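The second-command alternative floated above might look like the following (hypothetical sketch; the repository URL is illustrative, not taken from this repo's actual requirements file):

```dockerfile
# Install the PyPI dependencies normally, so unpinned packages
# (lxml, sh, requests) are not silently bumped on every rebuild...
RUN pip install -r requirements.txt
# ...and force only the VCS dependency to be re-fetched from its branch
RUN pip install --upgrade git+https://github.com/opencivicdata/python-legistar-scraper.git
```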

fgregg commented 3 years ago

hmm... i think i'm a bit confused about how this interacts with docker's layer caching.

if we don't change the requirements.txt file, then it seems like the layer caching won't be invalidated and we'll just skip the pip install line.

is that so?
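The intuition here can be sketched with a toy model (not Docker's actual implementation, which also chains each layer's key to its parent): a layer is reused when the instruction text and the checksum of the files it depends on are unchanged, so an unchanged requirements.txt means the `pip install` line never re-runs, `--upgrade` or not.

```python
import hashlib

def layer_key(instruction: str, context_files: bytes) -> str:
    """Toy stand-in for Docker's cache key: the instruction text
    plus a checksum of the files the layer depends on."""
    return hashlib.sha256(instruction.encode() + context_files).hexdigest()

def build(instructions, context_files, cache):
    """Simulate one build, reporting whether each layer runs or is cached."""
    results = []
    for inst in instructions:
        key = layer_key(inst, context_files)
        if key in cache:
            results.append((inst, "CACHED"))
        else:
            cache[key] = True
            results.append((inst, "RUN"))
    return results

cache = {}
reqs = b"git+https://example.invalid/scraper.git@main#egg=scraper\n"
steps = ["COPY requirements.txt .", "RUN pip install -r requirements.txt"]

first_build = build(steps, reqs, cache)   # cold cache: every layer runs
second_build = build(steps, reqs, cache)  # requirements.txt unchanged: all cached
```

In the second build every layer is a cache hit, which is exactly the concern: the install line is skipped entirely, so pip never gets the chance to fetch the updated branch.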

hancush commented 3 years ago

@fgregg Very good question. Looked into it. For local builds, you're right (unless we specify the --no-cache option). For DockerHub, we can configure builds not to use the build cache: https://docs.docker.com/docker-hub/builds/#configure-automated-build-settings
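For reference, the local equivalent of that DockerHub setting is the flag mentioned above (image tag is illustrative):

```shell
docker build --no-cache -t scrapers-us-municipal .
```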

fgregg commented 3 years ago

if we are not using the build cache, is that the same, time-wise, as rebuilding from scratch? if so, rebuilding from scratch should do the right thing for the python-legistar dependency, i think?


hancush commented 3 years ago

@fgregg Great thought! Disabled build caching and rebuilt our images, then used the production image to run a scrape locally. The output matches my expectation and the new behavior of the Legistar scraping lib. Closing.