Open melange396 opened 1 month ago
Good idea! I started thinking a little about this (using some tips from this blog). Here's something for the hhs indicator to get started (I chose it because I don't think it requires extra credentials). The lint command worked and the indicator started, but I didn't see it through to the end:
# covidcast-indicators/hhs_hosp/Dockerfile
FROM python:3.8.19-slim-bookworm
RUN mkdir /usr/src/app
WORKDIR /usr/src/app
RUN apt-get update && apt-get install -y make git
COPY . /usr/src/app
RUN make install
# covidcast-indicators/hhs_hosp/.dockerignore
# Ignore bulky directories we bind-mount
cache
receiving
# Ignore local virtual environment
env
# EOF
# Commands to be run in the covidcast-indicators/hhs_hosp directory
docker build -f Dockerfile . -t delphi_hhs
docker run -it delphi_hhs make lint
docker run \
-it \
--mount type=bind,source="$(pwd)"/cache,target=/usr/src/app/cache \
--mount type=bind,source="$(pwd)"/receiving,target=/usr/src/app/receiving \
-e DELPHI_EPIDATA_KEY="$DELPHI_EPIDATA_KEY" \
delphi_hhs env/bin/python -m delphi_hhs
Next steps might be something like: sorting out the bind mounts for the `cache` and `receiving` directories (one possible snag here is that those directories are specified in the `params.json` file for each indicator, and on prod I think they point to a directory outside the repo, so this will complicate the bind mount recipe up above).

So, I have a maybe-dumb patch of an idea that can temporarily make sure that indicators have up-to-date environments:
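One way around that snag might be to read the directory paths out of `params.json` at run time and build the `--mount` flags from them, so the bind mounts follow wherever prod points them. This is only a sketch: the key names (`cache_dir`, `export_dir`) and the `/tmp` path are made up for illustration, not the indicator's actual schema.

```shell
# Illustrative params.json (key names are hypothetical, not the real schema)
cat > /tmp/params.json <<'EOF'
{"cache_dir": "/common/covidcast/cache", "export_dir": "/common/covidcast/receiving"}
EOF

# Resolve the host-side paths before constructing the bind mounts
CACHE_DIR="$(python3 -c 'import json; print(json.load(open("/tmp/params.json"))["cache_dir"])')"
EXPORT_DIR="$(python3 -c 'import json; print(json.load(open("/tmp/params.json"))["export_dir"])')"

echo "--mount type=bind,source=${CACHE_DIR},target=/usr/src/app/cache"
echo "--mount type=bind,source=${EXPORT_DIR},target=/usr/src/app/receiving"
```

These two flag strings would then replace the hard-coded `"$(pwd)"/cache` and `"$(pwd)"/receiving` mounts in the `docker run` recipe above.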
A cron job that backs up the `venv` folder, does `make clean; make install` on staging for each indicator, makes sure that works, and then after 1-3 days does the same on prod (enough time to cancel if it broke on staging). Run this once a month or something.
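The backup-and-rebuild idea above could be sketched roughly like this (all of it assumptions for illustration: the `RUNTIME_ROOT` layout, the rebuild command, and the rollback behavior on failure):

```shell
# Hypothetical scheduled refresh: back up each indicator's env/, rebuild it,
# and roll back from the backup if the rebuild fails.
RUNTIME_ROOT="${RUNTIME_ROOT:-/home/indicators/runtime}"   # assumed layout
REBUILD_CMD="${REBUILD_CMD:-make clean && make install}"

refresh_envs() {
  for dir in "$RUNTIME_ROOT"/*/; do
    [ -d "$dir/env" ] || continue
    cp -a "$dir/env" "$dir/env.bak"               # back up the venv first
    if ( cd "$dir" && sh -c "$REBUILD_CMD" ); then
      rm -rf "$dir/env.bak"                        # rebuild worked; drop backup
      echo "refreshed: $dir"
    else
      rm -rf "$dir/env"                            # rebuild failed; roll back
      mv "$dir/env.bak" "$dir/env"
      echo "rolled back: $dir"
    fi
  done
}
```

The 1-3 day staging/prod stagger would then just be two cron entries pointing at different `RUNTIME_ROOT`s, with the prod one disabled by hand if staging broke.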
I was mistaken: we do actually make use of `venv`s for the indicators... I thought it was necessary to execute the `activate` script to properly set up the environment, which we do not do in our scheduled job runs; in fact it is not required, and the way we invoke indicator jobs should take advantage of their respective virtual environments.
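A quick way to convince yourself of this (a minimal sanity check, not our actual job setup): invoking a venv's interpreter by its full path picks up that venv, no `source env/bin/activate` needed.

```shell
# Create a throwaway venv and invoke its interpreter directly, without
# ever sourcing bin/activate.
DEMO_ENV="$(mktemp -d)/demo_env"
python3 -m venv "$DEMO_ENV"

# sys.prefix reports the venv directory, proving the venv is in effect
"$DEMO_ENV/bin/python" -c 'import sys; print(sys.prefix)'
```

This is the same pattern as the `env/bin/python -m delphi_hhs` invocation in the Docker recipe above.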
Using Jenkins (on a separate machine), we "build" environments, tar them up, and then unzip those directory trees on the prod and staging machines. However, such environments are not intended to be moved, even to a different directory on the same machine. Perhaps it is good that we do not "activate" the environments, because path information from the build machine is baked into the script:
$ less ~indicators/runtime/nchs_mortality/env/bin/activate | grep nchs_mortality
VIRTUAL_ENV="/mnt/data/jenkins/workspace/covidcast-indicators_prod/nchs_mortality/env"
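Given that, a small audit could flag environments whose recorded `VIRTUAL_ENV` no longer matches where they actually live. This is a sketch, assuming the `activate` script uses the `VIRTUAL_ENV="..."` form shown above:

```shell
# Sketch: report whether a venv's activate script records the directory it
# actually lives in, or a stale path from the machine it was built on.
check_env_path() {
  env_dir="$1"
  # Extract the path from the VIRTUAL_ENV="..." line in bin/activate
  recorded="$(sed -n 's/^VIRTUAL_ENV="\(.*\)"$/\1/p' "$env_dir/bin/activate" | head -n1)"
  if [ "$recorded" != "$env_dir" ]; then
    echo "mismatch: $env_dir (activate says: $recorded)"
  else
    echo "ok: $env_dir"
  fi
}
```

Run against `~indicators/runtime/nchs_mortality/env`, this would flag the Jenkins workspace path from the listing above.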
This approach has a "build once, then distribute" paradigm similar to Docker, but it unfortunately has these problems (and I am surprised we haven't been bitten by them (yet?)).
After consulting w/ @korlaxxalrok, he thinks (but don't quote me on this!) that Jenkins can be made to build virtual environments on the prod/staging servers, or that Jenkins could build Docker images in a similar way instead. He also suggested that we could get GH Actions to do it, but voiced concerns about secrets being leaked from there (unless we are careful to use methods to mask variables in the logs).
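On the log-masking point: GitHub Actions does support redacting values with the `::add-mask::` workflow command, which registers a value so the runner scrubs it from subsequent log output. Outside of Actions the line below just prints; it's the runner that interprets it.

```shell
# In a GitHub Actions step, register the secret so the runner redacts it
# from any later log output. Locally this is just an echo.
DELPHI_EPIDATA_KEY="${DELPHI_EPIDATA_KEY:-example-secret}"
echo "::add-mask::$DELPHI_EPIDATA_KEY"
```

Secrets pulled from the repo's `secrets` context are masked automatically; `add-mask` matters for values derived or fetched at run time.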
Following up to #1967, for consistency and reproducibility, we should "Dockerize" (or equivalent/similar) our indicator runtime environments. Though it will not be perfect, it will help us run the same code on different machines without having to worry about subtle differences in configurations or versioning of dependencies.
Our current installations essentially run on "bare hardware" (not even inside `venv`s, AFAICT), where different jobs may expect particular setups but actually end up limited by each other's constraints. This will be a kind of paradigm shift, in that our deployment processes and job scheduling/triggering will have to change.

Somewhat related to https://github.com/cmu-delphi/delphi-epidata/issues/1389 .