Dockerize indicator runners

melange396 commented 1 month ago

Following up on #1967: for consistency and reproducibility, we should "Dockerize" (or use something equivalent) our indicator runtime environments. It will not be perfect, but it will help us run the same code on different machines without having to worry about subtle differences in configuration or dependency versions.

Our current installations essentially run on "bare hardware" (not even inside venvs, AFAICT), where different jobs may expect particular setups but end up constrained by each other's requirements. This will be a kind of paradigm shift, in that our deployment processes and job scheduling/triggering will have to change.

dshemetov commented 1 month ago

Good idea! I started thinking a little about this (using some tips from this blog). Here's something for the hhs indicator to get started (I chose it because I don't think it requires extra credentials). The lint command worked and the indicator started, but I didn't see the run through to the end:

# covidcast-indicators/hhs_hosp/Dockerfile
FROM python:3.8.19-slim-bookworm

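# WORKDIR creates the directory if it does not already exist
WORKDIR /usr/src/app

# Update and install in a single layer so a stale cached package index is never reused
RUN apt-get update \
    && apt-get install -y make git \
    && rm -rf /var/lib/apt/lists/*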

COPY . /usr/src/app

RUN make install

# covidcast-indicators/hhs_hosp/.dockerignore
# Ignore bulky directories we bind-mount
cache
receiving
# Ignore local virtual environment
env
# EOF

# Commands to be run in the covidcast-indicators/hhs_hosp directory
docker build -f Dockerfile . -t delphi_hhs
docker run -it delphi_hhs make lint
docker run \
  -it \
  --mount type=bind,source="$(pwd)"/cache,target=/usr/src/app/cache \
  --mount type=bind,source="$(pwd)"/receiving,target=/usr/src/app/receiving \
  -e DELPHI_EPIDATA_KEY \
  delphi_hhs env/bin/python -m delphi_hhs

Next steps might be something like:

- Make an analogue for an indicator that pulls current data (hhs does not, for now).
- Get it going on staging and compare outputs, the two main ones being the cache and receiving directories. One possible snag: those directories are specified in each indicator's params.json, and on prod I think they point to a directory outside the repo, which will complicate the bind-mount recipe above.
- Make sure logging in the container hooks into our logging infra correctly.
- Figure out what else needs to match and get it to match (maybe deploy-repo type of stuff?).
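The "compare outputs" step could start as a plain recursive diff of the receiving (or cache) directory from a bare-metal run against the one from a container run. This is only a minimal sketch: the function name and both directory arguments are hypothetical, and it assumes the two runs wrote to separate directories.

```shell
#!/usr/bin/env bash
# Sketch: compare the files one indicator run wrote against another run's files.
# compare_outputs and its two directory arguments are hypothetical names.

compare_outputs() {
  local baseline_dir="$1" container_dir="$2"
  # -r recurses into subdirectories; -q only reports which files differ.
  if diff -rq "$baseline_dir" "$container_dir"; then
    echo "MATCH"
  else
    echo "MISMATCH"
    return 1
  fi
}

# Example (hypothetical paths):
#   compare_outputs /tmp/receiving_baremetal /tmp/receiving_docker
```

A byte-for-byte diff is the strictest check. If the outputs contain timestamps or floating-point noise, a looser comparison would be needed.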

dsweber2 commented 1 month ago

So, I have a maybe-dumb patch of an idea that can temporarily make sure that indicators have up-to-date environments:

A chronicle job that backs up the venv folder, runs make clean; make install on staging for each indicator, makes sure that works, and then after 1-3 days does the same on prod (enough time to cancel if it broke on staging). Run this about once a month.
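For one indicator, the back-up, rebuild, and roll-back-on-failure part of that idea might look roughly like the sketch below. It assumes the venv lives in ./env (as the env/bin/python invocation earlier in the thread suggests) and that the Makefile has clean and install targets. The function name, the backup file name, and the MAKE_CMD override are hypothetical, added only so the logic can be dry-run.

```shell
#!/usr/bin/env bash
# Sketch: rebuild one indicator's venv, restoring the old one if the build fails.
# refresh_env, env.backup.tar.gz and MAKE_CMD are hypothetical names.

refresh_env() (
  # The function body is a subshell, so the caller's working directory is unchanged.
  cd "$1" || exit 1
  make_cmd="${MAKE_CMD:-make}"

  # Back up the existing venv, if there is one.
  if [ -d env ]; then
    tar -czf env.backup.tar.gz env || exit 1
  fi

  if "$make_cmd" clean && "$make_cmd" install; then
    echo "rebuilt"
  else
    # Roll back: discard the partial build and restore the backup.
    rm -rf env
    if [ -f env.backup.tar.gz ]; then
      tar -xzf env.backup.tar.gz
    fi
    echo "rolled back"
    exit 1
  fi
)

# Example (hypothetical path):
#   refresh_env ~/runtime/hhs_hosp
```

Restoring the tarball into the same directory avoids the relocation problem, since the paths recorded inside the venv still point at the right place.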

melange396 commented 1 month ago

I was mistaken: we do actually use venvs for the indicators. I thought it was necessary to execute the activate script to properly set up the environment, which we do not do in our scheduled job runs. In fact it is not required, and the way we invoke indicator jobs should take advantage of their respective virtual environments.

Using Jenkins (on a separate machine), we "build" environments, tar them up, and then unpack those directory trees on the prod and staging machines. However, such environments are not intended to be moved, even to a different directory on the same machine. Perhaps it is good that we do not "activate" the environments, because the activate script includes path information from the build machine:

$ less ~indicators/runtime/nchs_mortality/env/bin/activate | grep nchs_mortality
VIRTUAL_ENV="/mnt/data/jenkins/workspace/covidcast-indicators_prod/nchs_mortality/env"
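A quick way to find other environments in the same state is to compare the path recorded in each activate script with where the venv actually sits. The sketch below assumes the activate script contains a line of the form VIRTUAL_ENV="...", as in the output above (other Python versions may quote it differently). Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: detect a venv that was built in one location and unpacked in another.
# venv_recorded_path and check_venv_location are hypothetical helper names.

venv_recorded_path() {
  # Print the VIRTUAL_ENV value written into the venv's activate script.
  sed -n 's/^VIRTUAL_ENV="\(.*\)"$/\1/p' "$1/bin/activate"
}

check_venv_location() {
  local venv_dir="$1" recorded actual
  recorded="$(venv_recorded_path "$venv_dir")"
  # Resolve the directory the venv really lives in (symlinks resolved).
  actual="$(cd "$venv_dir" && pwd -P)"
  if [ "$recorded" = "$actual" ]; then
    echo "ok"
  else
    echo "relocated: built at $recorded, now at $actual"
    return 1
  fi
}

# Example, using the path from the output above:
#   check_venv_location ~indicators/runtime/nchs_mortality/env
```

A venv reached through a symlink could be reported as relocated even when it was built in place, because only the actual path is resolved.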

This approach follows the same "build once, then distribute" paradigm as Docker, but unfortunately it has these problems (and I am surprised we haven't been bitten by them (yet?)).

I consulted with @korlaxxalrok, and he thinks (but don't quote me on this!) that Jenkins can be made to build virtual environments on the prod/staging servers, or that Jenkins could build Docker images in a similar way instead. He also suggested that we could get GH Actions to do it, but voiced concerns about secrets leaking from there (unless we are careful to mask variables in the logs).
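On masking: values stored as GitHub Actions secrets are masked in logs automatically, and a value computed at runtime can be registered for masking with the `::add-mask::` workflow command. Below is a minimal sketch of the latter. The function name is hypothetical, and it only has an effect when its output is printed from a step running inside GitHub Actions.

```shell
#!/usr/bin/env bash
# Sketch: register a runtime value so GitHub Actions replaces it with *** in later log output.
# mask_in_actions_log is a hypothetical helper name.

mask_in_actions_log() {
  # The runner reads this workflow command from the step's stdout.
  printf '::add-mask::%s\n' "$1"
}

# Example: mask a derived value before any later command can echo it.
#   mask_in_actions_log "$DELPHI_EPIDATA_KEY"
```

Masking only affects log output. It does not stop a step from writing the value to a file or an uploaded artifact.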

cmu-delphi / covidcast-indicators

Dockerize indicator runners #1968