cmu-delphi / covidcast-indicators

Back end for producing indicators and loading them into the COVIDcast API.
https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html
MIT License

Automate all public signals #5

Open krivard opened 4 years ago

krivard commented 4 years ago

Entails moving the code that produces each signal into the covidcast-indicators repository. NB doctor-visits and fb-survey have restricted-access data sources, so for them we first need a way to test automation code on safe data.

Code ingestion:

Verify new code produces same results as old code:

Automation:
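The "verify new code produces same results as old code" step could be scripted as a straight file-by-file diff of the output CSVs. A minimal sketch, assuming both pipelines write date-prefixed CSVs into receiving directories (all directory and file names here are hypothetical; the real output locations come from each indicator's params.json):

```shell
#!/bin/sh
# Sketch: check that every CSV the old pipeline produced exists in the new
# pipeline's output directory and is byte-identical.
OLD_DIR=old_receiving
NEW_DIR=new_receiving

# Stand-in data so the sketch is self-contained:
mkdir -p "$OLD_DIR" "$NEW_DIR"
printf 'geo_id,val,se,sample_size\n01000,1.5,0.1,200\n' \
  > "$OLD_DIR/20200401_county_smoothed_cli.csv"
cp "$OLD_DIR/20200401_county_smoothed_cli.csv" "$NEW_DIR/"

status=0
for old in "$OLD_DIR"/*.csv; do
  new="$NEW_DIR/$(basename "$old")"
  if [ ! -f "$new" ]; then
    echo "MISSING  $(basename "$old")"; status=1
  elif ! diff -q "$old" "$new" >/dev/null; then
    echo "DIFFERS  $(basename "$old")"; status=1
  fi
done
[ "$status" -eq 0 ] && echo "all outputs match"
```

A tolerance-aware comparison (e.g. allowing small floating-point drift) would need something more than `diff`, but byte equality is the strictest starting point.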

krivard commented 4 years ago

Quidel and Google Surveys are removed from the map; I've moved them to a separate issue #16 since they don't have to be done for this milestone.

statsmaths commented 4 years ago

Pulled code for the jhu cases into the main branch with db6a7d7. Working on a new google-trends branch for the ght signals.

korlaxxalrok commented 4 years ago

Here is what automation for the indicator signals in covidcast-indicators looks like, @krivard / @capnrefsmmat. Happy to hear thoughts n'at. We should probably also go through this via Zoom, because my description here will definitely be lacking.

The general state

From a lofty view, the process (taking jhu-csse as the example) is:

A successful Jenkins pipeline:

(screenshot)

A successful Jenkins Slack message:

(screenshot)

On the receiving end, Delphi primary has a new indicators user and a corresponding home directory where the indicator packages will be stashed. Once a package is placed and properly configured with any changes to params.json, we can simply drive the pipeline from Automation with something like this (the automation user has been granted the ability to become the indicators user to invoke the pipelines):

sudo -u indicators -s bash -c "cd /home/indicators/runtime/jhu && env/bin/python -m delphi_jhu"

The above works manually, but I need to figure out why the scheduled version is not triggering.
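For reference, the scheduled version would amount to a crontab entry for the automation user along these lines (the schedule shown is illustrative, not the actual production timing):

```
# m  h  dom mon dow  command
0 13 * * *  sudo -u indicators -s bash -c "cd /home/indicators/runtime/jhu && env/bin/python -m delphi_jhu"
```

If the entry looks right but never fires, the usual suspects are the crontab belonging to a different user than expected, or the job's environment differing from the interactive shell where the manual run succeeded.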

Some thoughts on workflow

We will need a separate deployment branch for each indicator. This allows us to run a separate Jenkins pipeline per indicator, so that we can package and deploy each one discretely. This may necessitate some workflow changes.

krivard commented 4 years ago

Love it. Can we configure Ansible to deploy a per-indicator params.json or are we assuming one params to rule them all? I'm thinking of the need to separate a dry-run mode from fully activated mode.

For workflow, we should discuss whether it makes more sense to do our code review PRs into main or into the appropriate deploy-* branch. We'll want to keep both up to date, especially given that we expect to share some code and static resources across indicators. Depending on how the deploy-* branches are configured we may also need to learn to be much more disciplined about semantic commits and linear history and the like.

korlaxxalrok commented 4 years ago

Yep, the assumption is that we'll need to send a unique params.json per indicator. Should be good to go there. We just need to create a prod version of the file and let Ansible replace the local dev version when it puts the bundle onto Delphi primary.

Re workflow: for now we could do PRs against the deploy branch, then merge to main after deployment. We might even look at a GitHub Action to automate this, since it is an extra hop for devs. That way main can still be the source of truth for new branches. The Jenkins config is a little naive at this point but should be plenty configurable for future needs (maybe we do something when the PR is created vs. on the merge, etc.).

korlaxxalrok commented 4 years ago

I made some progress on the PR-based workflow, just need to figure out how to trigger on the PR merge now and we will nearly be cooking with 🔥

PR-based workflow:

korlaxxalrok commented 4 years ago

Ok, I think the Jenkins stuff is figured out:

It is pretty simple and, aside from the initial setup in Jenkins, is mostly defined in the Jenkinsfile in the repo.

Caveat: at the moment, each indicator needs its own corresponding build/test/package/deploy shell scripts. This could be more efficient, but is currently necessary due to an issue retrieving some Jenkins environment info that would otherwise reduce the toil. It isn't a huge effort, just some possibly unnecessary duplication for now. The upside is that each build/test/package/deploy stage can be handled very specifically if necessary.
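The per-indicator stage scripts might look like the following sketch. The jenkins/<indicator>-<stage>.sh naming convention is an assumption for illustration, not necessarily the repo's actual layout; stand-in stage scripts are generated here so the sketch runs end-to-end:

```shell
#!/bin/sh
# Sketch: one set of per-indicator Jenkins stage scripts, plus the driver
# loop a Jenkinsfile would run.
indicator="jhu"
mkdir -p jenkins
for stage in build test package deploy; do
  # Generate stand-in stage scripts (real ones would create the venv,
  # run pytest, tar the package, trigger Ansible, etc.):
  printf '#!/bin/sh\necho "%s: %s stage"\n' "$indicator" "$stage" \
    > "jenkins/${indicator}-${stage}.sh"
  chmod +x "jenkins/${indicator}-${stage}.sh"
done

# What the Jenkinsfile effectively does: run each stage script in order,
# stopping at the first failure.
for stage in build test package deploy; do
  "jenkins/${indicator}-${stage}.sh" || { echo "$stage failed"; exit 1; }
done
```

With this shape, the duplication mentioned above is just one small script per indicator per stage, and any indicator-specific quirk lives in its own script rather than in the shared Jenkinsfile.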

A thing to think about: the params files. I'm not doing the right thing with them; I just realized they are not supposed to be committed.

At the moment Ansible is copying a "prod" version of the params file into the indicator's directory at deploy time.

I think we could stick with this process, and additionally use Ansible's vault encryption/decryption mechanism. That way we could keep the prod version of the params file in the repo in encrypted form. It would be a simple way to move forward for now. Longer term, we could look at other options.
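That flow could look roughly like this with ansible-vault (the file names and vault-password-file location are assumptions for illustration):

```shell
# Encrypt the prod params file once, then commit the encrypted version:
ansible-vault encrypt ansible/files/jhu-params-prod.json

# Inspect or edit it locally when needed:
ansible-vault view ansible/files/jhu-params-prod.json

# At deploy time, supply the vault password so Ansible can decrypt the
# file while copying it into the indicator's runtime directory:
ansible-playbook deploy.yaml --vault-password-file ~/.vault-pass
```

The encrypted blob is safe to keep in version control; only hosts holding the vault password (Jenkins, Delphi primary) can recover the plaintext.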

krivard commented 4 years ago

What private information is in params.json for currently available signals?

korlaxxalrok commented 4 years ago

@krivard New PR for what will hopefully be the final stages of the new Indicators automation workflow. Initially, some review of the new README will be welcome, though there is potentially a lot of new stuff in the repo (Jenkins shell scripts, Jenkinsfile, Ansible stuff), so we can certainly talk about any of that more specifically.

Once this looks good, the final tasks will be:

Does this sound generally OK as a way to move forward?

krivard commented 4 years ago

Yep, that all sounds reasonable. We should add the following verification cycle:

korlaxxalrok commented 4 years ago

@krivard USA Facts is set to run on auto today at 1pm. Next up will be GHT.

korlaxxalrok commented 4 years ago

@krivard There are failing tests with the ght pipeline in Jenkins so CI/CD and Automation work is paused until we can:

I've validated that these fail for me locally.

Short info:

========================================================= short test summary info ==========================================================
FAILED test_pull_api.py::TestGoogleHealthTrends::test_class_state - AssertionError: assert ['value', 'date'] == ['date', 'value']
FAILED test_pull_api.py::TestGoogleHealthTrends::test_class_dma - AssertionError: assert ['value', 'date'] == ['date', 'value']
krivard commented 4 years ago

@vishakha1812 to take a look

korlaxxalrok commented 4 years ago

@krivard @vishakha1812 This issue is for ght, and I think we may have called it out for safegraph instead, so just clarifying that here.