krivard opened this issue 4 years ago
Quidel and Google Surveys are removed from the map; I've moved them to a separate issue #16 since they don't have to be done for this milestone.
Pulled code for the jhu cases into the main branch with db6a7d7. Working on new google-trends
branch for the ght signals.
Here is what automation for the indicators signals in covidcast-indicators
looks like, @krivard / @capnrefsmmat. Happy to hear thoughts n'at. Probably should go through this via Zoom as well because my description here will definitely be lacking.
From a lofty view, the process (taking jhu-csse as the example) is:
- Code is merged into the `deploy-jhu` branch (branch is protected).
- The packaged indicator is deployed to `/home/indicators/runtime/${package}`.
- At deploy time we modify the `params.json` file so we can specify the receiving directory for the .csvs, as well as modifying the Python path in the venv directory.
- Jenkins posts success notifications to the `jenkins-ci` Slack channel. We also get failures in the same way so we can track down issues. Longer term this can be expanded a bit to make it more useful.

A successful Jenkins pipeline:
A successful Jenkins Slack message:
On the receiving end, Delphi primary has a new `indicators` user and related home directory where the indicators packages will be stashed. Once placed and properly configured with any changes in `params.json`, we can then simply drive the pipeline with something like this in Automation (the `automation` user has been granted the ability to become the `indicators` user to invoke the pipelines):
```shell
sudo -u indicators -s bash -c "cd /home/indicators/runtime/jhu && env/bin/python -m delphi_jhu"
```
The above works manually, but I need to figure out why the scheduled version is not triggering.
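For illustration only: if the schedule were expressed as a plain cron entry rather than through Delphi's Automation system, it would look something like the following (the time is hypothetical; the command is the one above).

```
# Hypothetical: run the jhu indicator pipeline daily at 01:00
0 1 * * * sudo -u indicators -s bash -c "cd /home/indicators/runtime/jhu && env/bin/python -m delphi_jhu"
```

Whatever the scheduler, the first distinction worth checking is whether the job never fires at all (scheduler logs) or fires and fails silently (pipeline logs).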
We will need a separate deployment branch for each indicator. This allows us to run a separate Jenkins pipeline for each indicator so that we may package and deploy each one discretely. This may necessitate some workflow changes.
Love it. Can we configure Ansible to deploy a per-indicator `params.json`, or are we assuming one `params` to rule them all? I'm thinking of the need to separate a dry-run mode from fully activated mode.
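To make the dry-run idea concrete, here is a minimal sketch of how an indicator entry point could branch on a mode flag in its `params.json`. The `dry_run` field and the other values are hypothetical, not part of any existing Delphi params schema:

```python
import json
import os
import tempfile

# Hypothetical per-indicator params.json; "dry_run" is an illustrative field,
# not an actual covidcast-indicators parameter.
params = {
    "export_dir": "./receiving",
    "dry_run": True,
}

# Write the file where Ansible would normally place it (here, a temp dir).
path = os.path.join(tempfile.mkdtemp(), "params.json")
with open(path, "w") as f:
    json.dump(params, f, indent=2)

# An indicator entry point could then read the file and branch on the flag:
with open(path) as f:
    loaded = json.load(f)

if loaded.get("dry_run", False):
    print("dry run: skipping writes to", loaded["export_dir"])
else:
    print("writing CSVs to", loaded["export_dir"])
```

Under this scheme, Ansible could ship a dev copy with `dry_run` set to true and a prod copy with it set to false, without any code changes in the indicator itself.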
For workflow, we should discuss whether it makes more sense to do our code review PRs into `main` or into the appropriate `deploy-*` branch. We'll want to keep both up to date, especially given that we expect to share some code and static resources across indicators. Depending on how the `deploy-*` branches are configured, we may also need to learn to be much more disciplined about semantic commits and linear history and the like.
Yep, the assumption is that we'll need to send a unique `params.json` per indicator. Should be good to go there. We just need to create a prod version of the file and let Ansible replace the local dev version when it puts the bundle onto Delphi primary.
Re workflow, for now we could do PRs on the deploy branch, and then merge to `main` after deployment. We might even look at a GitHub Action to automate this since it is an extra hop for devs. This way `main` can still be the source of truth for new branches. The Jenkins config is a little naive at this point but should be plenty configurable for future needs (maybe we do something when the PR is created vs. on the merge, etc.).
I made some progress on the PR-based workflow, just need to figure out how to trigger on the PR merge now and we will nearly be cooking with 🔥
PR-based workflow:

- PRs will target `main` (seems like the best candidate so as to keep things organized, though this is up for discussion still and also possibly subject to what we actually end up doing workflow-wise).
- A Slack notification in `jenkins-ci` to let us know that a new branch was pushed to origin. I think we can remove this if it is deemed noisy, or keep it and make it more useful over time.

Ok, I think the Jenkins stuff is figured out:
- A merge to a `deploy-*` branch will trigger the build/test/package stages.

It is pretty simple and mostly (aside from the initial setup in Jenkins) defined in the `Jenkinsfile` in the repo.
Caveat: At the moment, each indicator will need corresponding build/test/package/deploy shell scripts created. This could be more efficient, but is necessary at the moment due to an issue with not being able to successfully retrieve some Jenkins environment info that would otherwise reduce some toil. It isn't a huge effort, just some possible unnecessary duplication at the moment. The upside is that each stage of build/test/package/deploy could be handled very specifically if necessary.
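To illustrate the shape of this (not the actual `Jenkinsfile` in the repo; stage names and script paths are made up), a declarative pipeline delegating to per-indicator shell scripts might look like:

```groovy
// Hypothetical sketch only: per-indicator stages delegating to shell scripts.
pipeline {
    agent any
    stages {
        stage('build')   { steps { sh 'jenkins/jhu-jenkins-build.sh' } }
        stage('test')    { steps { sh 'jenkins/jhu-jenkins-test.sh' } }
        stage('package') { steps { sh 'jenkins/jhu-jenkins-package.sh' } }
        stage('deploy') {
            when { branch 'deploy-*' }  // deploy only from deploy-* branches
            steps { sh 'jenkins/jhu-jenkins-deploy.sh' }
        }
    }
}
```

The duplication described above would show up as one such set of shell scripts per indicator until the Jenkins environment-info issue is resolved.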
A thing to think about: The params files. I'm not doing the right thing with them, as I just realized they are not supposed to be committed.
At the moment Ansible is copying a "prod" version of the params file into the indicator's directory at deploy time.
I think we could stick with this process, and additionally use Ansible's `vault` encryption/decryption mechanism. That way we could keep the prod version of the params file in the repo in encrypted form. It would be a simple way to move forward for now. Longer term, we could look at other options.
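As a sketch of what that could look like (file names and paths are hypothetical): the prod params file would be encrypted in the repo with `ansible-vault encrypt`, and since Ansible's `copy` module decrypts vaulted source files by default, the deploy task stays simple:

```yaml
# Hypothetical deploy task; src is an ansible-vault-encrypted file in the repo.
- name: Place prod params.json for the jhu indicator
  copy:
    src: files/jhu-params-prod.json   # encrypted via: ansible-vault encrypt
    dest: /home/indicators/runtime/jhu/params.json
    owner: indicators
    group: indicators
    mode: "0644"
```

Running the play would then need `--ask-vault-pass` or a vault password file on the Automation side.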
What private information is in `params.json` for currently available signals?
@krivard New PR for hopefully the final stages of the new Indicators automation workflow. Initially, some review of the new readme will be welcome, though there is potentially a lot of new stuff in the repo (Jenkins shell scripts, Jenkinsfile, Ansible stuff), so we can certainly talk about any of that more specifically.
Once this looks good, the final tasks will be:
- Retire the `run-jhu` branch in favor of the new `deploy-jhu` branch.
- Remove the `cache` directory contents from the repo. As I understand it, that is being maintained so that we can run it locally across different users' machines. Once deployed and automated, it will live on Delphi primary.

Does this sound generally ok as a way to move forward?
Yep, that all sounds reasonable. We should add the following verification cycle before removing the `cache` contents from the repo.

@krivard USA Facts is set to run on auto today at 1pm. Next up will be GHT.
@krivard There are failing tests with the `ght` pipeline in Jenkins, so CI/CD and Automation work is paused until we can resolve them.
I've validated that these fail for me locally.
Short info:
```
========================================================= short test summary info ==========================================================
FAILED test_pull_api.py::TestGoogleHealthTrends::test_class_state - AssertionError: assert ['value', 'date'] == ['date', 'value']
FAILED test_pull_api.py::TestGoogleHealthTrends::test_class_dma - AssertionError: assert ['value', 'date'] == ['date', 'value']
```
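The assertions compare exact list order, so `['value', 'date']` fails against `['date', 'value']` even though the contents match. A minimal illustration of the two ways such a failure is usually resolved (reorder on the producing side, or make the comparison order-insensitive if order truly doesn't matter; this is illustrative, not the actual `ght` code):

```python
returned = ["value", "date"]   # what the API pull now produces
expected = ["date", "value"]   # what the tests assert

# Order-sensitive comparison, as in the failing tests:
print(returned == expected)            # False

# Option 1: normalize column order on the producing side.
reordered = sorted(returned, key=expected.index)
print(reordered == expected)           # True

# Option 2: if order is not meaningful, compare as sets in the tests.
print(set(returned) == set(expected))  # True
```

Which option is right depends on whether downstream code indexes columns by position or by name.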
@vishakha1812 to take a look
@krivard @vishakha1812 This issue is for `ght`, and I think we may have called it out for `safegraph` instead, so just clarifying that here.
Entails moving code to produce each into the covidcast-indicators repository. NB doctor-visits and fb-survey have restricted-access data sources, so for them we first need a way to test automation code on safe data.
Code ingestion:
Verify new code produces same results as old code:
Automation: