Solve data loading for Cloud Foundry

konklone commented 9 years ago

This does a few things!

Deploying the app with a script instead of a Procfile

It changes the deploy command to use a script, cf.sh, instead of the Procfile. I modeled cf.sh on @kaitlin's initdb.sh script for Hourglass. The launch command for the app is now:

cf push foia -c "bash cf.sh"

Personally, I find that janky and less than ideal, but Cloud Foundry doesn't have real task-running support the way Heroku does (heroku run). It is simpler in some ways, though: if we used a cf run-like command, we'd still need some kind of script to run it for us after each deploy. This approach faces that head-on and has Cloud Foundry do the script-running for us.
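For illustration, here is a rough Python sketch of the kind of ordered startup a script like cf.sh performs. This is not the contents of cf.sh: the exact commands, the foia_hub.wsgi:application path, and the function names are assumptions, and the order (migrations, then data load, then server start) is my reading of the description. The point is only that each step runs in sequence and a failure stops the launch.

```python
import os
import subprocess


def startup_steps(port):
    """Return the commands to run, in order. All three are hypothetical."""
    return [
        # Assumed Django migration step.
        ["python", "manage.py", "migrate", "--noinput"],
        # Assumed invocation of the data loading script.
        ["python", "manage.py", "load_agency_contacts"],
        # Assumed WSGI module path; waitress is the server named in this PR.
        ["waitress-serve", "--port=%s" % port, "foia_hub.wsgi:application"],
    ]


def run_startup(steps, run=subprocess.call):
    """Run each step in order; return the first non-zero exit code, else 0."""
    for step in steps:
        code = run(step)
        if code != 0:
            # Stop here so the server never starts on a half-prepared database.
            return code
    return 0


if __name__ == "__main__":
    # Cloud Foundry passes the listening port in the PORT environment variable.
    raise SystemExit(run_startup(startup_steps(os.environ.get("PORT", "8000"))))
```

The `run` parameter exists only so the sequencing can be exercised without actually launching anything.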

Loading script now clones the data itself

The load_agency_contacts script now defaults to cloning the data from https://github.com/18f/foia.git into a directory named temp-data in the project root, and loads in the YAMLs from there. The repo location is defined in source control, in the base settings file.

If the temp-data directory already exists, it is deleted first and replaced with a fresh clone. (No git pull.) temp-data is ignored in .gitignore and .cfignore.
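The delete-then-clone behavior can be sketched roughly as below. The repo URL and directory name come from the description above; the function name and the injectable `run` parameter are hypothetical, and the real load_agency_contacts script may be organized differently.

```python
import os
import shutil
import subprocess

# Values taken from the PR description.
DATA_REPO = "https://github.com/18f/foia.git"
DATA_DIR = "temp-data"


def fresh_clone(repo_url=DATA_REPO, dest=DATA_DIR, run=subprocess.check_call):
    """Delete any existing checkout at `dest`, then clone `repo_url` into it."""
    if os.path.isdir(dest):
        # Start clean rather than running `git pull` on a stale checkout.
        shutil.rmtree(dest)
    # check_call raises CalledProcessError if git exits non-zero.
    run(["git", "clone", repo_url, dest])
    return dest
```

Calling `fresh_clone()` with no arguments clones the default repo into temp-data relative to the current working directory, so it assumes the script is run from the project root.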

I tested this out by temporarily deploying a version of the app that pulled from a personal fork of the data, and confirmed that if I changed some values in the master branch of the repo, and then deployed the app, the new version of the app displayed the changes correctly:

[Screenshot: testing-1]

We have to live without requirements-prod.txt

Without modifications to the Cloud Foundry buildpack we use, we have to merge requirements-prod.txt back into requirements.txt, which this PR does. CF is currently configured to install only what is listed in requirements.txt, so deploys were failing: waitress couldn't be found, and the app never got a chance to start.

There's a secondary issue, which is that when deploying in that situation, the logs were very uninformative. That's been triaged and reported to DevOps, and it's in their non-blocker sprint candidates.

Further work

- Doing multiple deploys revealed that we have serious downtime between deploys. We should really adopt a zero-downtime strategy, or else we'll be left deploying at night or something. I opened a ticket, with ideas/resources for that, in #602.
- The README needs simplification, and some clear CF instructions, especially with the new command. I'm working on that in a separate branch.

One more note: in our previous system, when we deployed new data, we assumed it would make its way onto the site shortly afterward. Now, no contact data updates will appear on the site without a deploy. However, I would argue that the new configuration is better, because it ensures that migrations always run just before data loading.

Fixes #511.

kaitlin commented 9 years ago

@dlapiduz pointed me to this: https://github.com/yuvadm/heroku-periodical

It can be run as a separate app in the same space to handle scheduled tasks. I haven't tried it out yet, but I will when I redeploy Discovery into CF.

khandelwal commented 9 years ago

There should probably be some tests associated with this pull request.

konklone commented 9 years ago

OK, I think this one is merge-ready.

konklone commented 9 years ago

Can I get a merge :hammer:?

18F / 2015-foia-hub