ebmdatalab / clinicaltrials-act-tracker

https://fdaaa.trialstracker.net/

Tidy-up data update deployment scripts #245

Open madwort opened 3 years ago

madwort commented 3 years ago

Cronjobs

The production site uses the db `clinicaltrials`. ~~There is a daily cronjob that downloads XML files from upstream & converts them to CSV. There is a second daily cronjob~~ There is a cronjob that imports fresh data into the staging site using the db `clinicaltrials_staging`; if the data in staging looks good, @NickCEBM will use fabric to manually copy the data from staging to production.
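For context, that staging-to-production copy is roughly equivalent to dumping one database over the other. This is only a sketch of that step, assuming both `clinicaltrials_staging` and `clinicaltrials` are Postgres databases on the same host; the real Fabric task may do it differently (e.g. by renaming databases) and should be treated as authoritative:

# Hypothetical equivalent of the manual staging -> production copy
sudo -u postgres pg_dump --format=custom clinicaltrials_staging > /tmp/clinicaltrials_staging.dump
sudo -u postgres pg_restore --clean --if-exists --dbname=clinicaltrials /tmp/clinicaltrials_staging.dump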

The import-into-staging cronjob is currently not the one that the fabfile suggests; it is in fact fdaaa:

00 04 * * 1-5 seb /usr/local/bin/fdaaa_temp.sh
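For reference, this is system crontab format (the sixth field is the user), so the schedule reads as:

# minute hour day-of-month month day-of-week  user  command
# 00     04   *            *     1-5          seb   /usr/local/bin/fdaaa_temp.sh
# i.e. at 04:00, Monday to Friday, run as the seb user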

fdaaa_temp.sh contains:

#!/bin/bash

set -e

. /etc/profile.d/fdaaa_staging.sh
GOOGLE_SERVICE_ACCOUNT_FILE=/home/seb/clinicaltrials-credentials.json GOOGLE_SERVICE_ACCOUNT_FILE_FDAAA=$GOOGLE_SERVICE_ACCOUNT_FILE /var/www/fdaaa_staging/venv/bin/python /var/www/fdaaa_staging/clinicaltrials-act-tracker/clinicaltrials/manage.py load_data staging-fdaaa.ebmdatalab.net

I think any output from this script will be in the mailbox of the seb user on the server:

sudo su - seb
mail

although the logs from when this import was broken (9-20 July 2021) don't show any errors.
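If an interactive mail session is awkward, the same messages should be readable from the raw mbox file; a small sketch, assuming the default local-mail setup on smallweb1 (the path may differ by distro):

# Hypothetical: read the seb user's local mailbox directly
sudo tail -n 200 /var/mail/seb
# or search it for failures from the import
sudo grep -n -i "error" /var/mail/seb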

Manual updates

The production deployment is currently on commit a6e4e1241c2ae25e9c5bd3ab88cda90731a2008b (master), and staging is on commit 6b063216bab4a0165a5c4a1349716d616358e904 (gae-ise). The gae-ise branch is out of date, and the fabfile is now out of date too; however, the fabfile from master works. I can run it as documented in the readme from my dev environment if I first upgrade one Python package:

pip3 install --upgrade snowflake-connector-python
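For anyone reproducing this from a fresh checkout, something like the following should show which tasks the fabfile actually defines (the task names are repo-specific, so check the output rather than trusting this issue):

# In a dev checkout of clinicaltrials-act-tracker
pip3 install --upgrade snowflake-connector-python
fab --list   # enumerate the tasks defined in the fabfile before running any of them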

UPDATE: it does not run manually from fabric; it throws this error into /tmp/fdaaa_staging_*_data_load.out:

usage: manage.py load_data [-h] [--version] [-v {0,1,2,3}]
                           [--settings SETTINGS] [--pythonpath PYTHONPATH]
                           [--traceback] [--no-color]
                           callback_host
manage.py load_data: error: the following arguments are required: callback_host
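For comparison, the cron wrapper above does supply that argument; running the import by hand with the same invocation (paths and hostname taken from fdaaa_temp.sh) would look like:

# Same invocation as fdaaa_temp.sh, with the required callback_host argument
. /etc/profile.d/fdaaa_staging.sh
GOOGLE_SERVICE_ACCOUNT_FILE=/home/seb/clinicaltrials-credentials.json \
GOOGLE_SERVICE_ACCOUNT_FILE_FDAAA=$GOOGLE_SERVICE_ACCOUNT_FILE \
  /var/www/fdaaa_staging/venv/bin/python \
  /var/www/fdaaa_staging/clinicaltrials-act-tracker/clinicaltrials/manage.py \
  load_data staging-fdaaa.ebmdatalab.net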
madwort commented 2 years ago

> There is a daily cronjob that downloads XML files from upstream & converts them to CSV. There is a second daily cronjob that imports fresh data into the staging site using the db clinicaltrials_staging...

It seems this understanding was wrong; there is just one cronjob that does both of these jobs in one go: https://github.com/ebmdatalab/clinicaltrials-act-tracker/blob/801cacaf2830d7acc853e44eb4382f63a40fadc1/clinicaltrials/frontend/management/commands/load_data.py#L215-L219

madwort commented 2 years ago

I did indeed find useful debugging information in the emails for the seb user on smallweb1 for this issue.