Upgrade debian 9 to debian 10 for gae-ise workflow

ebmdatalab / clinicaltrials-act-converter


Upgrade debian 9 to debian 10 for gae-ise workflow #15

Closed madwort closed 1 year ago

madwort commented 1 year ago

At the time of writing, the data is imported into the staging deployment on smallweb1 using the gae-ise branch. This uses a debian 9 image, which has been discontinued. Upgrade this to debian 10 and fix any resulting issues.

madwort commented 1 year ago

This will update the version of python from 3.5 to 3.7, so it has the potential to break things.

madwort commented 1 year ago

Runtime logs can be viewed in the Google Cloud console: Compute Engine > VM instances > ctgov-converter instance > Cloud Logging

madwort commented 1 year ago

The run appears to complete much of the pipeline, including uploading a new copy of clinical_trials.csv, but then the following step returns a 504 timeout:

startup-script: Running webhook https://staging-fdaaa.ebmdatalab.net/management/process_data/?secret=...&input_csv=https://storage.googleapis.com/ebmdatalab/clinicaltrials/clinical_trials.csv

(I think you can hit this link manually - if you have the secret - to reproduce the error)
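A minimal sketch of reproducing the request by hand, assuming Python 3 with only the standard library. The function names `build_webhook_url` and `trigger_webhook` and the 600-second client timeout are hypothetical, chosen for illustration; the URL and parameter names are taken from the startup-script log line above. The secret is not filled in and must be supplied by the caller.

```python
import urllib.error
import urllib.parse
import urllib.request

# Values copied from the startup-script log line in this thread.
BASE_URL = "https://staging-fdaaa.ebmdatalab.net/management/process_data/"
INPUT_CSV = "https://storage.googleapis.com/ebmdatalab/clinicaltrials/clinical_trials.csv"


def build_webhook_url(secret, input_csv=INPUT_CSV, base_url=BASE_URL):
    """Return the webhook URL with a percent-encoded query string."""
    query = urllib.parse.urlencode({"secret": secret, "input_csv": input_csv})
    return f"{base_url}?{query}"


def trigger_webhook(secret, timeout_seconds=600):
    """Request the webhook and return the HTTP status code.

    timeout_seconds is an arbitrary client-side limit (an assumption),
    set high so that the client does not give up before the proxies do.
    """
    request = urllib.request.Request(build_webhook_url(secret))
    try:
        with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # Gateway timeouts such as the 504 arrive as HTTPError.
        return error.code
```

Calling `trigger_webhook("<the real secret>")` should return 504 if the error reproduces. Note that `urlencode` percent-encodes the `input_csv` value, which decodes to the same URL as the unencoded form in the log.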

madwort commented 1 year ago

In /var/log/nginx/fdaaa_staging.error.log we can also see the timeout:

2022/08/10 16:09:53 [error] 28696#28696: *73788 upstream timed out (110: Connection timed out) while reading response header from upstream, client: ..., server: staging-fdaaa.ebmdatalab.net, request: "GET /management/process_data/?secret=...&input_csv=https://storage.googleapis.com/ebmdatalab/clinicaltrials/clinical_trials.csv HTTP/1.1", upstream: "http://unix:/tmp/gunicorn-fdaaa_staging.sock/management/process_data/?secret=...&input_csv=https://storage.googleapis.com/ebmdatalab/clinicaltrials/clinical_trials.csv", host: "staging-fdaaa.ebmdatalab.net"

which is nginx timing out whilst waiting for gunicorn. I think we can fix this by increasing the request timeout in the nginx config.
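To check how often this happens, a short script can scan the nginx error log for the same message. This is a sketch, assuming the log lines follow the format of the entry quoted above; the regular expression and the function name `find_upstream_timeouts` are illustrative, not part of the deployment.

```python
import re

# Matches nginx error-log lines like the one quoted in this thread:
# a timestamp, the [error] level, the "upstream timed out" message,
# and the request method and path (query string excluded).
TIMEOUT_LINE = re.compile(
    r"^(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[error\] "
    r".*upstream timed out.*request: \"(?P<method>[A-Z]+) (?P<path>[^? ]+)"
)


def find_upstream_timeouts(lines):
    """Return (timestamp, path) pairs for 'upstream timed out' errors."""
    matches = []
    for line in lines:
        match = TIMEOUT_LINE.search(line)
        if match:
            matches.append((match.group("timestamp"), match.group("path")))
    return matches
```

On the server this could be run as `find_upstream_timeouts(open("/var/log/nginx/fdaaa_staging.error.log"))`, which would list the timestamp of every timed-out request alongside the path that was requested.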

madwort commented 1 year ago

Nb. we can also verify the gunicorn config which has an extra-long timeout here https://github.com/ebmdatalab/clinicaltrials-act-tracker/blob/gae-ise/deploy/gunicorn-fdaaa_staging.conf.py#L3
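For orientation, a gunicorn config file is plain Python, and a long worker timeout would look roughly like the sketch below. This is not a copy of the linked file: the socket path is taken from the nginx log entry in this thread, the 100-minute figure from the comments here, and everything else about the real config is unknown.

```python
# Hypothetical sketch of a gunicorn config file (plain Python settings).

# Unix socket that nginx proxies to, as it appears in the nginx error log.
bind = "unix:/tmp/gunicorn-fdaaa_staging.sock"

# Worker timeout in seconds. gunicorn kills and restarts a worker that
# is silent for longer than this; 100 minutes is assumed here.
timeout = 100 * 60
```

With a timeout this long, a worker can keep processing a request well after nginx or any other proxy in front of it has given up on the response.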

madwort commented 1 year ago

Increased the nginx timeout to 300 seconds, now getting a Cloudflare timeout (Error 524) from the server. Removing the Cloudflare proxy & retrying.

madwort commented 1 year ago

I have a suspicion that, minutes after the requests timed out, the gunicorn workers are still running on their 100-minute timeouts. I therefore suspect that this might complete successfully at some point in the next hour or two, and that this timeout error is actually how the system has been functioning for the last couple of years, e.g. there are logs from 2022-07-12 showing the same timeout error. Will review tomorrow... Re-enabling the CF proxy.

UPDATE: the import process appears to have completed successfully!

ccunningham101 commented 1 year ago

The changes fix the run on the server, but the CI tests were already failing and still are (#18); will investigate at a later date.