Closed madwort closed 1 year ago
This will update the version of Python from 3.5 to 3.7, so it has the potential to break things.
Runtime logs can be viewed in Google Compute Engine > VM instances > ctgov-converter instance > Cloud Logging.
The run appears to complete much of the pipeline, including uploading a new copy of clinical_trials.csv, but then this step returns a 504 timeout:
startup-script: Running webhook https://staging-fdaaa.ebmdatalab.net/management/process_data/?secret=...&input_csv=https://storage.googleapis.com/ebmdatalab/clinicaltrials/clinical_trials.csv
(I think you can hit this link manually - if you have the secret - to reproduce the error)
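A minimal sketch of hitting the webhook from a script rather than a browser, so the returned status code can be recorded. The base URL and CSV location are copied from the log line above. The helper names and the 600-second client timeout are assumptions for illustration, and the secret is deliberately left as a parameter.

```python
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://staging-fdaaa.ebmdatalab.net/management/process_data/"
INPUT_CSV = "https://storage.googleapis.com/ebmdatalab/clinicaltrials/clinical_trials.csv"


def build_webhook_url(secret, input_csv=INPUT_CSV):
    """Return the process_data webhook URL with its query string encoded."""
    query = urllib.parse.urlencode({"secret": secret, "input_csv": input_csv})
    return f"{BASE_URL}?{query}"


def trigger(secret, client_timeout=600):
    """Call the webhook and return the HTTP status code, including error codes.

    client_timeout is an assumed value; it only limits how long this script
    waits and does not change any server-side timeout.
    """
    url = build_webhook_url(secret)
    try:
        with urllib.request.urlopen(url, timeout=client_timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Gateway timeouts arrive here, e.g. 504 from nginx.
        return err.code

# Example (not run here): print(trigger("<the real secret>"))
```

Connection-level failures (DNS errors, the client timeout expiring) are not caught and will raise, which keeps them distinct from an HTTP 504.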
In /var/log/nginx/fdaaa_staging.error.log we can also see the timeout:
2022/08/10 16:09:53 [error] 28696#28696: *73788 upstream timed out (110: Connection timed out) while reading response header from upstream, client: ..., server: staging-fdaaa.ebmdatalab.net, request: "GET /management/process_data/?secret=...&input_csv=https://storage.googleapis.com/ebmdatalab/clinicaltrials/clinical_trials.csv HTTP/1.1", upstream: "http://unix:/tmp/gunicorn-fdaaa_staging.sock/management/process_data/?secret=...&input_csv=https://storage.googleapis.com/ebmdatalab/clinicaltrials/clinical_trials.csv", host: "staging-fdaaa.ebmdatalab.net"
which is nginx timing out whilst waiting for gunicorn. I think we can fix this by increasing the request timeout in the nginx config.
NB: we can also check the gunicorn config, which has an extra-long timeout, here: https://github.com/ebmdatalab/clinicaltrials-act-tracker/blob/gae-ise/deploy/gunicorn-fdaaa_staging.conf.py#L3
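For readers unfamiliar with gunicorn config files, here is a rough sketch of what such a file can look like. It is not a copy of the linked file: the socket path is taken from the nginx error log above, the 100-minute figure is the one mentioned later in this thread, and the worker count is purely an assumption.

```python
# Hypothetical gunicorn config sketch (gunicorn config files are plain Python).

# Unix socket that nginx proxies to; path as seen in the nginx error log.
bind = "unix:/tmp/gunicorn-fdaaa_staging.sock"

# Assumed worker count, for illustration only.
workers = 2

# Seconds a worker may stay silent before gunicorn kills and restarts it.
# 100 minutes, so a long import can outlive the nginx request timeout.
timeout = 100 * 60
```

The point of the sketch is that the gunicorn `timeout` and the nginx request timeout are independent settings: nginx can give up on the request while the worker carries on.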
Increased the nginx timeout to 300 seconds; now getting a Cloudflare timeout (Error 524) from the server. Removing the Cloudflare proxy and retrying.
I have a suspicion that, minutes after the requests timed out, the gunicorn workers are still running on their 100-minute timeouts. I therefore suspect that this might complete successfully at some point in the next hour or two, and that this timeout error is actually how the system has been functioning for the last couple of years; e.g. there are logs from 2022-07-12 showing the same timeout error. Will review tomorrow... Re-enabling the CF proxy.
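One way to test the "it has always timed out like this" theory is to count upstream timeouts per day in the nginx error log. The sketch below assumes the standard nginx error-log prefix (`YYYY/MM/DD HH:MM:SS [level]`) seen in the line quoted earlier; the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches error-level lines that report an upstream timeout and captures the date.
TIMEOUT_RE = re.compile(r"^(\d{4}/\d{2}/\d{2}) \S+ \[error\].*upstream timed out")


def timeouts_per_day(lines):
    """Count 'upstream timed out' errors per date in nginx error-log lines."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def timeouts_in_file(path="/var/log/nginx/fdaaa_staging.error.log"):
    """Read one uncompressed log file and count its upstream timeouts per day."""
    with open(path, errors="replace") as handle:
        return timeouts_per_day(handle)
```

Rotated logs are often gzip-compressed, so older files may need opening with `gzip.open(path, "rt")` and passing to `timeouts_per_day` instead.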
UPDATE: the import process appears to have completed successfully!
The changes fix the run on the server, but the CI tests were already failing and still are (#18); will investigate at a later date.
At the time of writing, the data is imported into the staging deployment on smallweb1 using the gae-ise branch. This uses a Debian 9 image, which has been discontinued. Upgrade this to Debian 10 and fix any resulting issues.