COG-UK / dipi-group

Data integrity and pipeline integration working group

[manualpipe] Asklepian 20220224 / 22k DHSC genomes missing from Asklepian #192

Closed by SamStudio8 2 years ago

SamStudio8 commented 2 years ago

The UKHSA genomics cell is reporting that the EDGE line lists are missing a considerable number of private-provider genomes.

SamStudio8 commented 2 years ago

I've investigated this and know the root cause. There was a minor change to the code that publishes DHSC genomes to GISAID: we were asked not to send genomes to GISAID without appropriate credit to the private providers. We rolled this change out yesterday after testing the GISAID interface, without realising that Asklepian also uses the "publishing" endpoint to fetch its data. I'll be making a fix to Majora ASAP and will restart Asklepian to ensure the missing genomes are restored today.

SamStudio8 commented 2 years ago

The change prevented genomes being emitted from the get_pag_v2 celery task if the appropriate credit field had not been filled in. The easiest fix for now is probably:

Whilst being careful not to accidentally send the genomes to GISAID in the meantime.

SamStudio8 commented 2 years ago

@BioWilko Let me know when today's GISAID pipe has finished its ocarina-get-gisaid step.

SamStudio8 commented 2 years ago

The Asklepian 20220224 process has been stopped pending a controlled restart.

SamStudio8 commented 2 years ago

The tael_asklepian listener is back up.

SamStudio8 commented 2 years ago

GISAID pipes no longer run over the weekend, so the easiest thing to do here is actually to unset the credit_code_only opt-in on the DHSC org and propose a patch to be merged on Monday, rather than rushing a solution today that might blow up over the weekend.

I'm leaning towards removing this behaviour from get_pag_v2 entirely, as filtering inside the task is clearly unexpected for its other consumers. Preventing submissions to GISAID belongs in the GISAID pipeline.
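A minimal sketch of the separation being proposed, assuming a simplified record shape: the shared publishing step emits every genome, and only the GISAID-side consumer drops records that lack credit. All names here (`Genome`, `credit_code`, `publish_all`, `select_for_gisaid`, `select_for_asklepian`) are hypothetical and do not reflect Majora's real models or the actual `get_pag_v2` task.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Genome:
    # Hypothetical, simplified record; not Majora's real data model.
    central_sample_id: str
    org_code: str
    credit_code: Optional[str] = None  # None means no credit assigned yet


def publish_all(genomes: Iterable[Genome]) -> List[Genome]:
    # The shared "publishing" step applies no consumer-specific filtering,
    # so every downstream consumer sees the complete set.
    return list(genomes)


def select_for_gisaid(published: Iterable[Genome]) -> List[Genome]:
    # The GISAID pipeline decides for itself: uncredited genomes are held back.
    return [g for g in published if g.credit_code]


def select_for_asklepian(published: Iterable[Genome]) -> List[Genome]:
    # Asklepian needs every genome, credited or not.
    return list(published)


if __name__ == "__main__":
    genomes = [
        Genome("SAMPLE-A", "DHSC", credit_code="CREDIT-1"),
        Genome("SAMPLE-B", "DHSC"),  # no credit yet
        Genome("SAMPLE-C", "OTHER", credit_code="CREDIT-2"),
    ]
    published = publish_all(genomes)
    print(len(select_for_gisaid(published)))     # 2
    print(len(select_for_asklepian(published)))  # 3
```

Because the uncredited genome is dropped only inside `select_for_gisaid`, a change to the GISAID rules cannot silently remove records from other consumers.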

SamStudio8 commented 2 years ago

@BioWilko confirms the GISAID step has moved past the ocarina checkpoint.

SamStudio8 commented 2 years ago

I think there is actually a much simpler all-round fix here that avoids filtering in get_pag_v2 and leverages the auto_submit script's use of --ffield-true. We are testing this patch now.
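A minimal sketch of consumer-side filtering in the style of a `--ffield-true` option, assuming the flag means "keep only records whose named field is truthy". This mimics the idea over an in-memory list of dicts; it is not ocarina's implementation, and the field name `has_gisaid_credit` and the function names are hypothetical.

```python
import argparse
from typing import Dict, List, Optional


def filter_field_true(records: List[Dict[str, object]], fields: List[str]) -> List[Dict[str, object]]:
    # Keep a record only if every named field is present and truthy.
    return [r for r in records if all(r.get(f) for f in fields)]


def main(argv: Optional[List[str]] = None) -> List[Dict[str, object]]:
    parser = argparse.ArgumentParser(description="Hypothetical consumer-side field filter")
    # action="append" lets the flag be repeated, one field name per use.
    parser.add_argument("--ffield-true", dest="ffield_true", action="append", default=None)
    args = parser.parse_args(argv)
    fields = args.ffield_true or []

    # Stand-in for the unfiltered manifest returned by the publishing endpoint.
    records = [
        {"central_sample_id": "SAMPLE-A", "has_gisaid_credit": True},
        {"central_sample_id": "SAMPLE-B", "has_gisaid_credit": False},
        {"central_sample_id": "SAMPLE-C"},  # field missing entirely
    ]
    kept = filter_field_true(records, fields)
    for r in kept:
        print(r["central_sample_id"])
    return kept


if __name__ == "__main__":
    main(["--ffield-true", "has_gisaid_credit"])
```

With no `--ffield-true` given, every record passes through, which is what a consumer like Asklepian wants; the GISAID submitter opts in to the stricter filter itself.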

SamStudio8 commented 2 years ago

I've updated the supplier's GISAID username, and @BioWilko has restarted both the tael_gisaid and tael_asklepian services.

SamStudio8 commented 2 years ago

The GISAID manifest matches expectations. Waiting for the Asklepian manifest (which takes longer) to finish so it can be inspected.

SamStudio8 commented 2 years ago

@BioWilko confirms the GISAID submissions are flowing through with the correct username now, so we anticipate those will be accepted by GISAID over the next 24 hours or so. I've spotted the Asklepian manifest coming down and am looking for the missing genomes to confirm they are no longer filtered out by get_pag_v2.

SamStudio8 commented 2 years ago

All test IDs provided by PHE are now in the manifest. 21,409 missing records have been restored, so Asklepian should proceed as normal.

SamStudio8 commented 2 years ago

Patched by https://github.com/SamStudio8/majora2/commit/b95a6b85d0cea8f80fa25f8f0806293bbd162102, which does not require tael_gisaid to be shut down.

SamStudio8 commented 2 years ago

@BioWilko, can you monitor this and close up shop?

BioWilko commented 2 years ago

Asklepian = finished. :)