Sadly this (while still useful) is not enough. We also need to get "how many we should expect to have". Luckily we also index `sequence_total` as a value, so the idea is:
Run a similar query, facet by `sequence_total`, and take the "value" (not the "count") of each facet from the extra data in the results. Sum them up and return both count and expected.
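A rough, language-neutral sketch of that summing step, written in Python only for illustration (the real code would live in PHP). It assumes the facet data arrives as a list of `{"filter": ..., "count": ...}` entries with the facet value wrapped in double quotes; that shape, and the function names `count_and_expected` and `needs_reprocessing`, are assumptions of this sketch and not existing API.

```python
from typing import Iterable, Mapping, Tuple


def count_and_expected(
    result_count: int,
    sequence_total_facets: Iterable[Mapping[str, object]],
) -> Tuple[int, int]:
    """Return (count, expected) for a set of OCR search results.

    result_count: how many OCR documents the query matched.
    sequence_total_facets: hypothetical facet entries for sequence_total,
    e.g. {"filter": '"12"', "count": 12}. We sum the facet *value*
    (the "filter"), not the per-facet "count".
    """
    expected = 0
    for facet in sequence_total_facets:
        # Assumption: the value may be wrapped in double quotes.
        raw = str(facet["filter"]).strip('"')
        if raw.isdigit():
            expected += int(raw)
        # Non-numeric entries (e.g. a "missing" bucket) are skipped.
    return result_count, expected


def needs_reprocessing(count: int, expected: int) -> bool:
    # Assumption: if nothing tells us what to expect, we cannot
    # prove completeness, so we treat the item as needing work.
    return expected == 0 or count < expected


facets = [{"filter": '"12"', "count": 12}, {"filter": '"3"', "count": 1}]
count, expected = count_and_expected(13, facets)
print(count, expected, needs_reprocessing(count, expected))
```

With the sample data above, 13 documents are indexed while 12 + 3 = 15 are expected, so the item would still be enqueued.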
Returning both numbers is needed to be able to (optionally) avoid reprocessing ADOs where everything matches/is in place. Even if the queue item workers will not run OCR again when it is already present (we already have that), re-enqueuing 700K OCRs just to check that is overkill. Thinking of JSON patching etc.
See https://github.com/esmero/strawberryfield/blob/a7ba7330cc5f278e66533f40c75267ef8369f495/src/StrawberryfieldUtilityService.php#L319