Comment - Githubissues

The SQL script you are referring to (Bayes Lines Tool (BLT) - A SQL-script for analyzing diagnostic test results with an application to SARS-CoV-2-testing) has major flaws in its matching algorithm. According to the paper:

There were 28,757 tests with 3,829 positive results. The calculator yielded 23 possible solutions that matched the number of positive tests.

Here is the reason why:

In a test campaign Tᵢ with nᵢ tests and mᵢ positive results, the real prevalenceᵢ, sensitivityᵢ and specificityᵢ of Tᵢ are each an integer multiple k · (1/nᵢ), where k is a non-negative integer with 0 ≤ k ≤ nᵢ. These values are specific to that campaign (daily results, mass testing, etc.), because every tested person either is a true positive or is not (and likewise either is a false positive or false negative, or is not).

If the algorithm uses a step size that is much larger than the step size 1/nᵢ of the test scenario (e.g. 1/28,757), or simply not aligned with it (an "odd" step size), you will miss a lot of valid matches because of rounding limitations or odd matching conditions ("the 99.99%" mentioned in the PDF). The SQL script does exactly this: it uses steps of 0.001 for the prevalence and 0.005 each for sensitivity and specificity.
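The misalignment can be checked directly. Here is a minimal Python sketch (the function name `on_grid_count` is my own, not something from the SQL script or the paper) that counts how many of the attainable values k/n land exactly on a grid with a given number of steps per unit:

```python
def on_grid_count(n: int, steps_per_unit: int) -> int:
    """Count k in 0..n for which k/n is an exact multiple of 1/steps_per_unit.

    k/n = j/steps_per_unit for some integer j exactly when
    k * steps_per_unit is divisible by n, so integer arithmetic suffices.
    """
    return sum(1 for k in range(n + 1) if (k * steps_per_unit) % n == 0)


print(on_grid_count(100, 100))     # 101: with 100 tests, a 0.01 grid hits every value
print(on_grid_count(123, 100))     # 2: with 123 tests, only 0 and 1 are on the grid
print(on_grid_count(28757, 1000))  # 2: same for the Dutch numbers and a 0.001 grid
```

With 28,757 tests, only the trivial values 0 and 1 lie on a 0.001 grid; every other attainable value falls between two grid points.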

Test 1:

100 tests with 20 positive cases; step size 0.01 (prevalence from 1% up to 50%/sensitivity from 30% up to 100%/specificity from 80% up to 100%); rounding 1; valid matches: 681
99 tests with 20 positive cases; step size 0.01 (prevalence from 1% up to 50%/sensitivity from 30% up to 100%/specificity from 80% up to 100%); rounding 1; valid matches: 697
101 tests with 20 positive cases; step size 0.01 (prevalence from 1% up to 50%/sensitivity from 30% up to 100%/specificity from 80% up to 100%); rounding 1; valid matches: 674
123 tests with 20 positive cases; step size 0.01 (prevalence from 1% up to 50%/sensitivity from 30% up to 100%/specificity from 80% up to 100%); rounding 1; valid matches: 551
123 tests with 20 positive cases; step size 1/123 (prevalence from ~1% up to ~50%/sensitivity from ~30% up to 100%/specificity from ~80% up to 100%); rounding 1; valid matches: 736 (I hope that I properly removed all duplicate lines!)
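For anyone who wants to experiment with this kind of comparison, here is a rough Python sketch. The matching rule is an assumption on my side: a grid point counts as a match when the expected number of positives, n · (prevalence · sensitivity + (1 − prevalence) · (1 − specificity)), rounds to the observed count. This is not necessarily the rule used in the SQL script, so the counts it prints need not equal the ones listed above. The names `is_match`, `count_matches` and `ceil_div` are hypothetical helpers.

```python
def ceil_div(x: int, y: int) -> int:
    """Integer ceiling division for non-negative x and positive y."""
    return -(-x // y)


def is_match(n: int, m: int, a: int, b: int, c: int, d: int) -> bool:
    """True if the expected positive count is within 0.5 of m.

    prevalence = a/d, sensitivity = b/d, specificity = c/d. All values share
    the denominator d, so the check uses integers only (no float rounding).
    """
    # expected positives * d^2 = n * (a*b + (d - a)*(d - c))
    scaled_expected = n * (a * b + (d - a) * (d - c))
    # |expected - m| < 0.5, multiplied through by 2 * d^2
    return abs(2 * scaled_expected - 2 * m * d * d) < d * d


def count_matches(n: int, m: int, d: int) -> int:
    """Count matching grid points with step 1/d over the ranges of Test 1."""
    prev = range(ceil_div(d, 100), d // 2 + 1)   # about 1% up to 50%
    sens = range(ceil_div(3 * d, 10), d + 1)     # about 30% up to 100%
    spec = range(ceil_div(8 * d, 10), d + 1)     # about 80% up to 100%
    return sum(is_match(n, m, a, b, c, d)
               for a in prev for b in sens for c in spec)


# 123 tests, 20 positives: misaligned 0.01 grid vs. aligned 1/123 grid
print(count_matches(123, 20, 100), count_matches(123, 20, 123))
```

Here `d` is the grid resolution (100 for a step size of 0.01, 123 for a step size of 1/123) and is independent of the number of tests `n`, which is what allows comparing an aligned grid with a misaligned one.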

Test 2 with the Dutch numbers (28,757 tests with 3,829 positive results):

With a very small subset (201 × 11 × 11 = 24,321 combinations; the number of permutations mentioned in the PDF is 17,945,000!) I already get 27 valid matches, compared to the "23 possible solutions" in the original work.

Modified SQL code:

...
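select
    -- every candidate value is an exact multiple of 1/28757 (one tested person)
    (prevalence::numeric / 28757)::numeric as prevalence,
    (sensitivity::numeric / 28757)::numeric as sensitivity,
    (specificity::numeric / 28757)::numeric as specificity
from
    generate_series(3700, 3900, 1) as prevalence,     -- 201 values, ~12.87% to ~13.56%
    generate_series(28737, 28747, 1) as sensitivity,  -- 11 values, ~99.93% to ~99.97%
    generate_series(28737, 28747, 1) as specificity   -- 11 values, ~99.93% to ~99.97%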
),

matrices as
(
select 
    t.report_id,
    t.tests_performed,
    t.positives_reported,
    round(prevalence, 1) as prevalence, --just for cosmetic purposes
    round(sensitivity, 1) as sensitivity,
    round(specificity, 1) as specificity,
...
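
For readers more at home in Python than in SQL, the candidate grid that the fragment above builds can be sketched as follows. This only mirrors the three `generate_series` calls and their cross join; the matching step of the original script is elided above and is not reproduced here. The variable names are my own.

```python
from fractions import Fraction
from itertools import product

N_TESTS = 28757  # tests performed in the Dutch data set

# Exact multiples of 1/N_TESTS, same integer ranges as the generate_series calls
prevalence = [Fraction(k, N_TESTS) for k in range(3700, 3901)]
sensitivity = [Fraction(k, N_TESTS) for k in range(28737, 28748)]
specificity = [Fraction(k, N_TESTS) for k in range(28737, 28748)]

# Cross join of the three series, as in the SQL "from" clause
grid = list(product(prevalence, sensitivity, specificity))

print(len(grid))  # 201 * 11 * 11 = 24321
```

Using `Fraction` keeps every candidate exact, so no candidate is lost or duplicated through floating-point rounding.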

If I understand your Python code correctly, you made a similar error. IIRC your numbers aren't as odd as the original numbers because you used a wider error margin.

This matching algorithm, and the pitiful interpretation of the data it produces, is a disgrace for every mathematician or semi-mathematician who is or was involved in the paper!

GoofyPy / Bayes-Lines-Tool

Comment #1