NTD-Modelling-Consortium / ntd-model-sch

NTD SCH model
MIT License

correct the number of samples used for getBurdens #79

Open mattg3004 opened 5 days ago

mattg3004 commented 5 days ago

The nSamples value used within getBurdens was hard-coded as 1 rather than taken from the inputs. This PR changes it to use the value from the inputs.

This function will be used to generate the IHME and prevalence outputs, so the fix needs to be in the code before we output these data. The fits done so far didn't rely on this function, so the existing model outputs are unaffected.

thk123 commented 5 days ago

I tried to write a test:

import numpy as np
import numpy.testing as npt
import sch_simulation
from sch_simulation.helsim_FUNC_KK.helsim_structures import Parameters, ProcResult
import sch_simulation.helsim_FUNC_KK.results_processing as results_processing
import sch_simulation.helsim_RUN_KK

def test_get_burdens_with_non_default_sample_size():
    np.random.seed(10)
    hostData = [
        ProcResult(
            vaccState=np.full((1000,4), 1),
            wormsOverTime=np.full((1000,4), 20),
            ages=np.full((1000,4), 10),
            timePoints=range(4),
            prevalence=np.zeros(1000),
            femaleWormsOverTime=np.full((1000,4), 10)
        )
    ]
    parameter_file_name = "mansoni_params.txt"
    params = sch_simulation.helsim_RUN_KK.loadParameters(
        parameter_file_name, "UgandaRural"
    )
    # Low test sensitivity so that repeated samples increase prevalence
    params.testSensitivity = 0.001
    prevalence, _, _, _, _ = results_processing.getBurdens(
        hostData, params, 1, np.array([5, 95]), False, "KK1", nSamples=1
    )
    npt.assert_array_equal(prevalence, np.array([0.027, 0.03 , 0.031, 0.034]))

But these prevalence values do not change regardless of what nSamples I provide. Looking through the code, I found that getSetOfEggCounts does not use nSamples, though the result is divided by it; I don't know if that is important. I suspect the real issue is that my parameter setup results in no eggs being produced. Perhaps you can see how to tweak the values to ensure that nSamples has an impact.

thk123 commented 5 days ago

The values do change when switching from "KK1" to "KK2". If I understand correctly, the 1 and 2 refer to the sample size?

mattg3004 commented 4 days ago

I think you're right that nSamples isn't actually impacting the results at all and it is just the choice of KK1 or KK2 which makes the difference. Yes, the 1 and 2 do specify the sample size, and KK stands for Kato-Katz, which is the method used to test stool for eggs. KK2 takes 2 samples from the stool, which will be more accurate in determining prevalence and intensity of infection. Obviously KK3, KK4, ..., would be more accurate but would take too long to do in reality.
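
To illustrate why more Kato-Katz samples per host give a prevalence estimate closer to the truth, here is a minimal standalone sketch. It is not the model's implementation: the function names, the negative binomial egg-count model, and the values of eggs_per_pair and k are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_slides(worm_pairs, max_slides, rng, eggs_per_pair=0.3, k=0.5):
    """Draw egg counts for max_slides hypothetical slides per host.

    Assumption: counts are negative binomial with mean proportional to the
    number of worm pairs and aggregation parameter k (illustrative values).
    """
    mean_eggs = eggs_per_pair * np.asarray(worm_pairs, dtype=float)
    # NumPy parameterises the negative binomial by (n, p); with n = k and
    # p = k / (k + mean) the distribution has the requested mean.
    p = k / (k + mean_eggs)
    return rng.negative_binomial(k, p, size=(max_slides, mean_eggs.size))


def measured_prevalence(slides, n_slides):
    """Fraction of hosts whose mean count over the first n_slides is positive."""
    mean_count = slides[:n_slides].mean(axis=0)
    return float(np.mean(mean_count > 0))


rng = np.random.default_rng(10)
worm_pairs = rng.poisson(2.0, size=1000)  # hypothetical host population
slides = simulate_slides(worm_pairs, max_slides=4, rng=rng)
true_prevalence = float(np.mean(worm_pairs > 0))

for n in (1, 2, 4):
    print(f"KK{n}: measured prevalence = {measured_prevalence(slides, n):.3f}")
print(f"true prevalence = {true_prevalence:.3f}")
```

Because each host's slides are drawn once and the first n are reused, a host that tests positive with n slides also tests positive with more, so the measured prevalence can only rise towards the true value as n grows.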

I think having surveyType define the results is better than relying on nSamples, as it is clearer what is going on this way (at least to me). Clearly it is very confusing to have nSamples as an input if it does nothing. I think the process of getting the prevalences is horrendously complicated and confusing in general, so a rewrite of this would be good. Since this PR doesn't actually change the results at all, should we just park it for now?
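
If surveyType were to become the single source of truth, one possible shape for a rewrite is to derive the sample count from the survey type string instead of passing nSamples separately. The helper below is hypothetical and not part of the package; it assumes survey types always follow the "KK<number>" pattern.

```python
import re


def samples_from_survey_type(survey_type: str) -> int:
    """Return the number of Kato-Katz samples implied by e.g. "KK1" or "KK2".

    Hypothetical helper: assumes the survey type is "KK" followed by a
    positive integer, matched case-insensitively.
    """
    match = re.fullmatch(r"KK(\d+)", survey_type.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognised survey type: {survey_type!r}")
    n_samples = int(match.group(1))
    if n_samples < 1:
        raise ValueError(f"Survey type must request at least one sample: {survey_type!r}")
    return n_samples
```

With something like this, a caller would pass only the survey type, and a mismatch between the survey type and a separate nSamples argument could no longer occur.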