mattg3004 opened 5 days ago
I tried to write a test:
```python
import numpy as np
import numpy.testing as npt
import sch_simulation
from sch_simulation.helsim_FUNC_KK.helsim_structures import Parameters, ProcResult
import sch_simulation.helsim_FUNC_KK.results_processing as results_processing
import sch_simulation.helsim_RUN_KK


def test_get_burdens_with_non_default_sample_size():
    np.random.seed(10)
    hostData = [
        ProcResult(
            vaccState=np.full((1000, 4), 1),
            wormsOverTime=np.full((1000, 4), 20),
            ages=np.full((1000, 4), 10),
            timePoints=range(4),
            prevalence=np.zeros(1000),
            femaleWormsOverTime=np.full((1000, 4), 10),
        )
    ]
    parameter_file_name = "mansoni_params.txt"
    params = sch_simulation.helsim_RUN_KK.loadParameters(
        parameter_file_name, "UgandaRural"
    )
    # Low test sensitivity so that repeated samples increase prevalence
    params.testSensitivity = 0.001
    prevalence, _, _, _, _ = results_processing.getBurdens(
        hostData, params, 1, np.array([5, 95]), False, "KK1", nSamples=1
    )
    npt.assert_array_equal(prevalence, np.array([0.027, 0.03, 0.031, 0.034]))
```
But these prevalence values are not adjusted regardless of what `nSamples` I provide. Looking through the code, I found that `getSetOfEggCounts` does not use `nSamples`, though the result is divided by it; I don't know if that is important. I suspect the real issue is my setup of the parameters resulting in no eggs being produced. Perhaps you can see how to tweak the values to ensure that `nSamples` has an impact.
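To make the suspected bug concrete, here is a minimal toy sketch (hypothetical names and a Poisson egg-count model, not the actual `sch_simulation` code) of the pattern described above: if the sampler accepts `nSamples` but only ever makes one draw, the parameter has no effect, whereas repeated sampling should raise detected prevalence when sensitivity is low.

```python
import numpy as np

rng = np.random.default_rng(10)


def egg_counts_single_draw(worms, sensitivity, rng):
    # Hypothetical stand-in for getSetOfEggCounts: one Poisson draw of
    # observed eggs per host, scaled by a (low) test sensitivity.
    return rng.poisson(worms * sensitivity)


def prevalence_ignoring_nsamples(worms, sensitivity, nSamples, rng):
    # Bug pattern from the issue: nSamples is accepted but only one
    # draw is made, so the argument silently has no effect.
    counts = egg_counts_single_draw(worms, sensitivity, rng)
    return np.mean(counts > 0)


def prevalence_using_nsamples(worms, sensitivity, nSamples, rng):
    # Intended behaviour: a host counts as positive if eggs are seen in
    # ANY of the repeated samples, so more samples -> higher prevalence
    # when per-sample sensitivity is low.
    positive = np.zeros(worms.shape, dtype=bool)
    for _ in range(nSamples):
        positive |= egg_counts_single_draw(worms, sensitivity, rng) > 0
    return np.mean(positive)


worms = np.full(100_000, 20)
p1 = prevalence_using_nsamples(worms, 0.001, 1, rng)
p4 = prevalence_using_nsamples(worms, 0.001, 4, rng)
pb = prevalence_ignoring_nsamples(worms, 0.001, 4, rng)
# p4 is noticeably larger than p1, while the buggy version pb stays
# close to the single-sample prevalence p1 despite nSamples=4.
```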
The values do change if switching from `"kk1"` to `"kk2"`, and if I understand correctly, the 1 and 2 refer to the sample size?
I think you're right that `nSamples` isn't actually impacting the results at all; it is just the choice of `KK1` or `KK2` which makes the difference. Yes, the 1 and 2 do specify the sample size, and KK stands for Kato-Katz, which is the method used to test stool for eggs. `KK2` takes 2 samples from the stool, which will be more accurate in determining prevalence and intensity of infection. Obviously `KK3`, `KK4`, ..., would be more accurate still, but would take too long to do in reality.
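The accuracy gain from extra Kato-Katz slides can be sketched with a simple independence model (an assumption for illustration; the simulation's actual likelihood may differ): if one slide detects an infected host with probability p, then n independent slides detect it with probability 1 - (1 - p)^n.

```python
def detection_prob(p_single, n_slides):
    # Probability that at least one of n_slides independent slides is
    # positive, given per-slide detection probability p_single.
    return 1 - (1 - p_single) ** n_slides


kk1 = detection_prob(0.5, 1)  # 0.5
kk2 = detection_prob(0.5, 2)  # 0.75
kk4 = detection_prob(0.5, 4)  # 0.9375
```

The gain per extra slide shrinks quickly, which matches the point above that KK3, KK4, ... buy diminishing accuracy for extra lab effort.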
I think the specification of `surveyType` defining the results is better than relying on `nSamples`, as it is clearer what is going on this way (at least to me). Clearly it is very confusing to have `nSamples` as an input if it does nothing. I think the process of getting the prevalences is horrendously complicated and confusing in general, so a rewrite of this would be good. Since this PR doesn't actually change the results at all, should we just park it for now?
The `nSamples` used within `getBurdens` was hard-coded as 1, rather than taking the entry from the inputs. This is edited to take the value from the inputs. This function will be used to generate the IHME and prevalence outputs, so we need to have this included in the code to output these data. The fits which have been done didn't rely on this function, so there wasn't an issue with the outputs from the model.