Support multiple sampling sites

seabbs commented 5 days ago

Do you have a clear view of how hard it would be to support multiple sites of data at once here?

Looking at the model code, it seems that something fairly simple, where you assume the difference between sites is just some kind of observation error (potentially AR-structured or similar), would be relatively easy?
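To make the idea concrete, here is a minimal Python sketch (not EpiSewer code, which is R/Stan) of a shared log-concentration trajectory with a site-specific AR(1) deviation on the log scale. The function name, the parameters `phi` and `sigma`, and the example values are all assumptions chosen for illustration.

```python
import math
import random


def simulate_site_concentrations(shared_log_conc, n_sites, phi=0.7, sigma=0.2, seed=1):
    """Simulate per-site concentrations around one shared trajectory.

    Assumption: each site deviates from the shared log concentration by a
    stationary AR(1) process with autocorrelation `phi` and innovation
    standard deviation `sigma` (hypothetical parameters).
    """
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between -1 and 1")
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        # Draw the initial deviation from the stationary AR(1) distribution.
        dev = rng.gauss(0.0, sigma / math.sqrt(1.0 - phi**2))
        series = []
        for log_c in shared_log_conc:
            series.append(math.exp(log_c + dev))
            # Propagate the site-specific deviation to the next time step.
            dev = phi * dev + rng.gauss(0.0, sigma)
        sites.append(series)
    return sites


# Hypothetical shared trajectory (log concentrations), three sites.
shared = [math.log(c) for c in (100.0, 150.0, 220.0, 180.0, 120.0)]
simulated = simulate_site_concentrations(shared, n_sites=3)
```

Setting `sigma=0` recovers the shared trajectory exactly at every site, which is a quick way to check the sketch.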

adrian-lison commented 4 days ago

Especially when assuming a shared Rt and infection trajectory, this would be mostly straightforward.

A small challenge when not being tightly integrated with case count data is finding the right weights for the different sites. One can try to choose them such that the load per case in the total population (all catchments summed up) for each site is proportional to the relative population size of the catchment. This, however, assumes that the recovery of genetic material is identical across sites. If catchment-specific case count data is provided, one could calibrate to that.
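A minimal Python sketch of the population-proportional weighting described above. The site names and population figures are made up, and the split assumes identical recovery of genetic material across sites, as noted.

```python
# Hypothetical catchment populations (illustrative numbers, not real data).
catchment_population = {"site_a": 120_000, "site_b": 60_000, "site_c": 20_000}


def population_weights(populations):
    """Return each site's share of the summed catchment population."""
    total = sum(populations.values())
    if total <= 0:
        raise ValueError("total catchment population must be positive")
    return {site: pop / total for site, pop in populations.items()}


def expected_site_load(total_load, weights):
    """Split a shared total load across sites in proportion to the weights.

    Assumption: recovery of genetic material is identical across sites.
    """
    return {site: w * total_load for site, w in weights.items()}


weights = population_weights(catchment_population)
site_loads = expected_site_load(1_000.0, weights)
```

With these numbers, `site_a` covers 60% of the summed catchment population and therefore receives 60% of the total load.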

Another slight complication is that when using the dPCR-specific observation model, one either has to assume identical lab parameters (i.e. if all sites are processed by the same lab) or also stratify across these.

Now that the single-site model is rather mature, I am motivated to support this not too far in the future.

seabbs commented 2 days ago

> Especially when assuming a shared Rt and infection trajectory, this would be mostly straightforward.

This is my preferred approach to thinking about this in the first instance. @kaitlynjohnson and co had a variant of their model that looked like this, but it was dropped due to lack of resources.

> If catchment-specific case count data is provided one could calibrate to that.

Agree that this would be useful, but the linkage this implies is beyond the simple feature I was suggesting here.

> Another slight complication is that when using the dPCR-specific observation model, one either has to assume identical lab parameters (i.e. if all sites are processed by the same lab) or also stratify across these.

Either of these would make sense as an option, right? Would both approaches be possible in the current framework without new features?

> Now that the single-site model is rather mature, I am motivated to support this not too far in the future.

Nice. My personal preference is to fit the weights as an intercept (all the sites are smaller than the total pop) plus a random effect (sites have varying catchments), and I would strongly suggest that these are given informative priors. Then, as you say, you need support for different labs, and additionally need to allow sites to vary randomly due to unmodelled factors.
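As a rough illustration of the intercept plus random effect structure, here is a Python sketch that draws site weights from such a prior on the log scale. The log link, the function name, and all prior values are assumptions for illustration only, not a proposal for the actual parameterisation.

```python
import math
import random


def draw_site_weights(n_sites, intercept_mean, intercept_sd, site_sd, seed=1):
    """Draw one set of site weights from an intercept + random-effect prior.

    Assumption: log(weight_i) = intercept + site_effect_i, with
    intercept ~ Normal(intercept_mean, intercept_sd) and
    site_effect_i ~ Normal(0, site_sd). All values are hypothetical.
    """
    rng = random.Random(seed)
    # Shared intercept: the typical fraction of the total population per site.
    intercept = rng.gauss(intercept_mean, intercept_sd)
    # Site-level random effects capture varying catchment sizes.
    return [math.exp(intercept + rng.gauss(0.0, site_sd)) for _ in range(n_sites)]


# Informative prior centred on each site covering about 25% of the total population.
prior_draw = draw_site_weights(
    n_sites=4, intercept_mean=math.log(0.25), intercept_sd=0.1, site_sd=0.3
)
```

A negative intercept mean on the log scale encodes that every site is smaller than the total population, while `site_sd` controls how much catchments are allowed to differ.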

kaitejohnson commented 1 day ago

> A small challenge when not being tightly integrated with case count data is to find the right weights for the different sites. One can try to choose them such that the load per case in the total population (all catchments summed up) for each site is proportional to the relative population size of the catchment. This however makes the assumption that the recovery of genetic material is identical across sites. If catchment-specific case count data is provided one could calibrate to that.

I think I might be missing something here, but regarding this point, what is the need for the relative weights of each catchment if the assumption is that all sites have the same Rt and infection trajectory? Couldn't you just estimate a site-specific recovery of genetic material which would account for differences in the magnitude of observed concentrations across the sites?

I really like the idea of stratifying across the lab parameters. We've thought about this, as we currently treat each lab-site pair as an independent group. It would be nice to learn lab-specific parameters for labs that process samples from different sites.

adrian-lison / EpiSewer

Support multiple sampling sites #22