crowdfightcovid19 / req-550-Syria

Repository to host the code of request 550 coming from the Pax Syriana Foundation
Other
3 stars 2 forks source link

Estimation of beta and average contacts from R0 #13

Closed apascualgarcia closed 4 years ago

apascualgarcia commented 4 years ago

This issue is relevant for: @Jennifer-Villers and @jordan-klein: we will need to know if a some parameters can be found in the literature. @JudithBouman2412: Involves the estimation of R0. @ecam85: Involves the estimation of the matrix of contacts.

Problem The original idea was to estimate the beta parameters (beta_P, beta_A, beta_I and the new beta_C) as follows:

However, we still have another unknown, which is the contact matrix C_{i,j}. In the simplest case in which there is a well-mixed scenario, it can be reduced to a single parameter C. R0 is an adimensional number that can be split in three terms (thanks Jordan for the literature):

R0 = (infection/contact)x(contact/time)x(time/infection). 
     = (transmissivity) x (average rate of contacts) x (duration of infection)

In the absence of an explicit contact matrix, I guess that what we are calling beta would be: beta=(transmissivity) x (average rate of contacts) , with units of [infection/time]. However, since we are modelling explicitly the average rate of contacts via the matrix C_{i,j}, beta could be: beta=(infection/contact) which I think will not depend strongly on the human setting since it rather looks like an intrinsic property of the disease.

Questions (Please answer labelling with 1, 2, ...)

  1. Is it possible to find transmissivity values in the literature?
  2. If these values exist, for given transmissivity and R0 we could solve in the mean field scenario for C. Then, departing from the mean field scenario can be achieved writing the contacts matrix as C{i,j}=C*m{i,j} where C is known and m_{i,j} would be a "management" matrix representing the fraction of the average rate of contacts between classes achieved with any management strategy in the camp.
  3. Alternatively, we could try to estimate C_{i,j} with other arguments or with an individual based model, which will take time.
  4. Another complication is that the analytic expression of R0 depends on the management scenario (the size of the next generation matrix changes, so we would need a general expression which I don't think is possible for all management scenarios)
jordan-klein commented 4 years ago

@apascualgarcia

  1. I have not seen estimates for "transmissivity" it the literature that are not derived based on the values of R0/beta the authors use in a model.

For 2-4: Why don't we solve this problem by using a unitless "mixing matrix" in lieu of a "contact matrix"? (I thought this was what you have been referring to when you discuss the contact matrix). This way (transmissivity) x (average rate of contacts) are both encapsulated by the beta parameter and we do not need to derive separate parameters for transmission per contact and contact rate. The construction of a unitless mixing matrix and its application in the derivation of R0 is discussed in the literature I provided.

apascualgarcia commented 4 years ago

You are right, in the current version of my notes C is adimensional, as in that paper. The problem comes from the fact that if we use a fixed R0 and solve for beta, beta will be different for each scenario in which we manipulate C because the next generation matrix will change (and we cannot calculate R0 diagonalizing the matrix because there is no beta). This makes to me challenging to parametrize C. For instance, consider that beta is the (transmissivity) x (average rate of contacts) and that in the well-mixed scenario I solve it in the equation with R0 and I obtain beta=beta1. Then let's say we want to reduce in 50% de rate of contacts between two classes, which should be the value of C? If I choose C=0.5 to reduce these contacts, beta=beta1 will no longer fulfil the equation with R0. We may want to solve again for beta in the new model and say we get beta2, but this means that changing one parameter between two compartments also changes parameters for the other compartments, which we don't want because the original rate was beta1, and we want to model 100 and 50% of that value! I think there is some circularity in the method here so this is why I was looking for a workaround because I feel we need a precise meaning of C to manage the scenarios that will be proposed.

Anyway, let's see the Judith formulas, with the maths should be easier to see it. Admittedly, it may just be that I have to sleep.

apascualgarcia commented 4 years ago

I see, the idea is to take the original R0 and see how we can reduce it. So computing a first value of beta = beta1, then changing C and finding a new R0. So there is no issue here and indeed I need to sleep

apascualgarcia commented 4 years ago

@ecam85 @Jennifer-Villers @jordan-klein

I would like to reopen this issue because some of the questions appear again in Eduard's notes related to the estimation of the contact matrix.

ecam85 commented 4 years ago
  • I think the average number of contacts for adults is underestimated. If each tent has 7-9 people, every individual already has 6-8 daily contacts, so an adult will probably have more than 10.

I see your point, we should confirm this with Chamsy (10 was his estimate in Slack). Maybe 15 or so?

  • An important question is the interpretation of beta and C_ij.

I agree it is a matter of fixing the notation. I am happy with any choice. If need be we could keep C_ij for the probabilistic version as per Jordan's notation, and use \kappa_ij for the average contacts/person/time.

  • Regarding how to proceed, I do not agree that we should solve for log(1-p) factorizing it out and substituting values of C_ij from Table 4 in Eduard's document, but rather substituting values of C_ij from the (still missing) null model (well mixed scenario).

Yes, this is a much better approach.

apascualgarcia commented 4 years ago

Would you be able to address the first point?

The first thing is that it is needed to estimate the average number of contacts between classes in the null model (well mixed scenario). Under this scenario and starting from the average number of contacts reported by Chamsy, the average number of contacts between classes depends on the proportion of individuals within each class. Therefore, the first expression should reflect these proportions.

To me it is the most urgent one.

ecam85 commented 4 years ago

I just uploaded a new version of the contacts document with the table corresponding to the null model, using the same assumptions about the overall number of contacts. If we finally increase the contacts of the adult group, we will need to update the table accordingly.

apascualgarcia commented 4 years ago

Thanks Eduard,

Please note that @jordan-klein processed Syrian data and there are precise estimates of the proportions for each population class, please use those numbers.

I think that what it is needed is a formal description of the methodology to do these estimations that is as independent as possible of the assumptions (i.e. empty-of-content equations and a code) in a way in which if we have to change any of the assumptions we can rapidly update the data. As it is presented now, the results depend on closed assumptions such as "half of the carers in the green zone have daily contact with 3 orange adults" which is not flexible. We aim for an algorithm such as "X% of the population A in the zone U have daily contact with N individuals of class B, the algorithm provides a contact matrix value C_AB=number"

Cheers

jordan-klein commented 4 years ago

@ecam85 @apascualgarcia The proportion in each population class is here.

jordan-klein commented 4 years ago

@apascualgarcia @ecam85 I do not agree with the current interpretation of C_ij; I thought we had previously agreed that it would be expressed as a set of dimensionless values [0,1] that express the proportion of each population class' contacts that are with members of each class? This is how I have seen contact matrices (or "mixing matrices") used in the literature (page 7) (page 9). Interpreting C_ij as such would also make it easier to generate a description of the methodology that is independent of the assumptions.

Caveat that this interpretation of C_ij means that we assume each population class has the same beta, which is not necessarily true in practice if we want to reflect children having a higher rate of transmission than adults due to the greater number of contacts they make in the model. Having a young population structure living in densely populated conditions is captured in the high R0 we use, but if having children have a higher transmission rate is instrumental to the model, my preferred approach would be to express that each population class' contribution to R0 is proportional to their number of contacts out of the total number of contacts made by all population classes.

Let me know if this is not clear or it would be best to speak about this over zoom.

jordan-klein commented 4 years ago

To simplify- I think it would be most straightforward to:

A) Express C_ij as a set of dimensionless probabilities which add up to 1 for each population class.

B) Use 3 betas with a known relationship, so using @ecam85's estimates, beta_children = 2beta_adults = 2beta_elderly.

Let me know what you think @apascualgarcia

apascualgarcia commented 4 years ago

Hi @jordan-klein,

Please note that Eduard did not participate in our previous discussions so he independently came out with an interpretation similar to the one I had. As I said above I don't think there is a problem in using one interpretation or another as soon as we are clear with the dimensions and the assumptions, it is always easy to add one more adimensional factor.

Personally, I think we should favour the interpretation that will allow us to provide a more transparent parametrization of the model. This will also help us to reparametrize the model more easily for any new intervention proposed in the camps.

In this respect, I think we should solve how the contacts are estimated, which it is still unclear to me in Eduard's document. It would be helpful if you give your view on how you would address that problem, I will work on that as soon as I have a more advanced version of the script computing R0.

jordan-klein commented 4 years ago

@apascualgarcia @ecam85 I worked on this today. Please see my updated derivation of R0 that explains how the contact matrix fits into this derivation.

ecam85 commented 4 years ago

I uploaded a spreadsheet that computes the contact matrix. It also includes the normalised version with rows normalised to sum 1. You can find it here

I added a brief section about how to use it, but essentially, the first three tabs contain parameters/estimates (age estructure, overall contacts per day, usage of neutral zone).

The rest of the tabs are computed automatically. I updated it to include Jordan's age structure data as well as the new estimates on contacts per day provided by Chamsey.

How the contact matrix is computed (under what asuumptions) is described in contacts.pdf, but it is simply assuming that contacts are distributed across the population, so for instance if young people in orange zone have 20 contacts per day, and 1/4 of the orange population is young, 3/4 is adult, young orange people will have 201/4 contact with young-orange, 203/4 contacts with adult-orange, per day. (not real numbers, just an example!)

For the neutral zone, I assume we will get an estimate for what fraction of each population uses it. We can reparameterise this if we cannot get this information in this way.

Regarding what version to use (normalised or non-normalised), I think they are equivalent in terms of the model (because we take as given the overall number of contacts of each age class), and since Jordan prepared everything with the normalised version, I guess it is better to stick to that one. In any case, the non-normalised version is easier to interpret (at least for me!) and we can go from one to the other.

jordan-klein commented 4 years ago

Thanks @ecam85. I'm not sure I understand your "well-mixed" population distribution or normalised null contact matrix. Where are you getting the numbers for the well-mixed population distribution? Furthermore, under the null model with a well-mixed population, each population class should have the same contact probability, and the contact matrix should be independent of the population distribution. Please also note that we have 5 population classes in the model. As far as different total numbers of daily contacts for each population class go, it may be more straightforward to represent this through different values for beta for each population class. For example, based on your new numbers of daily contacts, we can have beta_children = 2.5beta_elderly, and beta_adults = 1.5beta_elderly. Please see how I describe the contact matrix and the beta parameters in the context of the R0 derivation for what my thoughts are on how to construct these to be able to do the computations we need for the model.

I know we've worked on different things and there are a few places where we aren't on the same page, and I'd be happy to meet to discuss if @apascualgarcia needs further clarification to be able to finish building the models.

apascualgarcia commented 4 years ago

@jordan-klein and @ecam85 thanks for the efforts!

Jordan, some comments/questions about your proposal:

I think that the line of thought of Eduard is more easy to follow, so I propose an alternative parametrization and subsequent derivation here. I explain how the contact matrix is parametrized and how the different types of interventions affect the contact matrix. Also, how the transmissivity \tau (which is the only unknown) can be parametrized. This is exactly the same strategy presented in this paper. I should still review that the computations are ok (today is very late and I'm tired so there may be errors) and properly write the age structured coefficients that are not in matrix notation yet.

Cheers

jordan-klein commented 4 years ago

@apascualgarcia I like your approach, it's a good synthesis. Your derivation of Theta = CN^(-1) should yield the same result that my Theta = NCN^(-1) does since your contact matrix is a function of N to begin with.

Instead of Theta = Cdiag(N)^(-1) I would do Theta = CN^(-1), where N is a diagonal matrix whose nonzero values are the sizes of each population class so that Theta is a square matrix. Also, I would spot check the Theta matrix that is computed to make sure it is reasonable.

ecam85 commented 4 years ago

Thanks @ecam85. I'm not sure I understand your "well-mixed" population distribution or normalised null contact matrix

The well-mixed population was not updated, it still had the values from Chamsy's slide. I fixed it now.

jordan-klein commented 4 years ago

@apascualgarcia would it make sense to have \tau in your derivation of R0 to be class-dependent? New meta analysis showing odds of infection per contact are lower for children relative to adults. A way to implement this could be \tau_adults=\tau_elderly, and \tau_children=k*\tau_adults, where k = .44, or more precisely ln(k) ~ normal(ln(.44), .23).

apascualgarcia commented 4 years ago

@apascualgarcia I like your approach, it's a good synthesis.

Cool that we agree!

Instead of Theta = Cdiag(N)^(-1) I would do Theta = CN^(-1), where N is a diagonal matrix whose nonzero values are the sizes of each population class so that Theta is a square matrix.

The meaning of diag(N)^(-1) is indeed a diagonal matrix whose nonzero values are the sizes of each population class

would it make sense to have \tau in your derivation of R0 to be class-dependent? New meta analysis showing odds of infection per contact are lower for children relative to adults

In principle, I am resistant to add more parameters that may lead to more optimistic scenarios unless we are really convinced, also because the model is already quite complex. Did you check if the living conditions are comparable? since living conditions in the camp may make kids more vulnerable than those analysed in the paper, this parameter may lead to more optimistic scenarios, I think we should try to be conservative.

jordan-klein commented 4 years ago

@apascualgarcia the meta analysis looks at Europe, East Asia, & the Us, so the living situations are not comparable. I think it makes sense not to complicate the model further.

apascualgarcia commented 4 years ago

@jordan-klein, @ecam85

Could you please prepare the files for the null model in the format needed for the script while I finish the other script for the tau estimation? It would be great if you start creating the input directories for the script for all models, see the Trello card I created announced in slack.

Thanks

jordan-klein commented 4 years ago

@apascualgarcia Done!

apascualgarcia commented 4 years ago

@jordan-klein, @JudithBouman2412 , @ecam85

Guys, the estimation of tau is wrong, it is not true that tauA=taulambda with lambda an eigenvalue of A, the eigenvalues don't change when multiplying a scalar. I cannot see immediately how to estimate tau

jordan-klein commented 4 years ago

@apascualgarcia, @JudithBouman2412, @ecam85

This is correct, an oversight on our part. It looks like we will need to find the spectral radius of Ks, described here algebraically, set this equal to R0, and then solve for tau. We would need to use a computer algebra system to do this. I haven't used them much except for a bit of the R packages that interface with yacas and matlab, so I would really appreciate it if someone else could pitch in to help on out this one!

apascualgarcia commented 4 years ago

@jordan-klein, @JudithBouman2412, @ecam85

Sorry for the mess, it was my bad, I think the computation is correct. The error came from taking the absolute value of the eigenvalues before taking the real part, now I get the correct values.