DrJCarson / TransPhyloMulti

GNU General Public License v3.0

Changing generation and sampling time distributions over time #25

Open emmapujolhodge opened 3 months ago

emmapujolhodge commented 3 months ago

Hi there Jake!

Just a quick question - I saw in the TransPhyloMulti paper that it should be possible to change the sampling proportion over time, but I just wanted to check whether the same thing could be done for the generation and sampling time scale and shape parameters?

Thanks so much in advance, looking forward to hearing if it's possible :)

Emma

DrJCarson commented 3 months ago

Hi Emma,

In principle I think it is possible. From the paper: the unlabeled equation for xi(t) would use a time-dependent sigma_t(tau). That is, use the appropriate shape and scale parameters for the sampling time distribution given the time t. Likewise, use gamma_t(tau) in equations 5 to 13. In equations 21 and 22, where sigma(t) and gamma(t) appear directly in the likelihood, again use the time-dependent versions, with t being the host's infection time.

Practically, the package constructs a discretised version of xi(t) and gamma(t) spanning from the root of the tree to the observation end date. We then calculate the exclusion probabilities by iterating backwards in time using these functions. With varying parameters, whenever the parameters change as you iterate backwards, you would need discretised versions of the new distributions.
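A minimal sketch of what "a discretised version per set of parameters" could look like, written in Python for illustration and not taken from the package's R code. It assumes gamma-distributed times with shape and scale parameters, evaluates the density on a regular grid of step `delta`, and builds one table per parameter regime. The names `gamma_pdf`, `discretise_gamma`, and `regimes`, along with the numeric values, are hypothetical, and the package may discretise differently.

```python
import math

def gamma_pdf(x, shape, scale):
    """Density of a Gamma(shape, scale) distribution at x (0 for x <= 0)."""
    if x <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(x) - x / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

def discretise_gamma(shape, scale, delta, n_steps):
    """Evaluate the density at delta, 2*delta, ..., n_steps*delta."""
    return [gamma_pdf(k * delta, shape, scale) for k in range(1, n_steps + 1)]

# Hypothetical regimes: (shape, scale) before and after a single step change.
regimes = [(2.0, 1.0), (3.0, 0.5)]

# One discretised table per regime, built once and reused while that regime is active.
tables = [discretise_gamma(sh, sc, delta=0.01, n_steps=3000) for sh, sc in regimes]
```

Building the tables up front means the backward iteration only has to switch which table it reads from when the parameters change.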

Hope that helps!

Best, Jake

emmapujolhodge commented 3 months ago

Amazing, thanks for the quick reply! I'll give it a go and see how I get on.

Thanks again!

Emma

emmapujolhodge commented 3 months ago

Hi Jake,

I have a couple of follow-up questions. First, do you think it would be possible to do the same thing in TransPhylo? Second, how exactly would I go about setting the new distributions at specified timepoints? I'm not sure how I would supply a stepwise function for the parameters to inferTTreeM, or would it involve modifying the inferTTreeM code?

DrJCarson commented 3 months ago

Hi Emma,

I'm afraid the code would need modifying in both TransPhyloMulti and TransPhylo. In inferTTreeM a new argument would be needed to indicate when the scale and shape parameters change. This would also need to be added to num_approx_disc (in utils.R) and log_lik_ttree (in likelihoods.R). The values themselves can probably be given as a vector in w.shape etc.

The function num_approx_disc corresponds to equations 5 to 13, calculating the exclusion probabilities. However, it uses rectangular integration instead of the trapezium rule. The variables that require updating are 'pit' (discretising xi(t) in the paper), 'gamma_prob' (discretising gamma(t) in the paper), and 'ft' (F(t) in the paper). To make your changes, calculate them using the most recent values of w.shape etc. In the following for loop, whenever you hit a step change, recalculate pit, gamma_prob, and ft using the new values. Note that in the loop, the current time is grid[g]. Don't worry about the output including gamma_prob, as that is used for simulation only.
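The control flow described above might be sketched as follows. This is an illustrative Python outline, not the body of num_approx_disc: the actual exclusion-probability update is left out, and `backward_pass`, `change_times`, and `build_tables` are hypothetical names. It assumes the regime active at time t is the number of change times less than or equal to t, and that `build_tables` stands in for recalculating pit, gamma_prob, and ft.

```python
from bisect import bisect_right

def backward_pass(grid, change_times, build_tables):
    """Walk the grid from the latest time to the earliest, rebuilding the
    per-regime tables whenever a step change is crossed.

    grid:         increasing list of times
    change_times: increasing list of times at which the parameters switch
    build_tables: callable taking a regime index and returning the tables
                  for that regime (stand-in for pit, gamma_prob, ft)
    """
    current = None   # regime whose tables are currently loaded
    tables = None
    rebuilds = 0
    visited = []
    for g in range(len(grid) - 1, -1, -1):
        regime = bisect_right(change_times, grid[g])
        if regime != current:
            # Step change crossed: recalculate the discretised quantities.
            tables = build_tables(regime)
            current = regime
            rebuilds += 1
        # The update for time grid[g] would read from `tables` at this point.
        visited.append((grid[g], regime))
    return visited, rebuilds

grid = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
visited, rebuilds = backward_pass(
    grid, [2.5], build_tables=lambda regime: {"regime": regime}
)
```

With one change time the tables are built twice: once for the latest regime when the loop starts, and once when the loop steps back past the change.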

In log_lik_ttree, wherever w.scale etc. are used, it would be a matter of using the correct values based on the host's infection time 'ttree[i, 1]'.
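Selecting the values by infection time could look something like the sketch below, again in Python for illustration only. The function `params_at` and its arguments are hypothetical, not part of either package. It assumes the parameters are given as vectors with one more entry than there are change times, and that a change takes effect at the change time itself.

```python
from bisect import bisect_right

def params_at(t, change_times, shapes, scales):
    """Return the (shape, scale) pair in force at time t.

    change_times must be increasing; shapes and scales each need one more
    entry than change_times (one value per interval between changes).
    """
    if not (len(shapes) == len(scales) == len(change_times) + 1):
        raise ValueError("need one parameter value per interval")
    i = bisect_right(change_times, t)  # number of changes at or before t
    return shapes[i], scales[i]

# Hypothetical values: parameters switch at time 2.0.
before = params_at(1.0, [2.0], shapes=[2.0, 3.0], scales=[1.0, 0.5])
after = params_at(2.0, [2.0], shapes=[2.0, 3.0], scales=[1.0, 0.5])
```

In the likelihood, `t` would be the infection time of the host in question, so each host picks up the parameters that applied when it was infected.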

I appreciate that it's hard to work with other people's code, so if you have any further questions please do ask. :)

Best, Jake