PoonLab / twt

Generic framework for the coalescent simulation of pathogen trees within host trees in R
GNU Affero General Public License v3.0
3 stars 2 forks source link

Sampling a lineage multiple times #131

Open RouxCil opened 2 years ago

RouxCil commented 2 years ago

This issue relates to sampling a lineage multiple times during the time frame. The error message depends on the number of replicates in the 'I' compartment and the seed:

set.seed(1234) / set.seed(123) if running with 5 replicates

path <- '~/Documents/1_projects/bayroot/MultitimeLineageSampling.yaml'
model <- Model$new(yaml.load_file(path))

outer <- sim.outer.tree(model)
plot(outer, type='s')

inner <- sim.inner.tree(outer)
plot(inner)

With 1: outer$get.eventlog() shows no events to display With 2: @sim.outer.tree(model) Error in .assign.events(run, events) : Error in .assign.events(): No eligible compartments for transmission event 1.068426060386581hosthost595 With 5 and 123 seed: @sim.inner.tree(outer): Error in .rexp.coal(k, comp, current.time) : Error in rexp.coal: called on Compartment with <2 lineages.

InitialConditions:
  originTime: 10.0
  size:
    host: 100
  indexType: 'host'

CompartmentTypes:
  'host':
    branching.rates: (host=0.01)
    migration.rates: ()
    transition.rates: ()
    effective.size: 100.0
    generation.time: 1
    bottleneck.size: 1                 
    wait.time.distr:                   
      distr: 'exp'
      hyperparameters:
        rate: 20

Compartments:
  'I':
    type: host
    replicates: 1 or 2 or 5

Lineages:
  'virus1':
    sampling.time: (0, 2.0, 3.0)            
    location: I
    replicates: 3 
GopiGugan commented 2 years ago

With 1 replicate:

https://github.com/PoonLab/twt/blob/7a03ee236c3faa095b74d9ab3a186283906c3ad2/R/simOuterTree.R#L624-L629

In the first iteration, there is only 1 active compartment, so the loop exits and the eventlog isn't updated:

> length(active)
[1] 1
ArtPoon commented 2 years ago

Escalating this. I copied the YAML from this issue and tried to reproduce the problem:

require(twt)
set.seed(1234)
path <- "~/git/twt/working/issue131.yaml"
model <- Model$new(yaml.load_file(path))

and got the following output:

Error in 1:nIndiv : NA/NaN argument
In addition: Warning message:
In lapply(X = X, FUN = FUN, ...) : NAs introduced by coercion
RouxCil commented 2 years ago

@ArtPoon Choose 1, 2 or 5 in the Compartments section in the posted yaml:

From

Compartments:
  'I':
    type: host
    replicates: 1 or 2 or 5

To

Compartments:
  'I':
    type: host
    replicates: 5

If you are using 5 change the seed too otherwise you will get a different error

ArtPoon commented 2 years ago

I'm guessing the problem lives somewhere around here: https://github.com/PoonLab/twt/blob/7a03ee236c3faa095b74d9ab3a186283906c3ad2/R/Model.R#L343

ArtPoon commented 2 years ago
ArtPoon commented 2 years ago

Using the above YAML, this script:

require(twt)

settings <- yaml.load_file("~/git/twt/working/iss131.yaml")
model <- Model$new(settings)

set.seed(1)
run <- sim.outer.tree(model)
run$get.eventlog()

tree <- sim.inner.tree(run)

results in the following error:

Error in .rexp.coal(k, comp, current.time) : 
  Error in rexp.coal: called on Compartment with <2 lineages.
ArtPoon commented 2 years ago

This works:

InitialConditions:
  originTime: 10.0
  size:
    host: 100
  indexType: 'host'

CompartmentTypes:
  'host':
    branching.rates: (host=0.01)
    migration.rates: ()
    transition.rates: ()
    effective.size: 100.0
    generation.time: 1
    bottleneck.size: 1                 
    wait.time.distr:                   
      distr: 'exp'
      hyperparameters:
        rate: 20

Compartments:
  'I1':
    type: host
  'I2':
    type: host

Lineages:
  'virus1':
    sampling.time: 3
    location: I1
    replicates: 1
  'virus2':
    sampling.time: 0
    location: I2
    replicates: 1

but changing this:

Lineages:
  'virus1':
    sampling.time: 3
    location: I1
    replicates: 2

fails with the rexp.coal error.

ArtPoon commented 2 years ago

This doesn't work either:

Lineages:
  'virus11':
    sampling.time: 3
    location: I1
    replicates: 1
  'virus12':
    sampling.time: 3
    location: I1
    replicates: 1
ArtPoon commented 2 years ago

Setting sampling.time for both Lineages in Compartment I1 to 0 works, but having one at 3 does not.

ArtPoon commented 2 years ago

Our guess is that sim.inner.tree is attempting to resolve a coalescent event on a Compartment that is known to carry two lineages, except that one of those lineages is not yet extant (sampled further back in time).

ArtPoon commented 2 years ago

Stepping through the code:

>   eventlog <- run$get.eventlog()
>   events <- eventlog$get.all.events()
>   # retrieve all Compartments in outer events
>   comps <- c(run$get.compartments(), run$get.unsampled.hosts())
>   types <- run$get.types()
>   # revert Compartments to their derived ("recipient") Types
>   # Types determine coalescent rates during simulation
>   transitions <- events[events$event.type=='transition', ]
>   for (i in order(transitions$time, decreasing=TRUE)) {
+     e <- transitions[i,]
+     comp <- comps[[e$compartment1]]
+     dev.type <- types[[e$type1]]
+     comp$set.type(dev.type)
+   }
>   # initialize simulation at most recent sampling time
>   current.time <- 0.
>   #extant.lineages <- run$get.extant.lineages(current.time)
>   n.extant <- run$get.num.extant(current.time)
>   # iterate through events in reverse time (most recent first)
>   events <- events[order(events$time), ]
>   row <- 1
> nrow(events)
[1] 3
> events
    event.type     time lineage1 lineage2 compartment1 compartment2 type1 type2
1 transmission 6.417672       NA       NA         I2_1    US_host_1  host  host
2 transmission 7.784641       NA       NA         I1_1    US_host_2  host  host
3 transmission 7.933861       NA       NA    US_host_1    US_host_2  host  host
> n.extant
[1] 2
> current.time
[1] 0
>   # iterate through events in reverse time (most recent first)
>   events <- events[order(events$time), ]
>   row <- 1
>     e <- events[row, ]  # retrieve outer event
> e
    event.type     time lineage1 lineage2 compartment1 compartment2 type1 type2
1 transmission 6.417672       NA       NA         I2_1    US_host_1  host  host
>   # retrieve all compartments
>   comps <- c(run$get.compartments(), run$get.unsampled.hosts())
>   compnames <- names(comps)
>   # retrieves compartment names for extant lineages
>   ext.lineages.compnames <- run$get.locations()
> ext.lineages.compnames
$I1_1__virus11_1
[1] "I1_1"

$I1_1__virus12_1
[1] "I1_1"

$I2_1__virus2_1
[1] "I2_1"
>   # retrieves compartments with multiple extant lineages
>   counts <- table(unlist(ext.lineages.compnames))
>   ext.comps <- comps[names(counts)[counts >= 2]]
> counts

I1_1 I2_1 
   2    1 
> ext.comps
$I1_1
<Compartment>
...
> comp <- ext.comps[[1]]
>     k <- run$get.num.extant(current.time, comp$get.name())
> k
[1] 1

I think we need to fix this line:

  ext.lineages.compnames <- run$get.locations()

so that it is aware of when lineages get sampled, not just their locations.

GopiGugan commented 2 years ago

When there are multiple sampling.time for a lineage, the Compartment's sampling time is set to the max time:

https://github.com/PoonLab/twt/blob/7a03ee236c3faa095b74d9ab3a186283906c3ad2/R/classes.R#L260-L264

But this can cause the following to fail :

https://github.com/PoonLab/twt/blob/7a03ee236c3faa095b74d9ab3a186283906c3ad2/R/simOuterTree.R#L651

where samp.times <- sapply(active, function(comp) comp$get.sampling.time())

in 4c6624e, the Compartment's sampling.time is set to the minimum sampling time for a lineage.

With 2: @sim.outer.tree(model) Error in .assign.events(run, events) : Error in .assign.events(): No eligible compartments for transmission event 1.068426060386581hosthost595

This should be fixed, but still running into the other error:

Error in .rexp.coal(k, comp, current.time) : 
  Error in rexp.coal: called on Compartment with <2 lineages.
RouxCil commented 2 years ago

The error is gone but in particular runs the output is not as expected. For example the inner tree with the below code only has 4 sequences where I would expect 5. If I run the code a couple of times some of the output does have the 5 sequence I would expect.

Screenshot 2022-03-08 at 11 57 31

vs

Screenshot 2022-03-08 at 11 56 30
set.seed(1234)
path <- '~/1_projects/bayroot/test.yaml'
model <- Model$new(yaml.load_file(path))

# run an outer tree simulation
outer <- sim.outer.tree(model)
# display the population trajectories
plot(outer, type='s')

inner <- sim.inner.tree(outer)
# display the inner tree annotated with transmission events (points)
plot(inner)

InitialConditions:
  originTime: 10.0
  size:
    host: 100
  indexType: 'host'

CompartmentTypes:
  'host':
    branching.rates: (host=0.01)
    migration.rates: ()
    transition.rates: ()
    effective.size: 100.0
    generation.time: 1
    bottleneck.size: 1                 
    wait.time.distr:                   
      distr: 'exp'
      hyperparameters:
        rate: 20

Compartments:
  'A':
    type: host
    replicates: 1

  'B':
    type: host
    replicates: 1

  'C':
    type: host
    replicates: 1

Lineages:
  'virus1':
    sampling.time: (0, 2.0, 5.0)            
    location: A
    replicates: 3

  'virus2':
    sampling.time: 0            
    location: B
    replicates: 1

  'virus3':
    sampling.time: 0
    location: C
    replicates: 1
ekankaka commented 1 year ago

Similar error when running sim.inner.tree(outer)

"Error in rexp.coal: time 4.81206974543449 is further back than infection time of Compartment B"