bodkan / slendr

Population genetic simulations in R 🌍
https://bodkan.net/slendr
Other
54 stars 5 forks source link

There's an issue with saving and/or loading a simplified tree sequence (carrying slendr metadata) #136

Closed bodkan closed 1 year ago

bodkan commented 1 year ago

Let's encode the following trivial model:

model <-
  population("pop", time = 1, N = 100) %>%
  compile_model(generation_time = 1, simulation_length = 1000, direction = "forward")

Simulate a tree sequence from it, save that tree sequence to disk:

ts <- slim(model, sequence_length = 1000, recombination_rate = 0)
ts_save(ts, "/tmp/big.trees")

So far so good.

> ts_load("/tmp/big.trees", model)

╔════════════════════════╗
║TreeSequence            ║
╠═══════════════╤════════╣
║Trees          │       1║
╟───────────────┼────────╢
║Sequence Length│    1000║
╟───────────────┼────────╢
║Time Units     │   ticks║
╟───────────────┼────────╢
║Sample Nodes   │     200║
╟───────────────┼────────╢
║Total Size     │95.3 KiB║
╚═══════════════╧════════╝
╔═══════════╤════╤════════╤════════════╗
║Table      │Rows│Size    │Has Metadata║
╠═══════════╪════╪════════╪════════════╣
║Edges      │ 366│11.4 KiB│          No║
╟───────────┼────┼────────┼────────────╢
║Individuals│ 256│26.8 KiB│         Yes║
╟───────────┼────┼────────┼────────────╢
║Migrations │   0│ 8 Bytes│          No║
╟───────────┼────┼────────┼────────────╢
║Mutations  │   0│ 1.2 KiB│          No║
╟───────────┼────┼────────┼────────────╢
║Nodes      │ 367│14.3 KiB│         Yes║
╟───────────┼────┼────────┼────────────╢
║Populations│   1│ 2.3 KiB│         Yes║
╟───────────┼────┼────────┼────────────╢
║Provenances│   1│33.9 KiB│          No║
╟───────────┼────┼────────┼────────────╢
║Sites      │   0│16 Bytes│          No║
╚═══════════╧════╧════════╧════════════╝
> ts %>% ts_simplify() %>% ts_save("/tmp/small1.trees")
> ts_load("/tmp/small1.trees", model)
╔════════════════════════╗
║TreeSequence            ║
╠═══════════════╤════════╣
║Trees          │       1║
╟───────────────┼────────╢
║Sequence Length│    1000║
╟───────────────┼────────╢
║Time Units     │   ticks║
╟───────────────┼────────╢
║Sample Nodes   │     200║
╟───────────────┼────────╢
║Total Size     │95.6 KiB║
╚═══════════════╧════════╝
╔═══════════╤════╤════════╤════════════╗
║Table      │Rows│Size    │Has Metadata║
╠═══════════╪════╪════════╪════════════╣
║Edges      │ 365│11.4 KiB│          No║
╟───────────┼────┼────────┼────────────╢
║Individuals│ 255│26.7 KiB│         Yes║
╟───────────┼────┼────────┼────────────╢
║Migrations │   0│ 8 Bytes│          No║
╟───────────┼────┼────────┼────────────╢
║Mutations  │   0│ 1.2 KiB│          No║
╟───────────┼────┼────────┼────────────╢
║Nodes      │ 366│14.3 KiB│         Yes║
╟───────────┼────┼────────┼────────────╢
║Populations│   1│ 2.3 KiB│         Yes║
╟───────────┼────┼────────┼────────────╢
║Provenances│   2│34.5 KiB│          No║
╟───────────┼────┼────────┼────────────╢
║Sites      │   0│16 Bytes│          No║
╚═══════════╧════╧════════╧════════════╝
> ts %>% ts_simplify(simplify_to = c("pop_1", "pop_2", "pop_3")) %>% ts_save("/tmp/small2.trees")
> ts_load("/tmp/small2.trees", model)
Error in `dplyr::bind_cols()`:
! Can't recycle `..1` (size 3) to match `..2` (size 100).
Run `rlang::last_trace()` to see where the error occurred.

The error suggests an issue with some join-like operation on some tree sequence table, although I'd think the tree sequence itself should carry every piece of slendr-relevant metadata along side the standard tskit data? Well, clearly not.

> ts_load("/tmp/small2.trees")
╔════════════════════════╗
║TreeSequence            ║
╠═══════════════╤════════╣
║Trees          │       1║
╟───────────────┼────────╢
║Sequence Length│    1000║
╟───────────────┼────────╢
║Time Units     │   ticks║
╟───────────────┼────────╢
║Sample Nodes   │       6║
╟───────────────┼────────╢
║Total Size     │44.5 KiB║
╚═══════════════╧════════╝
╔═══════════╤════╤═════════╤════════════╗
║Table      │Rows│Size     │Has Metadata║
╠═══════════╪════╪═════════╪════════════╣
║Edges      │  10│328 Bytes│          No║
╟───────────┼────┼─────────┼────────────╢
║Individuals│   8│  2.6 KiB│         Yes║
╟───────────┼────┼─────────┼────────────╢
║Migrations │   0│  8 Bytes│          No║
╟───────────┼────┼─────────┼────────────╢
║Mutations  │   0│  1.2 KiB│          No║
╟───────────┼────┼─────────┼────────────╢
║Nodes      │  11│  1.1 KiB│         Yes║
╟───────────┼────┼─────────┼────────────╢
║Populations│   1│  2.3 KiB│         Yes║
╟───────────┼────┼─────────┼────────────╢
║Provenances│   2│ 34.5 KiB│          No║
╟───────────┼────┼─────────┼────────────╢
║Sites      │   0│ 16 Bytes│          No║
╚═══════════╧════╧═════════╧════════════╝
bodkan commented 1 year ago

Yeah, the population had 100 individuals to start with. Simple simplification does not change the number of "sampled individuals" (still 100). But the subsetting of individuals changes the number of sampled individuals (in the code above down to 3), which matches the error from the join operation:

! Can't recycle ..1 (size 3) to match ..2 (size 100).