bodkan / demografr

Fast and simple simulation-based population genetic inference in R
https://bodkan.net/demografr
Other
27 stars 1 forks source link

Add an "intelligent" model plotting function? #2

Open bodkan opened 1 year ago

bodkan commented 1 year ago

demografr uses slendr as a means to specify demographic models whose parameters are to be inferred. This takes a form of a simple slendr function, which simply compiles a model to a single slendr object. In its simplest form, it can look something like this:

model <- function(Ne_A, Ne_B, Ne_C, Ne_D, T_AB, T_BC, T_CD) {
  popA <- population("popA", time = 1,    N = Ne_A)
  popB <- population("popB", time = T_AB, N = Ne_B, parent = popA)
  popC <- population("popC", time = T_BC, N = Ne_C, parent = popB)
  popD <- population("popD", time = T_CD, N = Ne_D, parent = popC)

  model <- compile_model(
    populations = list(popA, popB, popC, popD),
    generation_time = 1, simulation_length = 10000
  )

  return (model)
}

The parameters of this function (Ne_A, Ne_B, ...) are then parameters which will be inferred by demografr automatically by fitting against observed data.

To avoid easy-to-miss bugs in misspecification of models, it would be nice to have a way to visualize demografr model functions without the need to enter dummy values (annoying).

However, one can't just plug in random values because (most?) parameter combinations would lead to an invalid, uncompilable model.

Thoughts/options:

The final dummy model would be plotted as a simple tree, without showing the dummy Ne and times in any scale at all (similarly, gene flow events would be plotted just with arrows).

I love messing with AST but I have no time for this. :( Maybe a project for a computer sciency student?

currocam commented 4 months ago

Hi!

I have given it a try :) I don't have any experience with slendr (beyond your talk at the SLiM workshop in CPH) and very limited experience with metaprogramming stuff, so don't expect much. I'm very interested in the demografr package, and that's how I ended up looking at the GitHub issues.

I have implemented the function and some very basic tests. If you think this is a good idea and you have the time & you think it's relevant, I could try to find some time to work on that if you provide me with test-cases (as I'm not familiar with slendr).

Best, Curro

bodkan commented 4 months ago

Hey there Curro!

First of all -- whoah I think this is the very first time someone implemented a real PR to any of my software!

Thanks so much for taking a look at this and apologies for the delay -- I've spent the past few weeks in the middle of nowhere, deliberately cut off from the digital world. :)

I have been slowly picking up pieces of my work where I left it of, but I hope I can take a look at this issue (and your implementation) as soon as I can. The work on demografr has been halted because of a large refactoring of slendr I've been working on for a few months to make some important demografr features doable. Before that's done, it didn't really make much sense to implement more demografr features, so that's why everything has been a bit quiet here lately. One of the most important features is the possibility to do non-slendr simulations with demografr, which is maybe something you might be interested in.

I will definitely ping you once I get to it.

Huge thanks again!

currocam commented 4 months ago

Hi Martin!

No worries about the delay! I imagined you were on vacation. Having non-slendr simulation sounds great, definitely ping me!

Btw, I just realized I did not post the final plot for the example case you gave. This is current state of my PR (with a mix of static analysis and iteratively trying values)

library(slendr)
model <- function(Ne_A, Ne_B, Ne_C, Ne_D, T_AB, T_BC, T_CD) {
  popA <- population("popA", 1, Ne_A)
  popB <- population("popB", T_AB, Ne_B, popA)
  popC <- population("popC", T_BC,Ne_C, popB)
  popD <- population("popD", T_CD, Ne_D, popC)
  model <- compile_model(
    populations = list(popA, popB, popC, popD),
    generation_time = 1, simulation_length = 10000
    )
    return (model)
  }
plot_specification(model, ntries = 5)

image

Best, Curro