Closed bodkan closed 1 year ago
The issue comes down to the ordering of populations along the x-axis. The current slendr plot_model
algorithm simply sorts populations according to the split time, and arranges the population polygons in sort of a cascading layout going from left to right (picture above).
My first idea was to switch this ordering algorithm to one which orders population not according to the split time, but according to the in-order traversal principle. Implementation is here.
When ran on the example model above, it seems to be working OK?
plot_model(model)
plot_model(model, sizes = FALSE)
A couple of more tests using models that @IsabelMarleen implemented for her thesis. All of those were completely unreadable with the old plot_model()
but look quite nice now. (Except for the gene flow arrows which... I'm not entirely sure how to fix for extremely complex migration models and probably won't be tackling here.)
Previous version of this was next to useless:
Hmm, another test -- what if there are multiple ancestral populations (i.e. populations without a parental population?).
o1 <- population("outgroup1", time = 1, N = 5000)
pop_ancA <- population("pop_ancA", time = 100, parent = o1, N = 3000, remove = 200)
popA1 <- population("popA1", time = 200, parent = pop_ancA, N = 400)
popA2 <- population("popA2", time = 200, parent = pop_ancA, N = 10000)
o2 <- population("outgroup2", time = 1, N = 7000, remove = 800)
popB1 <- population("popB1", time = 800, parent = o2, N = 200)
popB2 <- population("popB2", time = 800, parent = o2, N = 2000)
popX <- population("popX", time = 801, parent = popB1, N = 1500)
popY <- population("popY", time = 802, parent = popX, N = 8500)
pop_ancC <- population("pop_ancC", time = 700, parent = o1, N = 2000, remove = 1000)
popC1 <- population("popC1", time = 901, parent = pop_ancC, N = 550)
popC2 <- population("popC2", time = 901, parent = pop_ancC, N = 7500)
model <- compile_model(
list(o1, o2, pop_ancA, popA1, popA2, popB1, popB2, popX, popY, pop_ancC, popC1, popC2),
generation_time = 1, simulation_length = 1000
)
plot_model(model)
Old plot_model()
:
New plot_model()
from this PR:
Definitely more useful.
I am kind of bothered by the overlapping population labels which can also obscure the splitting lineages (observe the purple "popB1" overlapping with darkish blue "popX", and also obscuring where is the split happening from-to and when).
I wonder if labels of populations which are not themselves parents of something could be plotted on the bottom? The overlap will always be there, unfortunately, but at least the split of lineages will be visible?
It was also appropriate to fix #131 here.
I am kind of bothered by the overlapping population labels which can also obscure the splitting lineages (observe the purple "popB1" overlapping with darkish blue "popX", and also obscuring where is the split happening from-to and when).
I wonder if labels of populations which are not themselves parents of something could be plotted on the bottom? The overlap will always be there, unfortunately, but at least the split of lineages will be visible?
OK, I ended up placing labels of populations which:
along the bottom of the output.
I think this looks much nicer than the first draft version:
In particular in the model which has many split events happening quickly in sequence, the splits themselves are no longer obscured by the labels (image below). Not super clear but much clearer. Maybe distributing colors of lineages not in a gradient with colors closer the closer the populations are along the x-axis would help? Although significant jumps in color wouldn't look that nice. I think I'll keep things this way, for now.
I have a feeling something might be still off with plotting on a log10-scale y-axis. I will think about that tomorrow.
That said, I think we're there, @IsabelMarleen! I will ping you again once I merge the PR (tomorrow, I think). You'll then be able to update figures in your thesis.
Merging #135 (03444e5) into main (109732f) will increase coverage by
0.10%
. The diff coverage is98.18%
.
@@ Coverage Diff @@
## main #135 +/- ##
==========================================
+ Coverage 83.77% 83.88% +0.10%
==========================================
Files 6 6
Lines 3076 3103 +27
==========================================
+ Hits 2577 2603 +26
- Misses 499 500 +1
Impacted Files | Coverage Δ | |
---|---|---|
R/visualization.R | 64.12% <98.18%> (+2.37%) |
:arrow_up: |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
A few minor updates and fixes (in particular, y-axis log10 time scale turned out to be broken for forward-time models). I also changed the color of the "population split lines" to match the population that is diverging from its parent population, rather than the other way around.
I could imagine tweaking and aligning pixels here and there but... I think two days was more than enough time to spend on this, so I will merge the PR as soon as all CI tests pass.
I will likely decorate this with a v0.6.0 version too as there have been reasonably useful fixes and changes implemented since the last version that would be nice to have on CRAN for everyone out there.
As soon as more complex demographic models were implemented using slendr, a clear and very annoying issue popped up.
Consider the following random model:
When this is visualized, the following silliness pops up:
The issue is especially prominent if many split and gene flow events happen close in time.
In my defense, when
plot_model()
was written, I only used fairly compact models for testing slendr. With more complex models being programmed these days -- either imported via demes->slendr importer written by @IsabelMarleen, models developed and tested as part of the work on our new ABC package, or randomized models -- this is getting more and more annoying.This PR is an attempt to fix this using a relatively straightforward solution while keeping most of the
plot_model
code intact (that took a lot of ggplot2 black magic and I'd rather not touch that unless I absolutely have to).The main idea is to solve most of the most apparent issues which are hindering the role of
plot_model()
for debugging and inspection of models, not more (a graph can always be made prettier but that's not the goal here).