bodkan / slendr

Population genetic simulations in R 🌍
https://bodkan.net/slendr
Other
54 stars 5 forks source link

Fix saving / loading of tree sequences after simplification to a subset of sampled individuals #137

Closed bodkan closed 1 year ago

bodkan commented 1 year ago

An attempt at fixing #136.

codecov[bot] commented 1 year ago

Codecov Report

Merging #137 (6c3a7e8) into main (01af510) will increase coverage by 0.30%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #137      +/-   ##
==========================================
+ Coverage   84.04%   84.35%   +0.30%     
==========================================
  Files           7        7              
  Lines        3109     3144      +35     
==========================================
+ Hits         2613     2652      +39     
+ Misses        496      492       -4     
Impacted Files Coverage Δ
R/interface.R 80.36% <100.00%> (+0.06%) :arrow_up:
R/tree-sequences.R 88.39% <100.00%> (+0.61%) :arrow_up:
R/utils.R 88.37% <100.00%> (+0.28%) :arrow_up:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

bodkan commented 1 year ago

Merging now. From the changelog:

Fix for ts_load() failing to load slendr-produced tree sequences after they were simplified down to a smaller set of sampled individuals (reported here). The issue was caused by incompatible sizes of the sampling table (always in the same form as used during simulation) and the table of individuals stored in the tree sequence after simplification (potentially containing a smaller set of individuals than in the original sampling table). To fix this, slendr tree sequence objects now track information about which individuals are regarded as "samples" (i.e. those with symbolic names) which is maintained through simplification, serialization and loading, and used by slendr's internal machinery during join operations. (PR #137)

bodkan commented 1 year ago

I have a feeling there's a more elegant solution to this. It's more and more clear that a Great Internal Refactoring will have to happen, eventually. The fact that slendr started as a purely SLiM-focused thing, had tskit support bolted on after, and yet after that added support for msprime coalescent models is... starting to show.

In the meantime, there's an acute need for Real Research(tm) to be done, so an emergency fix works just fine.