bodkan / slendr

Population genetic simulations in R 🌍
https://bodkan.net/slendr
Other
54 stars 5 forks

New ts_ibd function #123

Closed: bodkan closed this 1 year ago

bodkan commented 1 year ago

This adds a new function, ts_ibd(), which is slendr's interface to tskit's TreeSequence.ibd_segments().

As explained in the ?ts_ibd manpage, this is not a true wrapper. R handles heavy iteration extremely poorly, so the iteration-based use cases shown in the tskit documentation wouldn't really work here, and certainly not for large tree sequences.

Instead, ts_ibd() collects all requested IBD data and returns it as a plain data frame: either every individual IBD segment (coordinates = TRUE) or, for each pair, the number of IBD segments and the total amount of IBD they share (coordinates = FALSE, the default). (EDIT: for spatial tree sequences, the returned IBD table is now fully spatially annotated and is an sf object.)
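To make the two output modes concrete, here is a minimal Python sketch of the tabulation step only, not slendr's actual implementation. It assumes segments arrive as hypothetical `(node1, node2, left, right)` tuples; the function name `summarize_ibd` and all column names are made up for illustration and are not ts_ibd()'s real output schema.

```python
from collections import defaultdict


def summarize_ibd(segments, coordinates=False):
    """Tabulate IBD segments given as (node1, node2, left, right) tuples.

    coordinates=True  -> one row per segment (analogous to coordinates = TRUE)
    coordinates=False -> one row per node pair with the segment count and
                         the summed IBD length (analogous to the default)
    """
    if coordinates:
        return [
            {"node1": a, "node2": b, "left": left, "right": right,
             "length": right - left}
            for a, b, left, right in segments
        ]
    counts = defaultdict(int)
    totals = defaultdict(float)
    for a, b, left, right in segments:
        pair = (min(a, b), max(a, b))  # treat (a, b) and (b, a) as one pair
        counts[pair] += 1
        totals[pair] += right - left
    return [
        {"node1": p[0], "node2": p[1], "count": counts[p], "total": totals[p]}
        for p in sorted(counts)
    ]


segments = [(0, 1, 0.0, 150.0), (1, 0, 400.0, 500.0), (2, 3, 10.0, 30.0)]
for row in summarize_ibd(segments):
    print(row)
```

The summary mode is what keeps the default output small: its size grows with the number of pairs sharing IBD rather than with the number of segments.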

To keep things manageable, the returned IBD segments can still be pruned, either by setting the minimum length a segment must have to be considered or by setting the maximum age of the ancestor of an IBD pair. In fact, given how easy it is to choke on too much IBD, ts_ibd() issues a warning if the user requests all possible IBD segments (which, during normal data analysis, is most likely an oversight).
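The pruning logic can be sketched in Python as follows, assuming each segment records the age of the ancestor the pair shares over it. The `Segment` type and the `minimum_length` / `maximum_time` parameter names are hypothetical stand-ins, not ts_ibd()'s documented arguments; the point is only to show the two cutoffs and the "you asked for everything" warning.

```python
import warnings
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    node1: int
    node2: int
    left: float
    right: float
    time: float  # hypothetical: age of the ancestor shared on this segment


def prune_ibd(segments: List[Segment],
              minimum_length: Optional[float] = None,
              maximum_time: Optional[float] = None) -> List[Segment]:
    """Keep only segments that pass the (hypothetical) length and age cutoffs."""
    if minimum_length is None and maximum_time is None:
        # Mirrors the idea of warning when all possible IBD is requested
        warnings.warn("no pruning criteria given: returning all IBD segments")
    kept = []
    for seg in segments:
        if minimum_length is not None and seg.right - seg.left < minimum_length:
            continue  # segment too short
        if maximum_time is not None and seg.time > maximum_time:
            continue  # shared ancestor too old
        kept.append(seg)
    return kept


segments = [
    Segment(0, 1, 0.0, 150.0, time=200.0),
    Segment(0, 1, 400.0, 420.0, time=50.0),
    Segment(2, 3, 10.0, 900.0, time=5000.0),
]
print(prune_ibd(segments, minimum_length=100.0, maximum_time=1000.0))
```

In a real implementation the filtering would happen before segments are materialized (tskit can apply such cutoffs itself), which is what actually saves memory; filtering afterwards, as here, only illustrates the criteria.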

Similarly, the within and between arguments are also supported. In line with the rest of slendr's ts_*() functions, they accept symbolic names of individuals, not just integer node IDs.
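A rough Python sketch of how symbolic names could be translated into node pairs before restricting IBD. It assumes a hypothetical name-to-nodes lookup (two sample nodes per diploid individual), that "within" means all pairs drawn from one set of individuals, and that "between" means pairs whose nodes come from different sets. All names here are illustrative, not slendr internals.

```python
from itertools import combinations, product

# Hypothetical lookup: each diploid individual owns two sample nodes.
INDIVIDUAL_NODES = {"popA_1": [0, 1], "popA_2": [2, 3], "popB_1": [4, 5]}


def nodes_of(names):
    """Translate symbolic individual names into integer node IDs."""
    return [node for name in names for node in INDIVIDUAL_NODES[name]]


def within_pairs(names):
    """All node pairs among the given individuals."""
    return set(combinations(sorted(nodes_of(names)), 2))


def between_pairs(groups):
    """Node pairs whose two nodes come from different groups of individuals."""
    node_groups = [nodes_of(group) for group in groups]
    pairs = set()
    for g1, g2 in combinations(node_groups, 2):
        for a, b in product(g1, g2):
            pairs.add((min(a, b), max(a, b)))
    return pairs


def restrict(segments, allowed_pairs):
    """Keep (node1, node2, left, right) segments whose node pair is allowed."""
    return [s for s in segments
            if (min(s[0], s[1]), max(s[0], s[1])) in allowed_pairs]


print(sorted(between_pairs([["popA_1"], ["popB_1"]])))
```

Resolving names to nodes up front means the rest of the pipeline can keep working with integer node IDs only.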

codecov-commenter commented 1 year ago

Codecov Report

Merging #123 (aa097e7) into main (e73fd18) will increase coverage by 0.31%. The diff coverage is 98.43%.

@@            Coverage Diff             @@
##             main     #123      +/-   ##
==========================================
+ Coverage   83.37%   83.69%   +0.31%     
==========================================
  Files           6        6              
  Lines        2996     3060      +64     
==========================================
+ Hits         2498     2561      +63     
- Misses        498      499       +1     

Impacted Files       | Coverage Δ
R/tree-sequences.R   | 87.83% <98.43%> (+0.64%) ↑

bodkan commented 1 year ago

IBD tracts collected from spatial tree sequences are now annotated with the spatial coordinates of their nodes and returned as sf objects by default.
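The annotation step amounts to a join between the IBD table and node locations. Below is a minimal Python sketch using plain `(x, y)` tuples in place of sf geometries. The `NODE_LOCATION` table and `annotate_spatially` function are hypothetical, and the pairwise distance column is included only as an example of what such annotation makes easy.

```python
import math

# Hypothetical (x, y) locations of sample nodes in a spatial tree sequence.
NODE_LOCATION = {0: (10.0, 20.0), 1: (13.0, 24.0), 4: (55.0, 3.0)}


def annotate_spatially(segments):
    """Attach both nodes' coordinates (and their distance) to each IBD segment."""
    rows = []
    for node1, node2, left, right in segments:
        loc1, loc2 = NODE_LOCATION[node1], NODE_LOCATION[node2]
        rows.append({
            "node1": node1, "node2": node2, "left": left, "right": right,
            "location1": loc1, "location2": loc2,
            "distance": math.dist(loc1, loc2),  # Euclidean, Python 3.8+
        })
    return rows


for row in annotate_spatially([(0, 1, 0.0, 150.0)]):
    print(row["distance"])  # → 5.0
```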


As an aside, note that although ts_ibd() returns IBD data in tabular format (as mentioned in the first post) and does not support iteration (and never will), users who need to iterate over massive amounts of IBD can always do so via reticulate in R, just as shown in the tskit documentation for Python. (Honestly though, at that point it's probably better to use Python.)