MesserLab / SLiM

SLiM is a genetically explicit forward simulation software package for population genetics and evolutionary biology. It is highly flexible, with a built-in scripting language, and has a cross-platform graphical modeling environment called SLiMgui.
https://messerlab.org/slim/
GNU General Public License v3.0
160 stars 30 forks source link

Added functions for extracting upper and lower triangles of matrices #410

Closed nobrien97 closed 9 months ago

nobrien97 commented 9 months ago

Hi @bhaller,

Recently I needed to extract the upper triangle from a symmetric matrix, but found that the Eidos code to do so was a bit clunky and slow with larger matrices. So I implemented some functions in C which are faster and more user-friendly. upperTri and lowerTri work in exactly the same way as in R: you specify an input matrix x, and an optional argument for whether you want to include diagonal elements in your triangle or not. The default is to exclude them. The output is a logical matrix with the same dimensionality as x. The true elements in the output are the upper/lower elements, depending on the function called. You can then subset your input matrix by the upper/lower triangle to extract those values:

// Eidos
// Generate 4x4 input matrix
input = matrix(1:16), 4);
input
/*
     [,0] [,1] [,2] [,3]
[0,]    1    5    9   13
[1,]    2    6   10   14
[2,]    3    7   11   15
[3,]    4    8   12   16
*/

// Get upper triangle without diagonal entries
ut = upperTri(input);

// Extract diagonals
input[c(ut)];    // c() needed because we can't subset by matrices yet
// Output: 5 9 10 13 14 15

vs the R functionality

## R
## Generate 4x4 input matrix
input <- matrix(1:16), 4)
input

#      [,1] [,2] [,3] [,4]
# [1,]    1    5    9   13
# [2,]    2    6   10   14
# [3,]    3    7   11   15
# [4,]    4    8   12   16

## Get upper triangle without diagonal entries
ut <- upper.tri(input)

## Extract diagonals
input[ut]    ## no c() necessary 
# Output: 5 9 10 13 14 15

Another feature I could add is the diag function from R: however it has different behaviour, extracting the diagonal values directly rather than providing a logical matrix. Would be keen to hear your thoughts on whether I should replicate the R behaviour or keep it consistent with upperTri and lowerTri.

Cheers, Nick

bhaller commented 9 months ago

Hi @nobrien97! Wow, this looks great, thanks. I haven't looked at the PR in detail yet (happy Thanksgiving! :->), but it looks like you've got things working. If you do want to add diag(), go for it; happy to have these additions. I plan to release SLiM 4.1 in about a week, so if you want these additions to be in SLiM 4.1 please finish the PR ASAP. I would say replicate the R behavior if possible, for portability; I strive for that unless their chosen behavior strikes me as really seriously broken (which is sometimes the case, but does not seem to be the case here). I'll be more available to look at and review this PR perhaps Sunday. Thanks!

nobrien97 commented 9 months ago

Hi again @bhaller, Hope you had a great Thanksgiving! No such luck for a holiday here in Australia 😢 I've finished the diag() function and added it to this PR. I also fixed my triangle functions using row-wise indexing instead of column-wise (oops!).

diag() behaves mostly the same as in R, with three optional arguments: x, nrow, and ncol, giving different output depending on which input is specified:

  1. When x is a matrix, diag() returns the diagonal elements of x (nrow and ncol cannot be specified in this mode)
  2. When x is missing and nrow and/or ncol is given, diag() returns an identity matrix with dimensions nrow * ncol, nrow * nrow, or 1 * ncol, depending on what arguments are given
  3. When x is a singleton and is the only input, diag() returns a square identity matrix with dimensions equal to the length of x
  4. When x is a numeric or logical vector, diag() returns a matrix with the diagonal values in x and either 0 or F in the off diagonals, depending on x's type

There are two exceptions where the Eidos behaviour doesn't completely match R, however.

I've attached a script that can be run in both R and Eidos interpreters that should give the same results (except for the above two cases).

Cheers, Nick

bhaller commented 9 months ago

Hi @nobrien97! Nice tests, I'll put them right into the Eidos unit tests. :->

This sounds great. Regarding the two exceptions, please do fix the first. As it is, Eidos is less stringent than R, so a user might take existing R code (that errors in that case), move it to Eidos, and it might behave incorrectly in that case (since it would no longer error). Since it's easy to fix, let's match R there. For the second exception, I think it's fine for Eidos to be more stringent than R, throwing an error in a questionable edge case that R accepts; so no action needed. I'll do a code review ASAP, perhaps later today. Anyhow, if I see any issues I'll probably just merge the PR and fix the issues on my side, no need for further back-and-forth. :-> That applies also to the first exception; I'll fix it on my end unless you get to it first. Thanks a bunch! It's very nice to receive a contribution like this.

bhaller commented 9 months ago

@nobrien97 I've got the diffs on my local machine now and am in the process of adding unit tests, doing final polish, etc., so please do not modify this PR further; it will get merged from my machine, not from this PR. Thanks again!

bhaller commented 9 months ago

Hey @nobrien97, a heads-up. I'm tightening the behavior of diag() a little compared to R; specifically, in the fourth usage case I am not allowing nrow/ncol to be specified at all. In R, it allows those parameters to be specified, and recycles/truncates x as needed to fill the matrix size given. In general, Eidos does not include recycling/truncation behavior that R allows, because I regard it as a bad design decision in R that is quite dangerous. Since the behavior of Eidos will be tighter than that of R, the user will get an error if they use the function in a way that works in R but not in Eidos, so it should be safe. Please let me know if this decision will cause you difficulties, since you're the person who actually plans to use this function! :->

nobrien97 commented 9 months ago

Hi @bhaller, no worries: it makes sense to remain consistent on recycling in Eidos!

bhaller commented 9 months ago

@nobrien97 The final commit is at https://github.com/MesserLab/SLiM/commit/20bb46ce02b2aefbc265201b31a1e54d79ba2c62, if you are interested in seeing how things evolved. It ended up being stricter than R in several respects, but I doubt that any of them are important to anyone. R sometimes allows really weird parameters and does really weird things with them, like diag(T) vs. diag(F) vs. diag(1.5). All of those are errors in Eidos. :-> Thanks again; a good addition.

bhaller commented 9 months ago

Uh, and https://github.com/MesserLab/SLiM/commit/21fcf5746e710968ad7155d4647d51f88241e129

nobrien97 commented 9 months ago

Thanks for the review @bhaller! When I was going through those edge cases in R I definitely thought the type coercing in case 3 was a strange choice.