friendly / matlib

Matrix Functions for Teaching and Learning Linear Algebra and Multivariate Statistics
http://friendly.github.io/matlib/

Bordering a matrix #63

Open friendly opened 2 months ago

friendly commented 2 months ago

In dev/bordermatrix.Rmd are some tests of the idea of bordering a matrix with the rownames() and colnames() assigned to it, so that we can render something like this:

image

This file defines a borderMatrix() function, but there's something wrong with my tests, because it relies on defining a LaTeX \bordermatrix{ {row} {col} {body} } macro. Can someone take a look?

john-d-fox commented 2 months ago

I looked at this and I'm afraid that I don't have anything to suggest, but if you do figure out how to produce the correct LaTeX code, I have a suggestion for how to implement it.

When I first wrote symbolicMatrix(), the precursor to latexMatrix(), I thought about including rownames and colnames arguments, but didn't because I didn't know how to show the row/column names in LaTeX. Then the idea got lost with all of the other details that followed.

Anyway, I think that the natural way to support row/column names is to add rownames and colnames slots to "latexMatrix" objects. These slots could either be filled by corresponding arguments in latexMatrix() or via a method for dimnames<-(). Similarly, there would be a dimnames() accessor method. Finally, if the rownames or colnames slots are non-NULL, the "latexMatrix" print() method would take care of producing the necessary LaTeX markup. That seems to me a clean, simple implementation that follows standard R conventions.
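A minimal sketch of that design, using a hypothetical stand-in class "labelledMatrix" rather than the real "latexMatrix" class. The constructor newLabelled() and its list layout are assumptions made for illustration only; the sketch just shows how a dimnames slot, an accessor, and a replacement method could fit together with S3 dispatch.

```r
# Hypothetical stand-in for a "latexMatrix"-like object (not matlib's code).
# The labels live in a 'dimnames' slot of the object, always a two-element list.
newLabelled <- function(body, rownames = NULL, colnames = NULL) {
  structure(list(body = body, dimnames = list(rownames, colnames)),
            class = "labelledMatrix")
}

# Accessor: dimnames() is an internal generic, so this S3 method is dispatched.
dimnames.labelledMatrix <- function(x) x$dimnames

# Replacement method: lets users write dimnames(Z) <- list(rows, cols).
`dimnames<-.labelledMatrix` <- function(x, value) {
  stopifnot(is.list(value), length(value) == 2L)
  x$dimnames <- value
  x
}

Z <- newLabelled(matrix(1:4, 2))
dimnames(Z) <- list(c("\\alpha_1", "\\alpha_2"), c("\\beta_1", "\\beta_2"))
dimnames(Z)
```

A print() method for such a class would then only need to check whether either element of the slot is non-NULL before emitting the extra markup.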

Of course, all this is predicated on figuring out how to produce the correct LaTeX.

john-d-fox commented 2 months ago

Taking another look at what you tried to do, what's wrong with producing LaTeX code like the following (from your example)?

$$
\begin{matrix}
  &  \begin{matrix} A & C & G & T    % column labels
     \end{matrix} \\
  \begin{matrix}                     % row labels
      \alpha \\ 
      \beta \\ 
      \gamma \\ 
      \epsilon
  \end{matrix}  &  
    \begin{pmatrix}                  % body
    1 & 2 & 3 & 4\\
    3 & 4 & 5 & 6\\
    3 & 4 & 5 & 6\\
    3 & 4 & 5 & 6
  \end{pmatrix}\\
\end{matrix}
$$

That is, why can't print.latexMatrix() just wrap \begin{pmatrix} ... \end{pmatrix} in \begin{matrix} ... \end{matrix}?
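As a rough illustration of the wrapping idea, here is a sketch in R that builds the nested markup as a string. The function borderedLatex() and its arguments rowLabels and colLabels are hypothetical names introduced here; this is not the code in print.latexMatrix(), and it makes no attempt to align the column labels with the columns of the body.

```r
# Hypothetical helper (an assumption for illustration, not part of matlib):
# wrap a pmatrix body in an outer matrix that carries the labels.
borderedLatex <- function(mat, rowLabels, colLabels) {
  # one "a & b & c" string per row of the body
  rows <- apply(mat, 1, paste, collapse = " & ")
  body <- paste0("\\begin{pmatrix}\n",
                 paste(rows, collapse = " \\\\\n"),
                 "\n\\end{pmatrix}")
  # column labels go across the top, row labels down the left side
  top  <- paste0("\\begin{matrix} ",
                 paste(colLabels, collapse = " & "),
                 " \\end{matrix}")
  side <- paste0("\\begin{matrix} ",
                 paste(rowLabels, collapse = " \\\\ "),
                 " \\end{matrix}")
  paste0("\\begin{matrix}\n & ", top, " \\\\\n",
         side, " & ", body, " \\\\\n\\end{matrix}")
}

out <- borderedLatex(matrix(1:4, 2, byrow = TRUE),
                     rowLabels = c("\\alpha", "\\beta"),
                     colLabels = c("A", "C"))
cat(out, "\n")
```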

friendly commented 2 months ago

The LaTeX code you used works perfectly.

Are you suggesting that we (a) add rownames and colnames slots to latexMatrix(), and (b) use these inside print.latexMatrix(), as you suggest above?

That would avoid the need for a special bordermatrix.

john-d-fox commented 2 months ago

Yes, that's exactly what I'm suggesting (and, in case it wasn't clear, the LaTeX code in my example is yours!).

One ambiguous point: I assume that I should accommodate "m" x "n" matrices (i.e., matrices with non-numeric dimensions), which slightly complicates the issue, though I think I see how to handle it.

If it's OK with you and Phil, I'll implement this when I have time to do it, maybe later today.

friendly commented 2 months ago

Yes, that sounds fine. For the m x n case, I don't see a clear need for bordering, but if you see a solution, go ahead.

My main use case came from the linear hypothesis example where one might want to have labels for the columns, giving variable names and labels for rows, naming contrasts.

john-d-fox commented 2 months ago

I added the file dev/row-col-names-tests.R with a modified implementation of latexMatrix() and print.latexMatrix(), and a new dimnames.latexMatrix() method. The print() method is imperfect, in that the alignment of column names breaks down if there are too many columns (see the examples in the file), and the effort the function makes to align the column names with the columns is awkward.

I think that this is the best I can do without getting bordermatrix to work. If it's satisfactory, I'll complete the implementation; otherwise, I'll abandon it for now.

friendly commented 2 months ago

I think that looks great, given that it is doing this w/o bordermatrix. Even in your example C with 8 columns the misalignment isn't too bad.

john-d-fox commented 2 months ago

I have some ideas for making the alignment of the labels more robust. If those work out, I'll update the code and move it to the package. It's too bad that we can't use bordermatrix.

friendly commented 2 months ago

Now with this available for latexMatrix, what about Eqn()?

Z <- latexMatrix(prefix="\\sqrt{", suffix="}",
                 rownames=c("\\alpha_1", "\\alpha_2", "\\alpha_m"),
                 colnames=c("\\beta_1", "\\beta_2", "\\beta_n"))
Eqn(Z)

renders this w/o row/col labels

Eqn() calls latexMatrix() as follows (line 164):

  for(i in 1L:length(dots)){
    if(is.matrix(dots[[i]])){
        print( do.call(latexMatrix, c(list(symbol=dots[[i]]), mat_args)) )

john-d-fox commented 2 months ago

Is it relevant that the code for supporting row and column names isn't yet in the matlib package code?

I've committed some improvements to dev/row-col-names-tests.R, along with support for modifying dimnames. I can move this to R/latexMatrix.R when I have time, which may not be today.

friendly commented 2 months ago

I'm testing the version in dev/ via source(), so it doesn't matter right now. Might as well wait til the dust settles.

friendly commented 2 months ago

Here's a new test case. I won't modify row-col-names-tests.R, because you may be working on it. Longer column names mess things up, but wider cell entries can compensate.

mat <- matrix(sample(25), nrow = 5,
              dimnames = list(who = c("Abe", "Bart", "Cat", "Doug", "Eve"),
                              what = c("Geog", "Hist", "Math", "Read", "Spell")))

# should latexMatrix recognize that it has row/col names?
latexMatrix(mat)

# alignment of col labels quite off
latexMatrix(mat, rownames = rownames(mat), colnames = colnames(mat))

# make cell entries wider
latexMatrix(1000 * mat, rownames = rownames(mat), colnames = colnames(mat))

john-d-fox commented 2 months ago

With respect to the first point: Sure, latexMatrix() could easily be made to use non-NULL dimnames in an existing R matrix. I'll do that.

The second point is much trickier, but I might be able to do it. At present, print.latexMatrix() adjusts the width of column labels, which works pretty well in most cases if the labels aren't wider than the matrix entries. I could adjust the width of the entries if they are narrower than the column labels.

In figuring out the number of characters in a string, the code treats something of the form \word{text} as the same character length as text plus 1 character, so, e.g., \sqrt{x} is treated as 2 characters; similarly, e.g., \beta is treated as a single letter. Something like x_{ij}^2 is treated as 4 characters (from xij2).
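The counting rule in the paragraph above can be sketched roughly as follows. The function approxWidth() is a hypothetical name and the regular expressions are an assumption about one way to get the stated counts; the package's actual code may well differ.

```r
# Hypothetical sketch of the width heuristic described above
# (not the implementation used by print.latexMatrix()).
approxWidth <- function(x) {
  # "\word{" counts as one character: replace it with a placeholder
  x <- gsub("\\\\[A-Za-z]+\\{", "X", x)
  # a bare command such as "\beta" also counts as one character
  x <- gsub("\\\\[A-Za-z]+", "X", x)
  # braces, subscript and superscript markers, and spaces take no width
  x <- gsub("[{}_^ ]", "", x)
  nchar(x)
}

approxWidth(c("\\sqrt{x}", "\\beta", "x_{ij}^2"))
```

For the three examples given in the text, this sketch yields 2, 1, and 4 characters respectively.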

The way your example is typeset, the column (and row) names are rendered in math mode as multiplication of several italic variables and not correctly as a word. Something like \mathrm{label} would be required to get it right. I think that label would be properly handled as 5 characters by my current approach.

I'll give this a try, but not right now.

friendly commented 2 months ago

Given that MathJax doesn't support the kerning needed to make this perfect, we may need to decide what is 'good enough'.

Anyway, here's my last example using \text{}

latexMatrix(10000*mat, 
            rownames = paste("\\text{",rownames(mat),"}"), 
            colnames = paste("\\text{",colnames(mat),"}"))

image

Not perfect, but not too terrible

john-d-fox commented 2 months ago

Take a look at the latest version, just uploaded. I have to stop now since we're about to have guests for dinner.

john-d-fox commented 1 month ago

Last night, I implemented support for dimnames in matmult() (not uploaded to GitHub). I also thought about the other matrix operations and concluded that all were straightforward except for the Kronecker product, which I figured I could skip. The convoluted code that Michael found also makes me want to avoid dealing with it.

dimnames are implemented a bit differently for "latexMatrix" objects than in standard R. For example, I don't require that dimension names be unique; dimnames is a slot, not an attribute; and the slot always contains a two-element list even if both elements are NULL. I haven't implemented support for dimnames in indexing, though it probably wouldn't be hard to do that, even if they aren't required to be unique. OTOH, if there's a compelling reason to do so, we could enforce uniqueness.

friendly commented 1 month ago

Yeah, skip Kronecker. I see no reason to enforce row/col names being unique, or even valid 'names'. There's a compelling reason to do this for data.frames, but here, the values are just labels.

I'm almost starting to regret starting the 'bordering a matrix' thread. The general case is interesting, and we've taken it as far as perhaps we should. Let's get this done.

john-d-fox commented 1 month ago

I've now implemented everything that we discussed and moved it to GitHub under R/. The kronecker() method prints a message if there are row or column names and then ignores them.

Row and column names now must be unique because that made it very easy to make indexing by names work.
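To illustrate why uniqueness simplifies name-based indexing, here is a small sketch. The helper indexByName() is hypothetical and is not the package's indexing method; it only shows that with unique labels, match() maps each requested name to exactly one position.

```r
# Hypothetical helper (an assumption for illustration only):
# translate requested labels into row or column positions.
indexByName <- function(labels, wanted) {
  # with duplicates, a name could refer to several positions
  if (anyDuplicated(labels) > 0) stop("labels must be unique")
  pos <- match(wanted, labels)
  if (anyNA(pos)) {
    stop("unknown label(s): ", paste(wanted[is.na(pos)], collapse = ", "))
  }
  pos
}

indexByName(c("Abe", "Bart", "Cat"), c("Cat", "Abe"))
```

Without the uniqueness requirement, the lookup would have to decide whether a repeated name selects the first match or all matches.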

Of course, it would be a good idea to do some more checking beyond the examples in dev/row-col-names-tests.R and dev/operations-with-names.R to make sure that everything works as expected.

I'm sorry that you (almost) regret how complicated this became, but my preference was to make things work if they seemed reasonable. The only part that bothers me is the kludgy approach to making column labels (almost) line up. It would have been much easier if LaTeX bordermatrix had been supported.