dfm / george

Fast and flexible Gaussian Process regression in Python
http://george.readthedocs.io
MIT License
445 stars, 128 forks

[documentation] meaning of HODLRSolver parameters: min_size and tol #158

Closed: syrte closed this issue 1 year ago

syrte commented 1 year ago

Hi Dan, thanks for the superb algorithm and package! I am confused about the meaning of the HODLRSolver parameters and their defaults: min_size=100, tol=0.1, seed=42. From what I have seen, tol=0.1 can sometimes make optimizing the parameters difficult, and a smaller tol, e.g. 1e-3, can help.

I confess that I didn't read the original algorithm paper closely (I skimmed it, but the answer wasn't obvious to me). I did search through the repo, the issues, and the documentation, though. What do these parameters mean, and when should I change the defaults? It might help users to add some more explanation to the documentation. Thanks again!

syrte commented 1 year ago

My apologies! I found the explanation in solvers/_hodlr.cpp:

    min_size (Optional[int]): The block size where the solver switches to a
        general direct factorization algorithm. This can be tuned for platform
        and problem specific performance and accuracy. As a general rule,
        larger values will be more accurate and slower, but there is some
        overhead for very small values, so we recommend choosing values in the
        hundreds. (default: ``100``)
    tol (Optional[float]): The precision tolerance for the low-rank
        approximation. This value is used as an approximate limit on the
        Frobenius norm between the low-rank approximation and the true matrix
        when reconstructing the off-diagonal blocks. Smaller values of ``tol``
        will generally give more accurate results with higher computational
        cost. (default: ``0.1``)
    seed (Optional[int]): The low-rank approximation method within the HODLR
        algorithm is not deterministic and, without a fixed seed, the method
        can give different results for the same matrix. Therefore, we require
        that the user provide a seed for the random number generator.
        (default: ``42``, obviously)

Nevertheless, it would be nice to put these into the Python docstring as well.
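For anyone else landing here, the `tol` description above can be made concrete with a small NumPy sketch. This is not george's code: the solver's low-rank step is randomized (hence the `seed`), whereas this sketch swaps in a plain truncated SVD because it is simple and deterministic. The function name `low_rank_within_tol` and the toy squared-exponential kernel are made up for illustration. It shows how a smaller `tol` forces a higher rank for an off-diagonal block, which is where the extra cost and accuracy come from.

```python
import numpy as np


def low_rank_within_tol(block, tol):
    """Smallest-rank truncated SVD with Frobenius error <= tol.

    Illustrative only; returns factors (U, V) and the chosen rank,
    such that block is approximated by U @ V.
    """
    u, s, vt = np.linalg.svd(block, full_matrices=False)
    # tail[k] is the Frobenius error left after keeping the first k
    # singular values; the appended 0.0 is the error at full rank.
    tail = np.sqrt(np.concatenate([np.cumsum((s ** 2)[::-1])[::-1], [0.0]]))
    # tail is non-increasing, so the first index that satisfies the
    # tolerance is the smallest admissible rank.
    rank = int(np.argmax(tail <= tol))
    return u[:, :rank] * s[:rank], vt[:rank], rank


# Toy kernel matrix: squared-exponential kernel on a 1-D grid.
x = np.linspace(0.0, 10.0, 200)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)

# One off-diagonal block, as in the first level of a HODLR split.
block = K[:100, 100:]

for tol in (0.1, 1e-3):
    U, V, rank = low_rank_within_tol(block, tol)
    err = np.linalg.norm(block - U @ V)  # Frobenius norm by default
    print(f"tol={tol:g}: rank={rank}, Frobenius error={err:.2e}")
```

The tolerance here is an absolute bound on the block's approximation error, which matches the wording of the docstring. Whether george applies it in exactly this way is something I have not checked.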

dfm commented 1 year ago

Thanks! Yes, would you be interested in opening a PR that adds this info as a docstring to the HODLRSolver here:

https://github.com/dfm/george/blob/769a71365d7f89fd9fc7ba9d38e8b5f648350aef/src/george/solvers/hodlr.py#L13
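As a rough idea of the shape such a docstring could take, here is a sketch on a stand-in class. `HODLRSolverDocSketch` is hypothetical and is not the real solver; the `kernel` argument and the attribute names are assumptions, while the defaults and the parameter descriptions are taken from the text quoted above.

```python
class HODLRSolverDocSketch:
    """Hypothetical stand-in showing a possible docstring layout.

    Args:
        kernel: The kernel object the solver works with (assumed argument).
        min_size (Optional[int]): The block size where the solver switches
            to a general direct factorization algorithm. Larger values are
            generally more accurate and slower; values in the hundreds are
            recommended. (default: ``100``)
        tol (Optional[float]): The precision tolerance for the low-rank
            approximation of the off-diagonal blocks. Smaller values
            generally give more accurate results at higher computational
            cost. (default: ``0.1``)
        seed (Optional[int]): Seed for the random number generator used by
            the non-deterministic low-rank approximation. (default: ``42``)
    """

    def __init__(self, kernel, min_size=100, tol=0.1, seed=42):
        # Store the settings only; no factorization is done in this sketch.
        self.kernel = kernel
        self.min_size = min_size
        self.tol = tol
        self.seed = seed


# help(HODLRSolverDocSketch) would then show the parameter descriptions.
```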

syrte commented 1 year ago

Sure, will do it later this week. Done.