Open bennn opened 1 year ago
The problem is that you are trying to fit a second-degree polynomial with only 2 points. Depending on how you look at it, this either has no solution or has an infinite number of solutions :-)
You can either use a linear fit, or specify the degree of the polynomial (which defaults to 2)
(df-least-squares-fit df "x" "y" #:mode 'linear)
(df-least-squares-fit df "x" "y" #:mode 'polynomial #:polynomial-degree 1)
I agree, however that the error message could be improved.
Is the general rule that the degree can't be smaller than the number of points?
Is the general rule that the degree can't be smaller than the number of points?
You need at least N + 1 data points to fit a polynomial of degree N. This is because for a N degree polynomial, there are N + 1 coefficients, so you need N + 1 equations, in the form y = P(x) to find these coefficients. This is easiest to understand for a first-degree polynomial which is a line:
Unfortunately, we cannot simply add a contract to check that the polynomial degree must be less than the number of points: we need N + 1 distinct data points. For example, you cannot find a "least squares fit" line for a data set that contains the 3 points (0, 0), (0, 0), (0, 0).
A contract for distinct points would not work either: Depending on how matrix solve algorithms are implemented in the racket/math library, it might be possible to have points which are different (i.e., not equal in the =
or fl=
sense), yet close to each other so that the matrix cannot be inverted, and a solution cannot be found.
Hmm, ok. What about catching exceptions from math
and re-raising them with a brief message like "unable to solve, may be too few distinct points in the dataset" ?
a brief message like "unable to solve, may be too few distinct points in the dataset" ?
Such a message might confuse the user, and I had too many bad experiences with such messages. If an exception is to be thrown, I would rather have the original exception, rather than a higher level routine's interpretation of it -- for example the exception might not be related to "solving" at all.
Here is a dataset with two clearly distinct data points that will fail to find a linear fit:
#lang racket
(require data-frame)
(define df
(let* ((df (make-data-frame))
(xs (make-series "x" #:data '#(0 "1")))
(ys (make-series "y" #:data '#(2 "3"))))
(df-add-series! df xs)
(df-add-series! df ys)
df))
(df-least-squares-fit df "x" "y" #:mode 'polynomial #:polynomial-degree 1)
While the above example is somewhat contrived, this situation can easily happen in real life: some programs export data as CSV files and sometimes they quote numbers inside these files. For such CSV files, numbers will he read as strings, and the user needs to pass #t
to #:quoted-numbers?
in df-read/csv
to read them as numbers.
Oh, by re-raise I meant to keep the original exception message. In other words, throw the actual exception and add a commentary on top.
What program did you run?
What happened?
What did you expect to happen?
No error.
I'm not sure what the problem is. Too few points? I would rather have a poor fit (straight line) than an error. But if an error is the right thing, can we clear up the message?