akrodger / brams-math-methods

A collection of math computation C libraries written by Abram "Bram" Rodgers.
GNU General Public License v3.0
0 stars 0 forks source link

Interpolate function bug #1

Open akrodger opened 7 years ago

akrodger commented 7 years ago

It would appear that as the number of data samples for a particular polynomial increase, the polynomials tend to diverge from the actual x-y planar position of the data set. Take a serious look at this function and consider rewriting it in the future. (Bug was found when using the function to curve-fit some data for a lab report.)

akrodger commented 7 years ago

So some further investigation was done on this function. The "bug" appears when I attempt to curve fit to something that isn't actually a polynomial. In the case above, I was trying to fit a quadratic and a cubic to some lab data I collected in an optics class. The bottom line is that if you have a smooth analytic curve, then a polynomial may be able to fit that line. However, in order to get the right coefficients on that polynomial, you need a large degree of accuracy. (I found that more than 10 decimal places along with everything left of the decimal was frequently required.)

On top of that, as I have it programmed now, it only compares some of the possible interpolated polynomials, not all of them. Hypothetically (this has yet to be demonstrated by my code) if I need a polynomial of order n to fit some number of points m >= m, then I can take every possible mCn (m choose n, the binomial coefficient, also known as possible combinations) and take the arithmetic mean of the results to get a good curve. In interest of testing it quickly though, I only coded it to look at sequential subsets of the data given. So instead of looking at mCn, I am comparing only (m - n). This is a massive statistical difference and, as is not surprising from this analysis, we see a somewhat large divergence from any actual best fit curve. I do get close though.

However, if you use literal polynomial outputs on this function (exact answers) then it gives you exactly the polynomial you pulled from. This leads me to believe the above explanation is the key source of error. Plan to fix this code is figure out of to compare combinatorial subsets of the x-y data pairs without having to resort to recursion and blow up memory usage.

akrodger commented 7 years ago

There was a large overhaul of this function, being split up into 2 versions. interpolate() and meanInterpolate(). (See header files for details.) It may actually be something wrong in the linear algebra operations. For now, the only known issue is that interpolate() blows up its output when after for a polynomial larger than about 10 terms. One could argue that 10 terms doesn't really matter, but this really should be as robust as possible. I suspect there are some memory issues with the invert(), rref(), or ref() functions. Perhaps having to do with very large matrices which I did not initially use it for? The plan to investigate these issues is to build an easy to use GUI then test a large range of big matrix values.

akrodger commented 7 years ago

So I did more testing of this an found that the issue lies with my implementation of the ref() function. I will refer to the types of square matrices used in Lagrange Interpolation as a Lagrange Matrix.

A characteristic feature of a Lagrange Matrix is that the bottom rightmost column has numbers raised to rather large powers. Namely, the jth column is a number raised to the jth power. A polynomial of order n generally outputs numbers on the scale of (input)^n. So really, an input to this Lagrange Interpolator will be something like x^(n+j). For high enough n and j and x > 1, these numbers blow up pretty fast. This is generally not an issue for generating the matrix or calculating the Row Echelon Form, but once you try to invert it, you suddenly have a matrix with entries on the scale of approximately X^(-n-j). These numbers are extremely tiny and come off as approximately zero when doing floating point rounding. Even with over the top levels of precision, in the machine, we still see extremely minor floating point rounding deviation. This is enough to completely throw off the Lagrange Interpolation calculations, which need appropriately high precision to operate.

In general, the interpolator appears stable (both the mean and regular versions) for polynomials of degree 10 or less. I would need a more robust computer for better calculation that what I have now.

PS: Changed to wontfix since this is not really a bug so much as a limitation of my laptop.