flatironinstitute / gp-shootout

Benchmark and compare large-scale Gaussian process regression methods in 1D, 2D, and 3D, from MATLAB
Other
5 stars 0 forks source link

How to compute F^*F efficiently? #3

Closed yangpl closed 1 year ago

yangpl commented 1 year ago

Hi Phillips and Alex,

I am looking at this interesting project for Kriging interpolation to do 2D surface gridding. I start with 1D code efgp1d.m. According to your paper (Equaispaced Fourier representations ...) on arxiv, we know that X^X=D^F^ FD, so Gf=F^F accordingly. However, I could not understand lines 56-58 on building this matrix Gf. Why it is equivalent to NUFFT for computing that using a vector of all 1 once? Could you explain it a bit by relating with the equation 24? According to eqn 24, should we repeat NUFFT for each element, implying we have to repeat M*M times to form Gf?

In 2D code efgp2d.m, the function getGf follows the same strategy and I could not figure it out.

Hope for your reply! Pengliang

ahbarnett commented 1 year ago

Dear Pengliang, Sorry, the code is not the clearest, since the names don't match the preprint. But it's not hard. To check that multiplying F^*F of (24) on a vector is the same as convolution by v in (25) is simple algebra. You may enjoy my slides here: https://users.flatironinstitute.org/~ahb/talks/EFGP22.pdf The vector v defined by (25) is called XtXcol in the code, and the finufft1d1 call in l.57 evaluates (25). Note all the strengths are 1. Gf is then \hat{v}. Afun2 implements Alg 5 (steps 3-6). Hope this helps! Best, Alex

On Thu, Jan 19, 2023 at 7:34 AM Pengliang Yang @.***> wrote:

Hi Phillips and Alex,

I am looking at this interesting project for Kriging interpolation to do 2D surface gridding. I start with 1D code efgp1d.m. According to your paper (Equaispaced Fourier representations ...) on arxiv, we know that X^X=D^ F^ FD, so Gf=F^F accordingly. However, I could not understand lines 56-58 on building this matrix Gf. Why it is equivalent to NUFFT for computing that using a vector of all 1? Could you explain it a bit by relating with the equation 24?

In 2D code efgp2d.m, the function getGf follows the same strategy and I could not figure it out.

Hope for your reply! Pengliang

— Reply to this email directly, view it on GitHub https://github.com/flatironinstitute/gp-shootout/issues/3, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNZRST5IEVTAZ3KGNAD23TWTEYDVANCNFSM6AAAAAAUAIZH6E . You are receiving this because you are subscribed to this thread.Message ID: @.***>

-- *-------------------------------------------------------------------~^`^~._.~' |\ Alex Barnett Center for Computational Mathematics, Flatiron Institute | \ http://users.flatironinstitute.org/~ahb 646-876-5942

yangpl commented 1 year ago

Hi Alex, Thanks for the reply. I understand the NUFFT operation in operator form now. Gf implements the matrix vector product (F^*F applied to v, therefore corresponding to its Fourier transform)

I did take a look at the presentation. The main idea by converting a large NN system to a small MM system via low rank approximation was indeed helpful. Still one confusion: since the number M is very small, the new M*M system can be built by explicitly computing each element of the coefficients, and then stored in memory/cache. We can then consider direct solve via LU decomposition or Cholesky factorization (thanks to symmetric positive definite property). This kind of factorization can be done only once, and then reused for each test point. This can be very useful for Kriging interpolation since the number of test points are significantly large. I thought this can be more interesting and efficient than iterative solution via conjugate gradient method. Using CG in this case involves repeating the same calculation many times. Why direct solve was not considered in the paper?

Best, Pengliang

ahbarnett commented 1 year ago

Good idea. Philip already thought of it: https://arxiv.org/abs/2109.14081 :)

But our point is when M>>1e4 you need to iterate... Cheers, Alex

On Thu, Jan 19, 2023 at 7:45 PM Pengliang Yang @.***> wrote:

Hi Alex, Thanks for the reply. I understand the NUFFT operation in operator form now.

I did take a look at the presentation. The main idea by converting a large NN system to a small MM system via low rank approximation was indeed helpful. Still one confusion: since the number M is very small, the new M*M system can be built by explicitly computing each element of the coefficients, and then stored in memory/cache. We can then consider direct solve via LU decomposition or Cholesky factorization (thanks to symmetric positive define property). This kind of factorization can be done only once, and then reused for each test point. This can be very useful for Kriging interpolation since the number of test points are significantly large. I thought this can be more interesting and effcient than iterative solution via conjugate gradient method. Using CG in this case involves repeating the same calculation many times. Why direct solve was not considered in the paper?

Best, Pengliang

— Reply to this email directly, view it on GitHub https://github.com/flatironinstitute/gp-shootout/issues/3#issuecomment-1397791085, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNZRSRK4FYDEMIG5QOXIA3WTHNZFANCNFSM6AAAAAAUAIZH6E . You are receiving this because you commented.Message ID: @.***>

-- *-------------------------------------------------------------------~^`^~._.~' |\ Alex Barnett Center for Computational Mathematics, Flatiron Institute | \ http://users.flatironinstitute.org/~ahb 646-876-5942

yangpl commented 1 year ago

Thanks, Alex. The choice of M depends on our accuracy requirement from \epsilon. If M>>1e4, does it mean that truncating the integration by limited number of terms makes a poor approximation? In other words, Fourier representation is not sufficient for such a kernel function? Best, Pengliang

yangpl commented 1 year ago

Hi Alex. I just realized that we should connect the accuracy requirement with both h and M (the interval h and the total number of intervals M to discretize the integral). I should not separate them. So the question is then: under the same accuracy requirement, can we always try to use a larger h such that the number of terms M can be smaller? Basically my idea is to never go to a situation M>>1e4, for example?

ahbarnett commented 1 year ago

Our preprint describes exactly how to choose h and then M, for certain accuracy. In 1D you will not often need M>>1e4. But in 2D or 3D it is common, and we go to M~1e7 I think. Best, Alex

On Fri, Jan 20, 2023 at 12:09 AM Pengliang Yang @.***> wrote:

Hi Alex. I just realized that we should connect the accuracy requirement with both h and M (the interval h and the total number of intervals M to discretize the integral). I should not separate them. So the question is then: under the same accuracy requirement, can we always try to use a larger h such that the number of terms M can be smaller? Basically my idea is to never go to a situation M>>1e4, for example?

— Reply to this email directly, view it on GitHub https://github.com/flatironinstitute/gp-shootout/issues/3#issuecomment-1397929376, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNZRSVZ5HM4Z3PYCZQ3QTLWTIMW3ANCNFSM6AAAAAAUAIZH6E . You are receiving this because you commented.Message ID: @.***>

-- *-------------------------------------------------------------------~^`^~._.~' |\ Alex Barnett Center for Computational Mathematics, Flatiron Institute | \ http://users.flatironinstitute.org/~ahb 646-876-5942