FastNFT / FNFT

Fast numerical computation of (inverse) nonlinear Fourier transforms
https://fastnft.github.io/FNFT/
GNU General Public License v2.0
42 stars 12 forks source link

Forward NFT for KdV #64

Closed PJ-Prins closed 3 years ago

ShrinivasJC commented 3 years ago

Documentation needs to be updated.

The mex file for KdV also needs to be updated to allow for never discretizations. Maybe we can add another C and MATLAB example for KdV or at least update the current examples.

wahls commented 3 years ago

Nice, everything seems to work as expected. A few comments:

PJ-Prins commented 3 years ago
  • Unless there is a very good reason too keep it, I would propose to revert commit d67c4f5 in which code specific to the KdV is added to akns_scatter.

That commit makes the behaviour of akns_scatter the same regardless of the PDE. It's semantically correct for NSE as well. It just happens to be a similarity transformation by identity in that case, so that you would be able to cut a corner as long as the code were only for NSE and not for AKNS in general. Without this addition the name should be something like akns_change_of_state to describe adequately what it returns.

However, the most important reason for me to prefer it like this is that the alternative would require putting the ugliness of lines 89--149 back in fnft__kdv_scatter.c, which is not only hard to read, but also results in a lot of unnecessary value copying (unless the compiler manages to iron that out somehow).

  • Is there any noticeable difference between misc_l2norm2 and misc_l2norm2_full? In any case, I'd suggest to keep only one.

misc_l2norm2 truncates the input first by half a sample at the front and half a sample at the end before calculating the L2 norm. That truncation should for sure not take place for the application of the 'Parseval' relations. However misc_l2norm2 is also used in misc_resample. I don't know if the truncation is correct and/or necessary there, so I have to pass that question on to @ShrinivasJC.

  • commit dbc8896: misc_mat_mul_proto was not indented to be called directly; instead the idea is to create a new function of the form fnft__misc_matrix_mult_MxN (with M,N actual numbers) so that the compiler can take the a priori known matrix size into account. Were there any special reason for not doing that here?

With fixed matrix sizes we would need to add 5 separate functions, namely for 2x2 times 2x2, 4x4 times 4x4, 2x2 times 2x4, 2x2 times 2x1 and 4x4 times 4x1. Since the matrix size inputs to fnft__misc_matrix_mult are constants, and since the function is only used with hard coded inputs (for the sizes), the compiler still knows the sizes a priory. Also note that there is no memory allocation inside fnft__misc_matrix_mult that would slow down in the case of variable sizes. A further consideration for this implementation is that it prevents the repeated calls to memcpy, one for every sample of the potential.

I've compared the speed between the implementation before this pull request and a version after this commit and it appears to make no measurable difference. Hence, the implementation with 5 separate functions has no advantage regarding the performance, whereas it does have the disadvantages of code copying.

PJ-Prins commented 3 years ago
  • commit dbc8896: misc_mat_mul_proto was not indented to be called directly; instead the idea is to create a new function of the form fnft__misc_matrix_mult_MxN (with M,N actual numbers) so that the compiler can take the a priori known matrix size into account. Were there any special reason for not doing that here?

With fixed matrix sizes we would need to add 5 separate functions, namely for 2x2 times 2x2, 4x4 times 4x4, 2x2 times 2x4, 2x2 times 2x1 and 4x4 times 4x1. Since the matrix size inputs to fnft__misc_matrix_mult are constants, and since the function is only used with hard coded inputs (for the sizes), the compiler still knows the sizes a priory. Also note that there is no memory allocation inside fnft__misc_matrix_mult that would slow down in the case of variable sizes. A further consideration for this implementation is that it prevents the repeated calls to memcpy, one for every sample of the potential.

I've compared the speed between the implementation before this pull request and a version after this commit and ~it appears to make no measurable difference.~ Hence, the implementation with 5 separate functions has no advantage regarding the performance, whereas it does have the disadvantages of code copying.

Correction: I've tried to reproduce that test, but got a different result. (Probably there was a problem with multiple versions in the Matlab path last time.) It now appears that 1a69d18 caused a 7 to 15% longer runtime for all slow NSE algorithms. Adding and using separate multiplication functions for each matrix size to 07bd054 doesn't make any difference, though. I'll search for the cause and solution and get back to this.

wahls commented 3 years ago
  • Unless there is a very good reason too keep it, I would propose to revert commit d67c4f5 in which code specific to the KdV is added to akns_scatter.

That commit makes the behaviour of akns_scatter the same regardless of the PDE. It's semantically correct for NSE as well. It just happens to be a similarity transformation by identity in that case, so that you would be able to cut a corner as long as the code were only for NSE and not for AKNS in general. Without this addition the name should be something like akns_change_of_state to describe adequately what it returns.

Ah, I see. That actually depends on what you expect akns_scatter to do: Solve the AKNS scattering problem or return a scattering matrix of the form [a b' ; b a']. I expect it to do the former, even though with KdV in mind that indeed is inconsistent with the current documentation (which thus should be updated). However, it is consistent with what akns_fscatter actually returns.

The advantage of this approach is that you only have to understand the AKNS system to maintain this function. With the changes, you would also need to understand things specific to KdV to maintain this function. It would reduce the modularity of the code.

However, the most important reason for me to prefer it like this is that the alternative would require putting the ugliness of lines 89--149 back in fnft__kdv_scatter.c, which is not only hard to read, but also results in a lot of unnecessary value copying (unless the compiler manages to iron that out somehow).

Unless you know already have an idea of what it does, the updated version is not that much easier. I did a quick test, the copying does not seem to have a noticeable influence on the performance. (By the way, we also chose to have the user to prepare the r values outside of the scatter functions to avoid that the function has to handle PDE specific details even though it means additional allocations.)

  • Is there any noticeable difference between misc_l2norm2 and misc_l2norm2_full? In any case, I'd suggest to keep only one.

misc_l2norm2 truncates the input first by half a sample at the front and half a sample at the end before calculating the L2 norm. That truncation should for sure not take place for the application of the 'Parseval' relations. However misc_l2norm2 is also used in misc_resample. I don't know if the truncation is correct and/or necessary there, so I have to pass that question on to @ShrinivasJC.

Did you notice that it just implemented the trapezoidal rule? The error should be close to the midpoint rule, but the advantage is that is easier to us (the integration boundaries do not have to be extended).

But in any case, inside misc_resample it is only used for triggering a warning. The percentage used in there is already ad-hoc, so it should not make a difference.

  • commit dbc8896: misc_mat_mul_proto was not indented to be called directly; instead the idea is to create a new function of the form fnft__misc_matrix_mult_MxN (with M,N actual numbers) so that the compiler can take the a priori known matrix size into account. Were there any special reason for not doing that here?

With fixed matrix sizes we would need to add 5 separate functions, namely for 2x2 times 2x2, 4x4 times 4x4, 2x2 times 2x4, 2x2 times 2x1 and 4x4 times 4x1. Since the matrix size inputs to fnft__misc_matrix_mult are constants, and since the function is only used with hard coded inputs (for the sizes), the compiler still knows the sizes a priory. Also note that there is no memory allocation inside fnft__misc_matrix_mult that would slow down in the case of variable sizes. A further consideration for this implementation is that it prevents the repeated calls to memcpy, one for every sample of the potential.

I've compared the speed between the implementation before this pull request and a version after this commit and it appears to make no measurable difference. Hence, the implementation with 5 separate functions has no advantage regarding the performance, whereas it does have the disadvantages of code copying.

Somehow I missed that there functions where moved into the header file earlier (for no good reason apparently, since the commit mentions that there was no performance gain).

PJ-Prins commented 3 years ago
  • Unless there is a very good reason too keep it, I would propose to revert commit d67c4f5 in which code specific to the KdV is added to akns_scatter.

That commit makes the behaviour of akns_scatter the same regardless of the PDE. It's semantically correct for NSE as well. It just happens to be a similarity transformation by identity in that case, so that you would be able to cut a corner as long as the code were only for NSE and not for AKNS in general. Without this addition the name should be something like akns_change_of_state to describe adequately what it returns.

Ah, I see. That actually depends on what you expect akns_scatter to do: Solve the AKNS scattering problem or return a scattering matrix of the form [a b' ; b a']. I expect it to do the former, even though with KdV in mind that indeed is inconsistent with the current documentation (which thus should be updated). However, it is consistent with what akns_fscatter actually returns.

The advantage of this approach is that you only have to understand the AKNS system to maintain this function. With the changes, you would also need to understand things specific to KdV to maintain this function. It would reduce the modularity of the code.

Indeed, you could avoid that call by changing akns_scatter_matrix() to akns_change_of_state_matrix(), but then you still need exactly the same calls in akns_scatter_bound_states() since the 1,1-entry of the change of state matrix in the basis you use must be 0 at an eigenvalue in order to use the bidirectional algorithm. That would create an inconsistency within the same file. Furthermore, the KdV and NSE specific transformation matrices need to stay in fnft__akns_discretization.c since only the akns_discretization is available in akns_scatter_bound_states(), so removing it from akns_scatter_matrix() doesn't solve your conceptual point. All that a maintainer of that file should know is that in general a similarity transformation is needed to find the solution of the AKNS scattering problem, since the assumption just above eq. (3.1) in the AKNS paper does not always hold. That can be clarified with a few extra comments.

What akns_fscatter returns remains different either way. To make it consistent, the polynomials would have to be evaluated in akns_fscatter before return. However, that is not being done for NSE specific reasons, namely because nsev.c needs the polynomials for finding the eigenvalues. From the perspective of kdvv.c it's an unnecessary complication.

  • Is there any noticeable difference between misc_l2norm2 and misc_l2norm2_full? In any case, I'd suggest to keep only one.

misc_l2norm2 truncates the input first by half a sample at the front and half a sample at the end before calculating the L2 norm. That truncation should for sure not take place for the application of the 'Parseval' relations. However misc_l2norm2 is also used in misc_resample. I don't know if the truncation is correct and/or necessary there, so I have to pass that question on to @ShrinivasJC.

Did you notice that it just implemented the trapezoidal rule? The error should be close to the midpoint rule, but the advantage is that is easier to us (the integration boundaries do not have to be extended).

It's the other way around. The trapezoidal rule integrates only from T[0] til T[1], whereas the scattering problems are being solved from a half step size before T[0] til half a step size above T[1]. (cf. fnft__akns_discretization_boundary_coeff() returning 0.5 for every discretization.) To approximate the integral of that potential (or get the exact correct result in case of the BO discretizations) , we would need to pad with a 0-sample before and after the array, and call misc_l2norm2 with that extended array (or resample the whole potential on a grid from half step size before T[0] til half a step size above T[1]). The result of that is however equal to the midpoint rule, which doesn't require zeropadding.