JuliaDiff / TaylorDiff.jl

Taylor-mode automatic differentiation for higher-order derivatives
https://juliadiff.org/TaylorDiff.jl/
MIT License

Ability to deal with matrix input #47

Closed · mBarreau closed this 1 year ago

mBarreau commented 1 year ago

This PR aims to add the ability to deal with matrix input, since this can be helpful in the context of PINNs. Consequently, here are the main modifications:

tansongchen commented 1 year ago

Could you explain mathematically what you are trying to do here? We might have a more general solution for this...

mBarreau commented 1 year ago

@tansongchen Sure, what I want to do is simple. Let

$$ X = \begin{bmatrix} x_1 & \cdots & x_N \end{bmatrix} \in \mathbb{R}^{M \times N}. $$

Then

$$ \frac{\partial^kf}{\partial l^k}\big |_{X} := \begin{bmatrix} \frac{\partial^kf}{\partial l^k}\big |_{x_1} & \cdots & \frac{\partial^kf}{\partial l^k}\big |_{x_N} \end{bmatrix} \qquad (1) $$

Is there a reason for the types of x and l in derivative to be the same? If both were independent subtypes of AbstractVector{T}, that would allow for more freedom in the syntax :)
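
For illustration, the two signature styles I have in mind look roughly like this (simplified stand-ins, not the actual methods in src/derivative.jl):

```julia
# Constrained: x and l must share a single element type T.
derivative_same(f, x::AbstractVector{T}, l::AbstractVector{T}, order::Int) where {T <: Number} =
    nothing  # placeholder body

# Independent: x and l may have different numeric element types.
derivative_indep(f, x::AbstractVector{T}, l::AbstractVector{S}, order::Int) where {T <: Number, S <: Number} =
    nothing  # placeholder body
```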

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.29% :tada:

Comparison is base (7979d76) 85.18% compared to head (6ad719f) 85.48%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #47      +/-   ##
==========================================
+ Coverage   85.18%   85.48%   +0.29%     
==========================================
  Files           6        6              
  Lines         243      248       +5     
==========================================
+ Hits          207      212       +5     
  Misses         36       36              
```

| Files Changed | Coverage Δ | |
|---|---|---|
| [src/derivative.jl](https://app.codecov.io/gh/JuliaDiff/TaylorDiff.jl/pull/47?src=pr&el=tree#diff-c3JjL2Rlcml2YXRpdmUuamw=) | `100.00% <100.00%> (ø)` | |


mBarreau commented 1 year ago

@tansongchen, the tests pass now. Is that fine with you?

tansongchen commented 1 year ago

Thanks for contributing that, but I'm still preparing for an exam these days 😂 I will get back to you and take a closer look on Friday or this weekend!

tansongchen commented 1 year ago

Let me try to understand the point of these additional differentiation APIs.

Currently, there are two methods: derivative(f, x0, order) calculates the higher-order derivative

$$ \frac{\mathrm d^kf}{\mathrm dx^k}\big |_{x_0} $$

and derivative(f, x0, l, order) calculates the higher-order directional derivative in direction l

$$ \frac{\partial^kf}{\partial l^k}\big |_{x_0} $$
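
For instance, the two existing calls look like this (a minimal usage sketch following the signatures quoted above; the order is written as a plain integer here):

```julia
using TaylorDiff

f(x) = sin(x)
derivative(f, 1.0, 2)        # d²f/dx² at x₀ = 1.0, i.e. -sin(1.0)

g(v) = v[1]^2 * v[2]         # g : R² → R
x0 = [1.0, 2.0]
l  = [1.0, 0.0]              # direction of differentiation
derivative(g, x0, l, 2)      # ∂²g/∂l² at x₀, i.e. d²/dt² g(x₀ + t·l) at t = 0
```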

And now, you add two additional methods, which say that,

  1. For a 1-by-N matrix input x, the function should calculate the derivative at each of its components, and then assemble the output back to a 1-by-N matrix;
  2. For an M-by-N matrix input x and an M-sized vector l, the function should calculate the directional derivative at each of its columns, and then assemble the output back into a 1-by-N matrix;

Is this correct? If so, I'm happy with these shorthand notations, as long as they prove handy in PINN applications. But I would prefer not to use Union types, to move the new APIs into a separate block, and to add some comments stating that they are just shorthands for multiple calculations (roughly as sketched below), since they might otherwise be confused with matrix derivatives (see https://en.wikipedia.org/wiki/Matrix_calculus ). If you agree on that, I will take care of moving the code and adding comments, and then merge.
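
For reference, the two shorthands amount to roughly the following (a sketch with made-up helper names, not the PR's actual implementation):

```julia
using TaylorDiff

# 1. 1-by-N matrix input: scalar derivative at each component, reassembled
#    into a matrix of the same 1-by-N shape.
componentwise_derivative(f, X::AbstractMatrix, order) =
    map(x -> derivative(f, x, order), X)

# 2. M-by-N matrix input with an M-sized direction l: directional derivative
#    at each column, reassembled into a 1-by-N matrix.
columnwise_derivative(f, X::AbstractMatrix, l::AbstractVector, order) =
    permutedims([derivative(f, collect(col), l, order) for col in eachcol(X)])
```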

mBarreau commented 1 year ago

Hi,

First of all, you are totally correct about what I aim to do.

Let me justify it. You define the Flux/Lux model and apply it to the input, then you associate the output with the targets and build your loss function. The idea is to do the same with the physics residual. Since the rand function outputs a 1xN matrix, it is very convenient to define a residual model which behaves like the original model (input nxN and output MxN), so that you can build the loss in the exact same way. This then makes it even simpler to resample or to build more complex losses.
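
Concretely, the workflow looks something like this (a toy sketch; the plain function u stands in for a Flux/Lux network and all names here are illustrative):

```julia
using TaylorDiff

u(t) = sin(t)                  # stand-in for the trained network (scalar in, scalar out)
ts = rand(1, 32)               # collocation points: rand gives a 1-by-N matrix
targets = cos.(ts)             # 1-by-N matrix of targets

# Data loss: apply the model elementwise; output has the same 1-by-N shape as the targets.
data_loss = sum(abs2, u.(ts) .- targets) / length(ts)

# Physics loss: residual of the toy ODE u'(t) - cos(t) = 0, built sample-by-sample.
# The matrix shorthand in this PR is meant to let this residual keep the same
# 1-by-N in / 1-by-N out shape as the model itself.
residual = map(t -> derivative(u, t, 1) - cos(t), ts)
physics_loss = sum(abs2, residual) / length(ts)

loss = data_loss + physics_loss
```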

If you agree, then I would definitely support such an idea :) (and even write a small tutorial to show how easy it gets to write complex PINNs using TaylorDiff).

tansongchen commented 1 year ago

Just did some cleaning-up work and added some comments! Once CI passes I will merge. Thanks again for the contribution!

mBarreau commented 1 year ago

@tansongchen, can I ask why you write AbstractMatrix{T} where T <: Number and not AbstractMatrix{<:Number}? The second option looks simpler to read and is shorter.

tansongchen commented 1 year ago

They are equivalent when there is only one type parameter and the parameter is not used in the function body. However, when there are two or more, an explicit variable name helps to tell whether two types can be different or not. Also, in the make_taylor function the type parameter is used for explicit conversion.

So, to keep consistency with the more complicated cases, I would personally prefer to write all type parameters as variables :)
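
For reference, the distinction looks roughly like this (generic illustrations, not the actual code in this PR):

```julia
# Equivalent when there is a single type parameter and it is not used in the body:
f1(x::AbstractMatrix{<:Number}) = size(x)
f2(x::AbstractMatrix{T}) where {T <: Number} = size(x)

# With two arguments, a shared parameter name states that the element types must
# match, which the <:Number shorthand cannot express:
g(x::AbstractVector{T}, l::AbstractVector{T}) where {T <: Number} = x .+ l

# And a named parameter can be used inside the body, e.g. for an explicit
# conversion (the point made above about make_taylor):
h(x::AbstractVector{T}) where {T <: Number} = convert(T, 1)
```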