ahmadtourei commented 2 months ago

Description

This PR addresses #396 by adding a new velocity_to_strain_rate patch function that changes data shape and supports higher gauge lengths.

We plan to combine the two of the velocity_to_strain_rate functions for a more general function in the future.

Checklist

I have (if applicable):

[ ] referenced the GitHub issue this PR closes.
[ ] documented the new feature with docstrings or appropriate doc page.
[ ] included a test. See testing guidelines.
[ ] your name has been added to the contributors page (docs/contributors.md).
[ ] added the "ready_for_review" tag once the PR is ready to be reviewed.

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.84%. Comparing base (b6edf30) to head (5db19fa).

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #399 +/- ## ======================================= Coverage 99.84% 99.84% ======================================= Files 109 109 Lines 8788 8843 +55 ======================================= + Hits 8774 8829 +55 Misses 14 14 ``` | [Flag](https://app.codecov.io/gh/DASDAE/dascore/pull/399/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=DASDAE) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/DASDAE/dascore/pull/399/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=DASDAE) | `99.84% <100.00%> (+<0.01%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=DASDAE#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

d-chambers commented 2 months ago

I have an idea for making both order and gauge_length to work together. I will try to find time later this week to think more about it and work on it.

d-chambers commented 2 months ago

Ok @ahmadtourei,

If you have a chance take a look at this. It allows Patch.differentiate to use a step parameter. This then decimates the array a certain number of times, applies the central difference operator to each decimated array, and re-assembles everything at the end. velocity_to_strain_rate can then simply use the step argument now to get multiple gauge lengths.

I think it is probably worth keeping velocity_to_strain_rate_fd since it fundamentally does something a little bit different, but the outputs of both functions should be similar (expect small differences in the array shapes).

d-chambers commented 2 months ago

closes #399

ahmadtourei commented 2 months ago

Ok @ahmadtourei,

If you have a chance take a look at this. It allows Patch.differentiate to use a step parameter. This then decimates the array a certain number of times, applies the central difference operator to each decimated array, and re-assembles everything at the end. velocity_to_strain_rate can then simply use the step argument now to get multiple gauge lengths.

I think it is probably worth keeping velocity_to_strain_rate_fd since it fundamentally does something a little bit different, but the outputs of both functions should be similar (expect small differences in the array shapes).

It looks good to me but it seems the data outputs of both functions are not similar. I even noticed some changes in polarity for the same channels of the resulting array from each method. Here I'm including a test (that currently breaks) for further investigation.

    def test_compare_two_funcs(self, terra15_das_patch):
        g_m = 1
        out1 = terra15_das_patch.velocity_to_strain_rate_cd(gauge_multiple=g_m)
        out2 = terra15_das_patch.velocity_to_strain_rate(gauge_multiple=g_m)
        dist1 = int(np.floor(g_m / 2))
        dist2 = -int(np.ceil(g_m / 2))
        out2_selected = out2.select(distance=(dist1, dist2), samples=True)
        assert np.allclose(out1.data, out2_selected.data, rtol=1e-03, atol=1e-05)

Also, I think we need to change the name of the second function as it actually does a central differentiation. The reference below illustrates what I implemented in the second function (eq. 2): https://doi.org/10.1093/gji/ggae055

Therefore, I agree we need to keep both functions.

d-chambers commented 2 months ago

it seems the data outputs of both functions are not similar.

Yes, that is odd. Looking at the second function more closely, it should be doing exactly what the first function does when gauge_multiple = 1 and order =2, except the edges are handled differently as we already discussed, I will look into it more closely.

d-chambers commented 2 months ago

I have some insight.

Consider this simple grid of numbers turned into a patch with distance_step = 1 where distance corresponds to axis 1.

[ [8, 6, 7, 5], [3, 0, 9, 8], [5, 4, 4, 4], [9, 1, 1, 4]],

velocity_to_strain_rate (first function) returns a patch with:

[[ -3.5 -0.5 -0.5 -3.5] [ -9. 3. 4. -6. ] [ -1.5 -0.5 0. 0. ] [-12. -4. 1.5 4.5]]

velocity_to_strain_rate_fd (second function) returns a patch with:

[[-2. 1. -2.] [-3. 9. -1.] [-1. 0. 0.] [-8. 0. 3.]]

The first function is estimating the strain at a point x using the neighboring points:

$f'(x) = \frac{f(x + dx) - f(x - dx)}{2dx}$

the second function is estimating the strain between two points:

$f'(x) = \frac{f(x + dx/2) - f(x - dx/2))}{dx}$

so naturally there will be some differences. When given a patch made of a nicely behaved analytical function, like the test I just added, they produce the same (expected) results.

As for renaming the second function, you are right. Tentatively, I have called it staggered_velocity_to_strain_rate, but feel free to rename if you think of something better.

d-chambers commented 2 months ago

I also renamed gauge_multiple to step_multiple and deprecated gauge_multiple in the first function (to not break any existing codes).

ahmadtourei commented 2 months ago

The first function is estimating the strain at a point x using the neighboring points:

Isn't this the same as the case when the gauge length is actually 2 * channel spacing? It's a bit confusing because we have the gauge_multiple=1 but technically averaging strain on 2 * channel spacing. With the current first function, there is no way of getting what the second function results with step_multiple=1.

Also, it seems order has to be an even number so it might be worth updating the docs.

d-chambers commented 2 months ago

Isn't this the same as the case when the gauge length is actually 2 channel spacing? It's a bit confusing because we have the gauge_multiple=1 but technically averaging strain on 2 channel spacing. With the current first function, there is no way of getting what the second function results with step_multiple=1.

I think what should be happening when step_multiple = 2 is this for function 1:

$f'(x) = \frac{f(x +2dx) - f(x - 2dx)}{4dx}$

and this for function 2:

$f'(x) = \frac{f(x + 3dx/2) - f(x - 3dx/2))}{3dx}$

Consider this evenly spaced array:

[a, b, c, d, e]

When step_multiple = 2, the first function calculates the strain at c via:

(e - a) / 4 dx

The second function doesn't estimate strain at c, it can only estimate strain between c and d via

(e - b) / 3 dx

Does that make sense?

Also, it seems order has to be an even number so it might be worth updating the docs.

Yes, good point.

ahmadtourei commented 2 months ago

and this for function 2:

I think this for function 2 should be (step_multiple=2): $f'(x) = \frac{f(x + dx) - f(x - dx))}{2dx}$

So, it actually estimates at c: (d - b) / (2*dx)

However, when step_multiple=1, it estimates values between a and b, b and c, etc. So I get what you mean now. For odd step_multiple values, that is the case.

I think what should be happening when step_multiple = 2 is this for function 1:

Also, to me, this looks like a step_multiple of 4 as we are averaging over 4*dx (not considering the order effect)

ahmadtourei commented 2 months ago

I think what should be happening when step_multiple = 2 is this for function 1:

So basically, function 1 with step_multiple=2 and order=2 should result same as function 2 with step_multiple=4. Right?

d-chambers commented 2 months ago

However, when step_multiple=1, it estimates values between a and b, b and c, etc. So I get what you mean now. For odd step_multiple values, that is the case.

Yes, you are right. Function 2 produced the same output as function 1 when its step_multiple is double (and function 1 uses an order of 2) after accounting for differences in edge handling:

  def test_equal_cases(self, terra15_das_patch):
      for mult in range(1, 20):
         # Get function 1s output and trim off edges
          out1 = (
              terra15_das_patch
              .velocity_to_strain_rate(step_multiple=mult)
              .select(distance=(mult, -mult), samples=True)
          )
          # function 2's output should match function 1 when its step is double 
          out2 = terra15_das_patch.staggered_velocity_to_strain_rate(step_multiple=2*mult)
          assert np.allclose(out1.data, out2.data)

Summary

For function 1, step_multiple means distance from the center point, for function 2 it means distance between start and last point used in the estimate
Function 2 calculates a staggered grid (or the strain for values between points) when the step_multiple is odd. Function 1 cant do this.
Function 1 returns a patch of exactly the same shape as the input. Function 2 is always step_multiple shorter along the distance axis.
Judging by how many posts it took us to sort this out, the behavior of these two functions would be too confusing for users. We should try to simplify this.

d-chambers commented 2 months ago

Here are my thoughts on how to move this forward. First, reflecting on what users might want and future DASCore plans:

Being able to calculate the strain between points is nice, and probably what most other DAS codes do. When step_multiple==1, this gives you the smallest possible gauge length, which could be desirable for several reasons. However, the staggered grid is probably less useful in the other cases. For example, the data wont be that different when step_multiple = 7 vs 6 or 8. It is probably better to just use a gauge length close to an even number of step multiples (in function 2's parlance) so the coords don't shift and the array shape doesn't change.
As you rightly pointed out, an adjustable step_multiple is an important feature and something we didn't have before this PR.
I would like to avoid changing the array shape when possible. Down the road, I hope to be able to make many patch functions jit'able with Jax for potential gains in performance. However, any function with dynamically-sized arrays can't be compiled.

So here is what I propose:

Clarify in function 1 that the new gauge_length is step_multiple * 2
Clarify that order must be a multiple of 2
Let's use function 2, currently named staggered_velocity_to_strain_rate only for the in-between case of step_multiple = 1 by removing the step_multiple parameter. For larger step multiples and orders we will point users to function_1.
Let's add a recipe or note that explores the effect of different step_sizes and orders a bit more. We may also need to add an event stored in velocity format to the example data repo.

What do you think?

ahmadtourei commented 2 months ago

Here are my thoughts on how to move this forward. First, reflecting on what users might want and future DASCore plans:

Being able to calculate the strain between points is nice, and probably what most other DAS codes do. When step_multiple==1, this gives you the smallest possible gauge length, which could be desirable for several reasons. However, the staggered grid is probably less useful in the other cases. For example, the data wont be that different when step_multiple = 7 vs 6 or 8. It is probably better to just use a gauge length close to an even number of step multiples (in function 2's parlance) so the coords don't shift and the array shape doesn't change.

As you rightly pointed out, an adjustable step_multiple is an important feature and something we didn't have before this PR.

I would like to avoid changing the array shape when possible. Down the road, I hope to be able to make many patch functions jit'able with Jax for potential gains in performance. However, any function with dynamically-sized arrays can't be compiled.

So here is what I propose:

Clarify in function 1 that the new gauge_length is step_multiple * 2

Clarify that order must be a multiple of 2

Let's use function 2, currently named staggered_velocity_to_strain_rate only for the in-between case of step_multiple = 1 by removing the step_multiple parameter. For larger step multiples and orders we will point users to function_1.

Let's add a recipe or note that explores the effect of different step_sizes and orders a bit more. We may also need to add an event stored in velocity format to the example data repo.

What do you think?

Thanks for the discussion, Derrick! As we discussed today, let's keep step_multiple adjustable for function 2 as the functions have different behaviors on the edges. I also agree that it'd be beneficial to add notes/recipes.

d-chambers commented 2 months ago

Thanks @ahmadtourei.

This findiff documentation page might be helpful for better understanding the edge stencil implementations.

d-chambers commented 2 months ago

Thanks @ahmadtourei for all your work on this.

DASDAE / dascore

velocity to strain rate with gauge length support #399

Description

Checklist

Codecov Report

Summary