devitocodes / devito

DSL and compiler framework for automated finite-differences and stencil computation
http://www.devitoproject.org
MIT License
538 stars 221 forks source link

Dot product tests failing after recent Devito install #1394

Closed rsarkar-github closed 3 years ago

rsarkar-github commented 3 years ago

Hi, I reinstalled devito yesterday, and noticed an alarming issue. One of the operators that I had coded up is now failing the dot product test. Can someone have a look and see what is going on? Also what is the best way to get help?

I'm a Ph.D. student who has recently started (about 6 months) using Devito, and it is really annoying when what you spend months coding up gets broken in a future Devito install. The attached zipped file (its an IPython notebook) captures the issue. Dot Product Test Failing.zip

A bit about the operators. The forward operator implements time dependent Born scattering (so the perturbation can change with time). The adjoint is the adjoint of this operation taking scattered data, and giving back time dependent perturbation.

I'd be grateful is someone can tell me what I'm doing wrong, or what has been changed in the new Devito version that may be causing this. The lack of failing of the dot product test is serious...it causes linear inversion algorithms to not converge. I originally detected this problem while investigating why the linear solvers were not converging.

navjotk commented 3 years ago

The handling of absorbing boundaries has changed significantly in the last few months - I've tripped over it myself trying to make results match with those from a few months ago. That's my guess. Can't spot something obvious in the first read of your code but I'll keep looking. As a quick sanity check, I would try bcs='damp' in the model creation line and see if that makes the test pass.

mloubout commented 3 years ago

rec_term = born_data_rec.interpolate(expr=u.forward) is interpolatng the wrong time step should be u

jkwashbourne commented 3 years ago

Hi Rahul,

John Washbourne here, I wanted to comment that with Mathias' fix to the interpolation I only get 4 digits accuracy when I tested with your notebook. Do you recall to what accuracy you passed dot product tests previously?

My experience (mostly learned watching Claerbout!) is that if you don't have 6 digits in float and 12 in double there is something wrong with the algo. Cheers.

jkwashbourne commented 3 years ago

FYI I ran with the same random seed np.random.seed(1) for the 3 cases:

u:

sum1;    1143.04418945
sum2;    1141.54553223
diff;       0.00065599

u.forward:

sum1;     990.65844727
sum2;    1141.54553223
diff;      -0.07076578

u.backward:

sum1;    1146.10925293
sum2;    1141.54553223
diff;       0.00199493
jkwashbourne commented 3 years ago

One more comment: I was playing around and replaced the scipy convolve with [1,-2,1] with using Devito to take the second time deriv, and the test looks a little better for same random seed.

sum1;    1141.75976562
sum2;    1141.53515625
diff;       0.00009837

for the forward op this just means:

    # Define the wave equation
    pde = model.m * u.dt2 - u.laplace + model.damp * u.dt - dm * u0.dt2

For the adjoint you need to take the deriv with another op.

rsarkar-github commented 3 years ago

@mloubout @jkwashbourne Thank you for your suggestions. As to John's question, I do not remember now how accurate the dot product test was exactly, but I'm pretty sure I got 1e-4 relative error before. Also it should not matter in principle if I'm calculating the second derivative of u0 using devito or scipy (it is just some field) - so I think the dot product needs to pass even if you generated u0 randomly. If it does not there is some problem in the underlying code for the operators that is getting generated.

Also can you tell me something about my adjoint op implementation: I have the line rec_term = born_data_rec.inject(field=u.backward, expr=born_data_rec * (dt ** 2) / model.m)

Should this be u.backward or u? I have checked that u.backward gives more accurate dot product test when I use it in conjunction with u in the forward op. Can you tell me why u.backward is correct in the adjoint, while u is correct in the forward?

Btw, in the current notebook 01_modelling,ipynb, the example contains the line rec_term = rec.interpolate(expr=u.forward) So you may want to fix that, if indeed what you are telling me is true here about my forward op here.

EDIT: Also I now see that in operators.py, you indeed use u in the forward operator!

jkwashbourne commented 3 years ago

forward is marching time forward, inject in u.forward. adjoint is marching time reversed, inject in u.backward.

If you are interested in the devito implementation of Chevron's nonlinear and linearized jacobian ops I recently completed some notebooks in devito/examples/seismic/self_adjoint that show these gory details with explanation.

Cheers

rsarkar-github commented 3 years ago

OK. So with the fix in my forward operator : `rec_term = born_data_rec.interpolate(expr=u)', the linear solvers are working again. However since we still don't have 1e-6 accuracy in the dot product tests (I and John are both getting 1e-3, 1e-4 using float32), I'd still like to know from your point of view what you think about it? I fail to see where else there might be a potential bug in my code, but I suspect that this is not something I can solve from my side (probably something in the devito backend?)

jkwashbourne commented 3 years ago

The third notebook (link below) tests correctness, and I have tests for linearity of the nonlinear operator wrt source, the adjoint of the nonlinear operator (time reversal), linearity of the linearization, adjointness of the linearization, agreement with analytic response in wholespace, etc.

Some of these tests printed out below for dtype=numpy.float32 for illustration although I usually run in numpy.float64, the point being that I don't think there is a "devito backend" issue as these are accurate to float epsilon. Mathias also has a variety of operators passing similar tests, but these are all collected in one place with maybe a tad more exposition.

linearity forward F (101, 81) (so=8) rms 1,2,diff; +1.5023981333e+00 +1.5023984909e+00 +8.4184154048e-07

linearity adjoint F (101, 81) (so=8) rms 1,2,diff; +1.3962288818e+03 +1.3962286377e+03 +3.1976398418e-07

adjoint F (101, 81) (so=8) sum_s, sum_r, diff; +4.5670062500e+04 +4.5670054688e+04 +8.5531965510e-08

linearity forward J (101, 81) (so=8) rms 1,2,diff; +1.4912480140e-07 +1.4912475876e-07 +6.3851547338e-07

linearity adjoint J (101, 81) (so=8) rms 1,2,diff; +9.2501220703e+01 +9.2501235962e+01 +3.1916107446e-07

adjoint J (101, 81) (so=8) sum_m, sum_d, diff; 7.2084956628e-05 +7.2084993008e-05 -2.5233958922e-07

https://github.com/devitocodes/devito/blob/master/examples/seismic/self_adjoint/sa_03_iso_correctness.ipynb

jkwashbourne commented 3 years ago

You might try reducing the size of your model perturbation to be inside the ABC, if it is not already. My self-adjoint implementation uses the same physics everywhere so I dont find that necessary -- but I remember when I was using a sponge ABC accuracy improved when the perturbation was smaller. I havent really gone over your code too carefully so I cant comment where to look yet.

rsarkar-github commented 3 years ago

@jkwashbourne I could not help but notice that in sa_01_iso_implementation1.ipynb I see the line:

rec_term = rec.interpolate(expr=u.forward)

These notebooks are really nice & I hope to find some time to go over them in detail. But do you know if these 3 notebooks have been checked for correctness with the latest Devito version? I just want to avoid looking at something that may potentially change in the near future.

Also another question about the self-adjoint implementation: I need to read the paper, but what is the basic philosophy here to avoid reflections from the boundary in the acoustic setting? How do you do that without absorbing boundaries?

jkwashbourne commented 3 years ago

Rahul, the output from the correctness tests above was run on the master branch last night. I am not sure you are aware, but the DeVito project incorporates CI/CD: continuous integration and continuous deployment.

This means that these correctness tests, as well as a very extensive suite of many more tests and all of the other jupyter notebooks are run every time a PR is pushed to the repository. It is super unlikely that a change that breaks any of the CI pipelines would be accepted. And if a change did happen it would be very high priority to fix. I feel pretty confident you can rely on the "devito backend" being righteous. I actually think it is much more likely you are getting lucky to get 4 digits.

Re The time step at which you inject and interpolate: once you start doing things that are time variant it can be pretty tricky to select the right index, which is why I tested for the three cases, Sometimes I might test like that to figure out which is the correct index to use. I expect your time step ought to be similar to what you see in the self-adjoint notebooks, but I am not familiar with your system. You will need to figure that out yourself.

Re the absorbing boundary system used in the self-adjoint notebooks: we implement a dissipation only attenuation model by a simple approximation of a Maxwell body. This allows us to use q to implement the absorbing boundary. In the interior of the model Q can be some physically meaningful value, and as you approach the edges of the model it becomes non-physical (small Q) purely to attenuate outgoing waves. One main benefit of this is the physics is unified everywhere in the model.

jkwashbourne commented 3 years ago

Im kinda procrastinating ... so ... if you shut off the damping things get a little better.

sum1; -8.53575781e+03
sum2; -8.53607715e+03
diff; -1.87054247e-05
jkwashbourne commented 3 years ago

Hmmm, I doubled the time sampling via vel.dt_scale = 0.5, and the dot product is much worse. This indicates there is something cumulatively worse with increasing time. Not sure how much more time I have to play with this, but kind of stuck looking for the issue now ...

rsarkar-github commented 3 years ago

Thanks for looking at it. I think doubling the time sampling may lead to numerical dispersion / CFL condition issues. I have seen dot product tests fail before elsewhere, because of such issues. Not sure though, I'll need to check in detail. For now I'm just relying on what I have, as all previous linear inversions (of varying complexity, where the answer is known) ran through fine.

jkwashbourne commented 3 years ago

oops, meant halved ... twice the # samples.

jkwashbourne commented 3 years ago

I still conclude that "something aint right" in your code, but dont have time to track it down rn

jkwashbourne commented 3 years ago

It just occurred to me that a good approach might be to start with just time invariant born and make sure you get float eps, then add the time varying stuff back.

rsarkar-github commented 3 years ago

i really don't know what could be the culprit here. thanks for these suggestions...i'll have a look. although i see no obvious place in my code where there could be a bug.

jkwashbourne commented 3 years ago

I think reverting to time invariant as a first step should definitely get you back on track. Once you make that float epsilon, adding back the time variance incrementally should be straightforward. Good luck.