Incorrect values are computed for `time_m` and `time_M` arguments

speglich commented 3 years ago

Hello folks,

As noticed by @nogueirapeterson, we are experiencing numeric precision errors with np.float64 in adjoint tests, he introduced this errors on Slack (here) and now we figured out what’s going on.

The error isn’t connected with the precision used or with the equations implemented (e.g. acoustic self-adjoint), but with the time arguments (time_m and time_M) that is computed for forward and backward operators. The difference between time_m and time_M are the cause of this precision error, that only could be noticed when we used np.float64.

The adjoint test is attached on this issue, however we build a MFE to explain what we found.

from devito import Grid, TimeFunction, Eq, Operator

grid = Grid(shape=(10,10))
p = TimeFunction(name="p", grid=grid,time_order=1, space_order=4)

stencil_forward = Eq(p.forward, p + 1.)
op1 = Operator(stencil_forward)
op1.apply(time=5)

print(op1.arguments(time=5))

stencil_backward = Eq(p.backward, p + 1.)
op2 = Operator(stencil_backward)
op2.apply(time=5)

print(op2.arguments(time=5))

In this MFE we could notice that time_m parameter is different for both cases, causing different time steps for each operator.

The time_m and time_M values are computed by _prepare_arguments (if they are not passed by user) and uses dspace to avoid out of boundary acesses. As we know and expect the dspace computed in each operator are different, and are perfectly correct, but maybe the way that we construct the time range interval could be improved, once the out of boundary access don’t happen on this time dimensions.

adjoint_test.txt

mloubout commented 3 years ago

So thinking about it, these arguments make sense for that MFE, in the reverse mode, the last computed element is p[time_m -1] so it has to be time_m=1 to be valid.

In the forward mode, the first computed is p[1] and last is p[6] In the reverse mode, the first computed is p[4] and last is p[0]

So I think with time=6 in reverse mode should be adjoints, maybe would be good to add a source to make sure the automatic bounds are correct.

nogueirapeterson commented 3 years ago

Fixed... We think that is not only a question of increase the range on reverse mode. Increasing the range doesn't allow us to guaranty that the memory accessed will be the same.
This example show us what we trying to explain when we are using the reverse mode:

for (int time = time_M, t2 = (time + 1)%(2), t0 = (time)%(2); time >= time_m; time -= 1, t2 = (time + 1)%(2), t0 = (time)%(2))

Considering that the time range initially is from 1 to 6 and I increase the range to 7, instead to change the range from 0 to 6, t0 will be 1 when time=7 instead of 0 when time=0, resulting in differents computed values.

mloubout commented 3 years ago

for (int time = time_M, t2 = (time + 1)%(2), t0 = (time)%(2); time >= time_m; time -= 1, t2 = (time + 1)%(2), t0 = (time)%(2))

this is the loop for the reverse mode the one you are considering is for the forward.

nogueirapeterson commented 3 years ago

But, anyway, is this analysis valid? Make sense?

Leitevmd commented 3 years ago

This is what happens on forward iterations. Last p calculated is p[0].

Forward	m					M
time	0	1	2	3	4	5
t0	0	1	0	1	0	1
t1	1	0	1	0	1	0

Now moving to backward marching. If time range is 5 to 0, the first p read is p[1] which is wrong. Anyway, I don't think the last write would be invalid, since we actually write in p[(time+1)%2], which is 0 in case of time_m=1 and 1 in case of time_m=0.

Backward	M					m
time	5	4	3	2	1	0
t0	1	0	1	0	1	0
t2	0	1	0	1	0	1

But if we march from 6 to 1, as @mloubout says, then we get p[0] as our first read, which is the last p we write on forward marching, which I think is the right thing to do.

Backward	M					m
time	6	5	4	3	2	1
t0	0	1	0	1	0	1
t2	1	0	1	0	1	0

The problem with this approach is that it's working with this MFE, but another examples are raising an exception:

  File "/home/victor.leite/lde/devito-issues/devito/types/dimension.py", line 290, in _arg_check
    raise InvalidArgument("OOB detected due to %s=%d" % (self.max_name,
devito.exceptions.InvalidArgument: OOB detected due to time_M=246

So I'm checking that also..

Leitevmd commented 3 years ago

Ok, so the problem is when we use other functions that are accessed with the original time index (e.g. source or receiver functions). On these arrays, if we use time_M >= time_size we try to access out of bound indices.

Leitevmd commented 3 years ago

So, if we need the last time step from a function previously calculated with .forward, and also we use a fixed time size function at the second operator, we need one more adjustment:

import numpy, random
from devito import Grid, TimeFunction, Eq, Operator, Buffer

grid = Grid(shape=(2,))
p = TimeFunction(name='p', grid=grid, space_order=0, dtype=numpy.float64, save=Buffer(2))
s = TimeFunction(name='s', grid=grid, space_order=0, dtype=numpy.float64, save=11)

s.data[:] = [ [random.random()] for n in range(11) ]

# FORWARD

op1 = Operator(Eq(p.forward, p + s))
op1.apply(time=10)

# BACKWARD

# 1) Numerical error
# - Don't get the last p and s
# - Execute less time steps (time_m=1)
# op2 = Operator(Eq(p.backward, p - s))
# op2.apply(time=10)

# 2) Exception
# - s does not have s[11] position
# op2 = Operator(Eq(p.backward, p - s))
# op2.apply(time=11)

# 3) OK
# Read s with right index
op2 = Operator(Eq(p.backward, p - s.backward))
op2.apply(time=11)

# 4) OK
# Avoid time_m=1 and read p with right index
# op2 = Operator(Eq(p, p.forward - s))
# op2.apply(time=10)

assert (numpy.all( abs(p.data[0]) < 1e-15))

Leitevmd commented 3 years ago

Assuming the user needs to be aware of time buffer behavior (especially about that implicit time_m shift) I think we could close this issue, since it isn't a bug but an expected behavior, right @FabioLuporini @mloubout ?

Anyway I guess that behavior could be turned into something more user friendly.

devitocodes / devito

Incorrect values are computed for `time_m` and `time_M` arguments #1513