devitocodes / devito

DSL and compiler framework for automated finite-differences and stencil computation
http://www.devitoproject.org
MIT License

devito Data object issues #1184

Open tjb900 opened 4 years ago

tjb900 commented 4 years ago

Some of the methods of devito.Data rely on attributes set in __new__ and __array_finalize__, but these methods are not always called.

For example, this code:

from devito import Grid, Function
import pickle
import traceback

g = Grid((20, 20))
u = Function(name='u', grid=g)

d1 = u.data
d2 = pickle.loads(pickle.dumps(d1))

try:
    # Will fail in __array_finalize__ due to missing _index_stash
    print(d2)
except Exception:
    traceback.print_exc()

try:
    # Fails in __del__ due to missing _memfree_args
    del d2
except Exception:
    traceback.print_exc()

demonstrates the issue.

<snip>
AttributeError: 'Data' object has no attribute '_index_stash'
<snip>
AttributeError: 'Data' object has no attribute '_memfree_args'
<snip>
tjb900 commented 4 years ago

I think there's a case to be made that not calling __array_finalize__ at unpickling time is a numpy bug, since this document https://docs.scipy.org/doc/numpy/user/basics.subclassing.html seems to claim that __array_finalize__ covers all methods of instance creation:

Because __array_finalize__ is the only method that always sees new instances being created, it is the sensible place to fill in instance defaults for new object attributes, among other tasks.
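
For context, the pattern that document recommends looks roughly like this (a minimal sketch of a generic numpy subclass, not devito code; the names are made up):

import numpy as np

class InfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # Runs on explicit construction (obj is None), view casting and
        # new-from-template, so a default set here covers those paths
        self.info = getattr(obj, 'info', None)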

rhodrin commented 4 years ago

Data objects are not designed to be 'stripped' away from their Function in this manner. Is there any reason why:

from devito import Grid, Function
import pickle
import traceback

g = Grid((20, 20))
u = Function(name='u', grid=g)
u2 = pickle.loads(pickle.dumps(u))
try:
    # Will fail in __array_finalize__ due to missing _index_stash
    print(u2.data)
except Exception:
    traceback.print_exc()
try:
    # Fails in __del__ due to missing _memfree_args
    del u2
except Exception:
    traceback.print_exc()

would not achieve what you want?

tjb900 commented 4 years ago

Well, that might be true, but it's relatively easy to end up with a standalone Data object. For example, this function returns a Data:

def f():
    # create devito stuff
    # run operator
    output_i_want = u.data[:] * 2
    return output_i_want

And when the resulting objects are used in dask/distributed, they get pickled, and then it makes for sad times.

Now, it would work if everyone knew to cast/copy everything to ndarray. But I think moving a couple of things around in data.py should make it a non-issue anyway?
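
For reference, the cast/copy workaround would look something like this (a sketch built from the snippets above; np.array makes a base-class copy, so no Data object escapes the function):

import numpy as np
from devito import Grid, Function

def f():
    g = Grid((20, 20))
    u = Function(name='u', grid=g)
    # ... run an Operator here ...
    # np.array copies into a plain ndarray, which pickles cleanly
    return np.array(u.data[:] * 2)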

rhodrin commented 4 years ago

Yeah, I see what you mean. So creating a 'new' Data object clearly works, but I'll have a think about the best way to get that attribute behaviour into data.py (probably by implementing our own __reduce__ and __setstate__).

from devito import Grid, Function
from devito.data import Data
import pickle

g = Grid((20, 20))
u = Function(name='u', grid=g)

u.data[:] = 1.0

d1 = u.data

d2 = pickle.loads(pickle.dumps(d1))

# Rebuild a fresh Data whose attributes are properly initialised
d3 = Data(d2.shape, d2.dtype.type,
          decomposition=None,
          modulo=(False,)*len(d2.shape))

# Hand the unpickled payload over to the new object
d3.data = d2.data

# Set the attribute __del__ expects so the broken copy can be deleted
d2._memfree_args = None
del d2

print(d3)
del d3

rhodrin commented 4 years ago

For my reference: seems like a solution could be in here https://stackoverflow.com/questions/26598109/preserve-custom-attributes-when-pickling-subclass-of-numpy-array
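
The recipe in that answer, transplanted onto a generic numpy subclass, is roughly the following (a sketch with made-up names, not the actual devito implementation):

import numpy as np

class PickleableArray(np.ndarray):
    def __new__(cls, input_array, extra=None):
        obj = np.asarray(input_array).view(cls)
        obj.extra = extra
        return obj

    def __array_finalize__(self, obj):
        # Covers explicit construction, view casting and new-from-template
        self.extra = getattr(obj, 'extra', None)

    def __reduce__(self):
        # Append the custom attribute to ndarray's pickle state
        reconstruct, args, state = super().__reduce__()
        return reconstruct, args, state + (self.extra,)

    def __setstate__(self, state):
        # Peel our attribute off and hand the rest back to ndarray
        self.extra = state[-1]
        super().__setstate__(state[:-1])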

rhodrin commented 4 years ago

So the changes required to properly pickle our Data type seem non-trivial: for a start, mpi4py.MPI.Comm objects can't be pickled.

Therefore I'd suggest casting to numpy arrays prior to pickling as you mention above.

~Will leave this open for a day or two to allow anyone to post comments/suggestions before closing.~

rhodrin commented 4 years ago

After a quick discussion, we'll keep this open, but it's not clear to us in which situations (aside from user convenience) this functionality is particularly important. (Pickling Data may even have some unintended negative side-effects.)

We'll revisit this if it's shown to be particularly needed.

tjb900 commented 4 years ago

Sorry, just getting back to this.

It was the simplest way to make a reproducer, but I'm not 100% sure that pickling is the only source of the issue. At least according to that numpy documentation quoted above (repeated here):

Because __array_finalize__ is the only method that always sees new instances being created, it is the sensible place to fill in instance defaults for new object attributes, among other tasks.

I read this to mean that you cannot rely even on __new__ being called, which means the self._memfree_args reference in __del__ raises an AttributeError. That part at least is a simple fix: set _memfree_args to None in __array_finalize__.
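
That part of the fix might look like this (a sketch; the class is a stand-in for devito's Data and the actual freeing logic is elided):

import numpy as np

class Data(np.ndarray):  # stand-in for devito.data.Data
    def __array_finalize__(self, obj):
        # Ensure the attribute always exists, whichever creation path
        # produced this instance
        self._memfree_args = getattr(obj, '_memfree_args', None)

    def __del__(self):
        # getattr guards the paths where even __array_finalize__ was
        # skipped (e.g. unpickling)
        if getattr(self, '_memfree_args', None) is not None:
            pass  # free the external allocation here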

On the pickling front - I see your argument. However, in that case I think we should go further and give a useful error message to the end-user, i.e. override __reduce__ or one of its friends to raise an Exception, rather than leaving a path that creates an object in a half-initialised state?
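
Concretely, something along these lines (a sketch; the error message is made up):

import numpy as np

class Data(np.ndarray):  # stand-in for devito.data.Data
    def __reduce__(self):
        # Fail loudly at pickle time with an actionable message, instead
        # of producing a half-initialised object at unpickle time
        raise NotImplementedError(
            "Data objects cannot be pickled; copy to a plain ndarray "
            "first, e.g. np.array(d)")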

navjotk commented 4 years ago

I'm hitting the same issue. It's really easy to trip on this when working with Dask, so I would push for increasing this issue's priority.

rhodrin commented 4 years ago

@tjb900

but I'm not 100% sure that pickling is the only source of the issue.

Indeed, that's a good point - I'll take a careful look.

On the pickling front - I see your argument. However, in that case I think we should go further and give a useful error message to the end-user, i.e. override __reduce__ or one of its friends to raise an Exception, rather than leaving a path that creates an object in a half-initialised state?

Yes, if it's not currently supported we should be throwing an error via, e.g., overriding __reduce__.

@navjotk Do you have a reproducer that doesn't involve pickling?

navjotk commented 4 years ago

My scenario involves Dask, which implicitly pickles all function arguments and return values, so no. Maybe we should raise an exception as a reminder to pick up on this "mistake"?

I knew Data objects wouldn't behave well with Dask but still ended up with a function like this, basically because I didn't realise my result was a Data object.
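
For anyone else who lands here, the trap arises as innocently as this (a sketch; the task body is made up, and the pickling happens when the task runs on a dask.distributed cluster):

import numpy as np
from dask import delayed
from devito import Grid, Function

@delayed
def model():
    g = Grid((20, 20))
    u = Function(name='u', grid=g)
    # ... run an Operator here ...
    return u.data[:] * 2  # still a devito Data: distributed will pickle it

Returning np.array(u.data[:] * 2) instead sidesteps the problem.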

The only thing that made me realise why this happened was posting the error message on Slack and @FabioLuporini pointing me here - and then I was facepalming.