devitocodes / devito

DSL and compiler framework for automated finite-differences and stencil computation
http://www.devitoproject.org
MIT License
555 stars 225 forks source link

New `SymbolicData` instances with the same name don't cache #225

Closed mlange05 closed 7 years ago

mlange05 commented 7 years ago

Assuming a symbolic data object object u = DenseData(name='u', shape=...), auto/sympy-generated new instances of u(x, y), such as u(x, y) and u(x+h, y), all cache on the original u object. However, an explicitly created new instance u2 = DenseData(name='u', shape=...) does not, as exemplified by the test test_cache_function_new in test_symbol_caching.py. If this causes memory leaks during clear_cache() is currently unclear.

The larger question is, do we want explicit new instances with the same name attribute to cache "magically", or should the user always be in explicit control over data objects?

navjotk commented 7 years ago

Sympy treats the cache as an optional enhancement for speed. Its behaviour is completely consistent whether the cache is used or not. We should be doing the same. Also, the most common case of using an operator multiple times is:

m=DenseData()
eq=[eqn_involving_m]
op=Operator(eq)
for model in batch: #batch either comes from MPI or is being read from a file
    m1=DenseData()
    m1.data=model
    op.apply(m=m1)

There is a valid reason to use multiple data objects with the same symbol name and any name-based caching makes that impossible while providing no advantage. Sympy doesn't restrict the use of multiple symbols with the same name. Why do we think it is a good idea?

mlange05 commented 7 years ago

Sympy treats the cache as an optional enhancement for speed. Its behaviour is completely consistent whether the cache is used or not. We should be doing the same.

I'm not entirely sure I understand this sentiment, but consistency with SymPy is exactly the problem that brings this about. Consider the following:

In [1]: from sympy import Function

In [2]: from devito import x, y

In [3]: f0 = Function('f')(x, y)

In [4]: f1 = Function('f')(x, y)

In [5]: f0.foo
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-c896f72c0131> in <module>()
----> 1 f0.foo

AttributeError: 'f' object has no attribute 'foo'

In [6]: f1.foo
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-69dad83151cf> in <module>()
----> 1 f1.foo

AttributeError: 'f' object has no attribute 'foo'

In [7]: f0.foo = 'bar'

In [8]: f1.foo
Out[8]: 'bar'

Now, if you replace the definitions of f0 and f1 with f0 = DenseData(name='f', shape=...) the final line reads:

In [8]: f1.foo
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-69dad83151cf> in <module>()
----> 1 f1.foo

AttributeError: 'f' object has no attribute 'foo'

The core point of this issue is that SymPy actually does cache all symbols by name, so repeated instantiation of symbols with the same name creates two variables pointing to the same object. This is also true for all of our "function" symbols (eg. creating multiple points of a stencil), except for explicit repeated instantiation through DenseData and TimeData.

navjotk commented 7 years ago

Sympy has a class called Dummy to explicitly prevent such caching issues. Can we consider some of our classes inheriting from this rather than Symbol?

FabioLuporini commented 7 years ago

We're currently inheriting from Function, rather than Symbol. But this looks interesting: what's so special about this Dummy that we could reuse in our own classes to achieve the same behaviour?

FabioLuporini commented 7 years ago

I think we now have an answer to this "issue". In fact, this is a non-issue, and could probably be closed

navjotk commented 7 years ago

Could you please elaborate?

FabioLuporini commented 7 years ago

I thought we agreed on the fact that we simply accept that two instances such as u1 = DenseData(name='u', shape=...) u2 = DenseData(name='u', shape=...) represent different symbolic objects

mloubout commented 7 years ago

Is this fixed?

mlange05 commented 7 years ago

Yes, I think we have a decision and can close this now.