Closed mlange05 closed 7 years ago
Sympy treats the cache as an optional enhancement for speed. Its behaviour is completely consistent whether the cache is used or not. We should be doing the same. Also, the most common case of using an operator multiple times is:
m=DenseData()
eq=[eqn_involving_m]
op=Operator(eq)
for model in batch: #batch either comes from MPI or is being read from a file
m1=DenseData()
m1.data=model
op.apply(m=m1)
There is a valid reason to use multiple data objects with the same symbol name and any name-based caching makes that impossible while providing no advantage. Sympy doesn't restrict the use of multiple symbols with the same name. Why do we think it is a good idea?
Sympy treats the cache as an optional enhancement for speed. Its behaviour is completely consistent whether the cache is used or not. We should be doing the same.
I'm not entirely sure I understand this sentiment, but consistency with SymPy is exactly the problem that brings this about. Consider the following:
In [1]: from sympy import Function
In [2]: from devito import x, y
In [3]: f0 = Function('f')(x, y)
In [4]: f1 = Function('f')(x, y)
In [5]: f0.foo
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-5-c896f72c0131> in <module>()
----> 1 f0.foo
AttributeError: 'f' object has no attribute 'foo'
In [6]: f1.foo
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-6-69dad83151cf> in <module>()
----> 1 f1.foo
AttributeError: 'f' object has no attribute 'foo'
In [7]: f0.foo = 'bar'
In [8]: f1.foo
Out[8]: 'bar'
Now, if you replace the definitions of f0
and f1
with f0 = DenseData(name='f', shape=...)
the final line reads:
In [8]: f1.foo
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-9-69dad83151cf> in <module>()
----> 1 f1.foo
AttributeError: 'f' object has no attribute 'foo'
The core point of this issue is that SymPy actually does cache all symbols by name, so repeated instantiation of symbols with the same name creates two variables pointing to the same object. This is also true for all of our "function" symbols (eg. creating multiple points of a stencil), except for explicit repeated instantiation through DenseData
and TimeData
.
Sympy has a class called Dummy
to explicitly prevent such caching issues. Can we consider some of our classes inheriting from this rather than Symbol
?
We're currently inheriting from Function
, rather than Symbol
. But this looks interesting: what's so special about this Dummy
that we could reuse in our own classes to achieve the same behaviour?
I think we now have an answer to this "issue". In fact, this is a non-issue, and could probably be closed
Could you please elaborate?
I thought we agreed on the fact that we simply accept that two instances such as
u1 = DenseData(name='u', shape=...)
u2 = DenseData(name='u', shape=...)
represent different symbolic objects
Is this fixed?
Yes, I think we have a decision and can close this now.
Assuming a symbolic data object object
u = DenseData(name='u', shape=...)
, auto/sympy-generated new instances ofu(x, y)
, such asu(x, y)
andu(x+h, y)
, all cache on the originalu
object. However, an explicitly created new instanceu2 = DenseData(name='u', shape=...)
does not, as exemplified by the testtest_cache_function_new
intest_symbol_caching.py
. If this causes memory leaks duringclear_cache()
is currently unclear.The larger question is, do we want explicit new instances with the same
name
attribute to cache "magically", or should the user always be in explicit control over data objects?