eea / odfpy

API for OpenDocument in Python
GNU General Public License v2.0

Update caches instead of clearing them when calling removeChild() #86

Closed: flh closed this 5 years ago.

flh commented 5 years ago

This prevents cache corruption when an element is removed and another one is immediately added to a document. In that situation, the document's element cache contains only the newly added element. Since the cache is no longer empty, later calls to other methods (such as getElementsByType) do not trigger a cache rebuild, which leads to incorrect results.
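A minimal sketch of the idea in the PR title, keeping the per-type cache in sync on removal instead of clearing it, follows. It is not odfpy's implementation: `ToyElement`, `ToyDocument`, `add`, `remove`, and `get_by_type` are hypothetical names, and only `element_dict` echoes the attribute discussed in this thread. Replaying the sequence from the report (add p1 and p2, remove p1, add p3) leaves the cache holding p2 and p3.

```python
from collections import defaultdict


class ToyElement:
    """Hypothetical stand-in for a document element: a type name plus text."""

    def __init__(self, kind, text):
        self.kind = kind
        self.text = text

    def __repr__(self):
        return f"ToyElement({self.kind!r}, {self.text!r})"


class ToyDocument:
    """Hypothetical flat document with a per-type cache that is updated,
    never cleared, when an element is removed (not odfpy's code)."""

    def __init__(self):
        self.children = []                     # source of truth
        self.element_dict = defaultdict(list)  # cache: kind -> elements

    def add(self, elem):
        self.children.append(elem)
        self.element_dict[elem.kind].append(elem)  # keep the cache in sync

    def remove(self, elem):
        self.children.remove(elem)
        # Drop only this element from the cache instead of clearing the
        # whole cache, so a later addition cannot leave it partially filled.
        self.element_dict[elem.kind].remove(elem)

    def get_by_type(self, kind):
        # Return a copy so callers cannot mutate the cache.
        return list(self.element_dict.get(kind, []))


# Replay the sequence from the bug report.
doc = ToyDocument()
p1 = ToyElement("p", "foo")
p2 = ToyElement("p", "bar")
doc.add(p1)
doc.add(p2)
doc.remove(p1)
p3 = ToyElement("p", "baz")
doc.add(p3)
assert doc.get_by_type("p") == [p2, p3]
```

The toy document is flat. In a real ODF tree, removing an element also removes its descendants, so an update-based cache would presumably need to drop every cached node in the removed subtree, not just the node itself.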

Currently, the following code fails:

```python
import odf.opendocument
import odf.text

doc = odf.opendocument.OpenDocumentText()
p1 = odf.text.P(parent=doc.text, text="foo")
p2 = odf.text.P(parent=doc.text, text="bar")
p1.parentNode.removeChild(p1)  # clears doc.element_dict
p3 = odf.text.P(parent=doc.text, text="baz")  # cache now holds only p3
assert list(doc.getElementsByType(odf.text.P)) == [p2, p3]  # fails
```

This fails because the call to removeChild(p1) clears doc.element_dict, and creating p3 then adds p3 to doc.element_dict. getElementsByType only rebuilds the cache when doc.element_dict is empty. Because p3 was just added, the dictionary is not empty, so no rebuild happens and the cached result for P contains only p3, without p2.
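The failure mode can be reproduced with a small, hypothetical model that is not odfpy's code. `BuggyToyDocument` clears its whole cache on removal, indexes newly added elements, and rebuilds lazily only when the cache is empty, mirroring the behaviour described for doc.element_dict and getElementsByType. The classes `Elem` and `BuggyToyDocument` and their methods are illustrative assumptions; `clear_caches` borrows the name of the workaround method mentioned in this thread, and calling it before querying forces the rebuild.

```python
class Elem:
    """Hypothetical element: a type name plus text (identity equality)."""

    def __init__(self, kind, text):
        self.kind = kind
        self.text = text

    def __repr__(self):
        return f"Elem({self.kind!r}, {self.text!r})"


class BuggyToyDocument:
    """Hypothetical model of the reported behaviour, not odfpy's code."""

    def __init__(self):
        self.children = []
        self.element_dict = {}  # cache: kind -> list of elements

    def _index(self, elem):
        self.element_dict.setdefault(elem.kind, []).append(elem)

    def add(self, elem):
        self.children.append(elem)
        self._index(elem)  # a new element goes into the (possibly empty) cache

    def remove(self, elem):
        self.children.remove(elem)
        self.clear_caches()  # the problematic step: forget everything

    def clear_caches(self):
        self.element_dict = {}

    def rebuild_caches(self):
        self.element_dict = {}
        for child in self.children:
            self._index(child)

    def get_by_type(self, kind):
        if not self.element_dict:  # rebuild only when the cache is empty
            self.rebuild_caches()
        return list(self.element_dict.get(kind, []))


doc = BuggyToyDocument()
p1, p2, p3 = Elem("p", "foo"), Elem("p", "bar"), Elem("p", "baz")
doc.add(p1)
doc.add(p2)
doc.remove(p1)  # cache cleared
doc.add(p3)     # cache now holds only p3, so it is no longer empty
stale = doc.get_by_type("p")  # no rebuild happens
doc.clear_caches()            # workaround: empty the cache before querying
fixed = doc.get_by_type("p")  # empty cache triggers a full rebuild
print(stale, fixed)  # [Elem('p', 'baz')] [Elem('p', 'bar'), Elem('p', 'baz')]
```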

flh commented 5 years ago

Forgot to mention that, as a workaround with the current release, one can add a call to doc.clear_caches() just before calling doc.getElementsByType.

AakashKhatu commented 4 years ago


Thanks for this; the workaround saved me after an hour of trying to figure out what was wrong.