GrahamDumpleton / wrapt

A Python module for decorators, wrappers and monkey patching.

this __dict__ descriptor does not support '_DictWrapper' objects since version 1.15.0 #231

Open julianstrietzel opened 1 year ago

julianstrietzel commented 1 year ago

Using tf2onnx with the following requirements, I ran into an error I could not explain. The problem came out of nowhere, as the same code had worked only days before.

requirements:

- opencv-python==4.7.0.68
- segmentation-models==1.0.1
- tensorflow==2.7.4
- numpy==1.24.2
- tensorflow-addons==0.15.0
- pandas==1.3.5
- Pillow==9.4.0
- scikit-image==0.18.3
- tqdm==4.62.3
- fire==0.5.0
- tf2onnx==1.12.1

Error Traceback

 File "steps/tf_onnx_conversion.py", line 39, in convert_tf_to_onnx
    convert_tf2onnx()
  File "/databricks/conda/envs/mlflow-.../lib/python3.8/site-packages/tf2onnx/convert.py", line 238, in main
    graph_def, inputs, outputs, initialized_tables, tensors_to_rename = tf_loader.from_saved_model(
  File "/databricks/conda/envs/mlflow-.../lib/python3.8/site-packages/tf2onnx/tf_loader.py", line 614, in from_saved_model
    _from_saved_model_v2(model_path, input_names, output_names,
  File "/databricks/conda/envs/mlflow-.../lib/python3.8/site-packages/tf2onnx/tf_loader.py", line 598, in _from_saved_model_v2
    frozen_graph, initialized_tables = from_trackable(imported, concrete_func, inputs, outputs, large_model)
  File "/databricks/conda/envs/mlflow-.../lib/python3.8/site-packages/tf2onnx/tf_loader.py", line 232, in from_trackable
    _get_hash_table_info_from_trackable(trackable, table_info,
  File "/databricks/conda/envs/mlflow-.../lib/python3.8/site-packages/tf2onnx/tf_loader.py", line 467, in _get_hash_table_info_from_trackable
    for t in r.__dict__.values() if hasattr(r, '__dict__') else []:
  File "/databricks/conda/envs/mlflow-.../lib/python3.8/site-packages/tensorflow/python/training/tracking/data_structures.py", line 828, in __getattribute__
    return super(_DictWrapper, self).__getattribute__(name)
TypeError: this __dict__ descriptor does not support '_DictWrapper' objects

I pinned the problem down to one unfrozen dependency in tensorflow: wrapt>=1.11.0. While the old, unproblematic runs had 1.14.1 installed, the new, failing runs installed 1.15.0. By freezing this to wrapt==1.14.1 I was able to solve the issue.

There are other users reporting similar problems on Stack Overflow and in the tf2onnx issue tracker.

@GrahamDumpleton Maybe this is an issue with the new release?

GrahamDumpleton commented 1 year ago

All I can speculate at the moment is that it relates to the change:

When the C extension for wrapt was being used, and a property was used on an object proxy wrapping another object to intercept access to an attribute of the same name on the wrapped object, if the function implementing the property raised an exception, then the exception was ignored and not propagated back to the caller. What happened instead was that the original value of the attribute from the wrapped object was returned, thus silently suppressing that an exception had occurred in the wrapper. This behaviour was not happening when the pure Python version of wrapt was being used, with it raising the exception. The pure Python and C extension implementations thus did not behave the same.

Note that in the specific case that the exception raised is AttributeError it still wouldn’t be raised. This is the case for both Python and C extension implementations. If a wrapper for an attribute internally raises an AttributeError for some reason, the wrapper should if necessary catch the exception and deal with it, or propagate it as a different exception type if it is important that an exception still be passed back.

Basically, incorrect prior behaviour in wrapt resulted in an exception raised by a property wrapper during attribute access not being propagated, in effect hiding an error in that wrapper code. This error only occurred in the C extension version of wrapt, so if something using wrapt had also been correctly tested against the pure Python version of wrapt, it would have been detected and the code using wrapt written properly to handle the exception.
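
As a rough sketch of the scenario being described (the class and attribute names here are made up purely for illustration):

import wrapt

class Wrapped:
    name = "original"

class Proxy(wrapt.ObjectProxy):
    @property
    def name(self):
        # A wrapper property intercepting access to the attribute of the
        # same name on the wrapped object, which then fails for some reason.
        raise RuntimeError("wrapper failed")

p = Proxy(Wrapped())

# With wrapt 1.15.0 both implementations propagate the RuntimeError here.
# With the C extension in earlier releases the exception was silently
# swallowed and "original" was returned instead. (If the property raised
# AttributeError rather than RuntimeError, it would still be swallowed in
# both implementations, as noted above.)
print(p.name)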

In other words, something using wrapt here should be dealing with a lower level exception from what wrapt is wrapping or from the wrapper, but it isn't; or the thing being wrapped shouldn't be raising an exception to begin with and is broken.

There is a chance, because of how Python works, that fixing this subtly changes how hasattr() behaves in the failure case. This might have consequences for the following code from tensorflow, which was possibly relying on the prior wrong behaviour in wrapt without realising it was an issue that should have been reported, or that would have been known about had it been tested with the pure Python version of wrapt.

  def __getattribute__(self, name):
    if (hasattr(type(self), name)
        and isinstance(getattr(type(self), name), property)):
      # Bypass ObjectProxy for properties. Whether this workaround is necessary
      # appears to depend on the Python version but not the wrapt version: 3.4
      # in particular seems to look up properties on the wrapped object instead
      # of the wrapper without this logic.
      return object.__getattribute__(self, name)
    else:
      return super().__getattribute__(name)

The change in what hasattr() reports could result in execution going down the wrong branch of the if statement, triggering code that looks like it will not work.

To try and understand this all better, I need to work out what was being wrapped in this specific case, what exception was being raised by the property and why, and whether that is a bug or not.

That all said, I had known about this difference between the pure Python and C extension versions for quite a while (years) and didn't fix it for specific reasons which, when I finally did make the change recently after being prompted by a related issue, I couldn't remember. Seeing this issue reminds me that the change possibly wasn't made earlier because of problems in properly dealing with the hasattr() check when using the C extension. I will have to do some testing to try and remember what those original reasons were that caused me to hold off addressing this difference for so long, and why I kept thinking it couldn't be fixed.

In the meantime, can you rerun your code with the process environment variable:

WRAPT_DISABLE_EXTENSIONS=true

and see if the behaviour changes.

Note that this environment variable should be set and exported in the parent process environment (shell) before running your program. It cannot be set from within the Python code of the program itself, unless you can guarantee it is set before the wrapt module is imported. If the wrapt module has already been imported, it is too late to set it.
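
For example, a minimal sketch of doing it from Python, assuming this runs before anything in the process imports wrapt (otherwise export WRAPT_DISABLE_EXTENSIONS=true in the shell before starting Python instead):

import os

# Must happen before the first "import wrapt" anywhere in the process.
os.environ["WRAPT_DISABLE_EXTENSIONS"] = "true"

import wrapt  # wrapt chooses the pure Python implementation at import time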

GrahamDumpleton commented 1 year ago

Related change:

GrahamDumpleton commented 1 year ago

Having thought about it a bit, I think the reason I originally didn't fix it was because CPython lacks a C API equivalent of the hasattr() function available from Python code. Emulating what hasattr() did in my own C code was basically going to be impossible to do and support across multiple versions because of changes in the C object model over time. What I only dumbly realised more recently was that hasattr() is more or less the same as invoking getattr() and treating a raised AttributeError as meaning the attribute doesn't exist. So the C code was fixed so that, instead of ignoring all exceptions like it wrongly used to, it raises them if they are not AttributeError, and if an AttributeError is raised it falls through to the next phase of trying to get the attribute. Thus I think the code change I made is valid and fixes the wrong behaviour, but it looks like stuff may be relying on the wrong behaviour, or otherwise not dealing with the exceptions which are now being raised like they should have been. I still can't grok what the layers of tf2onnx and tensorflow are doing in wrapping things, to understand what may need to change in them to handle the corrected behaviour.
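
Purely as an illustrative sketch of the equivalence described above (the real fix is in the C extension, not Python code):

def hasattr_like(obj, name):
    # hasattr() is more or less getattr() with AttributeError meaning
    # "the attribute does not exist"; any other exception propagates to
    # the caller, which is what the corrected C code now does instead of
    # swallowing everything.
    try:
        getattr(obj, name)
    except AttributeError:
        return False
    return True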

julianstrietzel commented 1 year ago

Okay, so if I understand correctly, you suspect wrong usage based on mistakes in prior wrapt versions. I feel like the tensorflow devs are aware of this problem, though. They froze wrapt to <1.15 in the current nightlies a few days ago. There is also an issue in the tensorflow GitHub repository which was marked as solved afterwards.

julianstrietzel commented 1 year ago

Oh wait, setting the environment variable as you suggested actually solved the issue. Using tf 2.7.4 with wrapt 1.15.0 now works.

GrahamDumpleton commented 1 year ago

Not good if setting the environment variable fixes the issue; I was expecting it to still fail. If it works with the pure Python implementation but fails with the C extension, that does mean I have introduced a difference. It is disappointing that they just pinned the version and moved on without really seeming to be too interested in the underlying cause. I expect this is going to be hard to track down and understand. I am going to need a really simple example I can run which isn't dependent on having some extra dataset to work.

GrahamDumpleton commented 1 year ago

Neither of the two code examples I can find in, or linked to from, the tensorflow issue replicates the issue when using wrapt 1.15.0. :-(

GrahamDumpleton commented 1 year ago

Nor does the example from Stack Overflow fail either.

I will have to try with an older tensorflow, or maybe even a snapshot build if that was the version the issue occurred with. The current public tensorflow==2.11.0 works fine.

GrahamDumpleton commented 1 year ago

An example for replicating the issue is:

import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras as ker

(training_images, training_labels), (testing_images, testing_labels) = ker.datasets.cifar10.load_data()
training_images, testing_images = training_images / 255, testing_images / 255

class_names = ['Plane', 'Car', 'Bird', 'Cat', 'Deer', 'Dog', 'Frog', 'Horse', 'Ship', 'Truck']

for i in range(16):
    plt.subplot(4, 4, i+1)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(training_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[training_labels[i][0]])

plt.show()

training_images = training_images[:20000]
training_labels = training_labels[:20000]
testing_images = testing_images[:4000]
testing_labels = testing_labels[:4000]

#Model
model = ker.models.Sequential()
model.add(ker.layers.Conv2D(32, (3,3), activation='relu', input_shape=(32,32,3)))
model.add(ker.layers.MaxPooling2D((2,2)))
model.add(ker.layers.Conv2D(64, (3,3), activation='relu'))
model.add(ker.layers.MaxPooling2D((2,2)))
model.add(ker.layers.Conv2D(64, (3,3), activation='relu'))
model.add(ker.layers.Flatten())
model.add(ker.layers.Dense(64, activation='relu'))
model.add(ker.layers.Dense(10, activation='softmax'))

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

model.fit(training_images, training_labels, epochs=1, validation_data=(testing_images, testing_labels))

loss, accuracy = model.evaluate(testing_images, testing_labels)
print(f"Loss: {loss}")
print(f"Accuracy: {accuracy}")

model.save('img_classifier.model')

and use tensorflow==2.12.0rc0.

Running as:

WRAPT_DISABLE_EXTENSIONS=true python example.py

does not result in a failure, indicating a difference in behaviour between the C extension and pure Python versions of wrapt.

GrahamDumpleton commented 1 year ago

So far it looks like the issue is that the change to correct the behaviour in wrapt exposes an object of type types.MappingProxyType to the tensorflow __getattribute__() override, which it has never seen before and doesn't have code to handle properly, as neither:

object.__getattribute__(self, name)

or:

super().__getattribute__(name)

works for that type and so it blows up.

GrahamDumpleton commented 1 year ago

My first guess at what maybe they should be doing is:

  def __getattribute__(self, name):
    import types

    # if (hasattr(type(self), name)
    #     and isinstance(getattr(type(self), name), property)):

    if hasattr(type(self), name):
      obj = getattr(type(self), name)

      if isinstance(obj, property):
        return object.__getattribute__(self, name)
      elif type(obj) == types.MappingProxyType:
        return getattr(self.__wrapped__, name)
      else:
        return super().__getattribute__(name)
    else:
      return super().__getattribute__(name)

I have no idea how to validate this. Making the changes means it runs, but I don't know how to check that the saved data is actually what is required and can be restored.

The types.MappingProxyType comes into play, I think, because self.__wrapped__ is a __dict__ and the attribute it is trying to look up on that is itself __dict__, thus the result matches:

MappingProxyType = type(type.__dict__)
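
For illustration, looking up "__dict__" on a class object, as getattr(type(self), name) does in the override above, yields exactly such a mappingproxy (Example is an arbitrary class made up for this snippet):

import types

class Example:
    pass

obj = getattr(Example, "__dict__")
print(type(obj) is types.MappingProxyType)            # True
print(type(type.__dict__) is types.MappingProxyType)  # True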

Mucking with __getattribute__ is horrible to start with because of the performance hit, plus it has all sorts of weird behaviour.

GrahamDumpleton commented 1 year ago

Another possibility is that for that access it should not return anything, but should instead raise an AttributeError. It is quite possible that higher layers are checking for existence and will do something else if __dict__ doesn't exist. So the following code also runs without issues.

  def __getattribute__(self, name):
    import types

    # if (hasattr(type(self), name)
    #     and isinstance(getattr(type(self), name), property)):

    if hasattr(type(self), name):
      obj = getattr(type(self), name)

      if isinstance(obj, property):
        return object.__getattribute__(self, name)
      elif type(obj) == types.MappingProxyType and name == "__dict__":
        raise AttributeError("'mappingproxy' object has no attribute '__dict__'")
      else:
        return super().__getattribute__(name)
    else:
      return super().__getattribute__(name)

This probably makes more sense.

GrahamDumpleton commented 1 year ago

To summarise at this point, as I don't think I can do anything more:

The behaviour of wrapt changed for the C extension when a bug in wrapt was fixed that caused suppression of exceptions that should have been raised. With the wrapt code now behaving correctly, tensorflow started acting a bit differently, likely due to some of the inner workings of how Python's __getattribute__ behaves when it sees exceptions while trying to look up attributes. This caused the tensorflow code which overrides __getattribute__ to see types.MappingProxyType objects where it didn't previously, but its code doesn't handle that case and so started failing. The reason the pure Python version of wrapt always worked is that the layers of calls you get with pure Python implementations result in Python core operating a bit differently; it uses a different code path to achieve the same end, and in this case that path meant it didn't see the problem.

At this point I really need someone in the tensorflow community of authors to try the code above, which raises an AttributeError for the specific case of accessing __dict__ on __dict__, and see if their code all still works as expected.

GrahamDumpleton commented 1 year ago

BTW, I forgot to highlight this, but the trigger for this is that the exception tensorflow raises:

TypeError: this __dict__ descriptor does not support '_DictWrapper' objects

uses TypeError when it arguably should be raising AttributeError.

If it changed to AttributeError, this likely would all work without needing to change __getattribute__() to add a special check for __dict__ access on a mappingproxy.

rainwoodman commented 1 year ago

Ping from tensorflow. Thanks for root causing the issue!

I guess you mean TF shouldn't call __getattribute__() on ObjectProxy because it is not allowed on the C extension based wrapt object. My question is: should we skip chaining up to ObjectProxy's __getattribute__ even when the object is Python based? If not, then how do I make sense of the fact that we should chain up to it when wrapt uses Python, but not when wrapt uses the C extension as the backend?

What does the __getattribute__ method of ObjectProxy do?

rainwoodman commented 1 year ago

Also, the error message TypeError: this __dict__ descriptor does not support '_DictWrapper' objects is raised from CPython, in typeobject.c.

So the best TensorFlow can do is to catch it and re-raise it as an AttributeError, like this:

try:
  return super().__getattribute__(name)
except TypeError as e:
  raise AttributeError from e

Or something similar along those lines.

I tried that and the test case in https://github.com/GrahamDumpleton/wrapt/issues/231#issuecomment-1456979294 finished without that error.

The pattern is used in several child classes of wrapt objects in data_structures.py in TensorFlow. If this change looks good to you, I can patch them all up, although I am still quite puzzled (and am in general puzzled) by the __getattribute__ behavior in Python.
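
For what it is worth, a self-contained sketch of that pattern applied to an ObjectProxy subclass (the class name is illustrative, not TensorFlow's actual code):

import wrapt

class DictWrapperLike(wrapt.ObjectProxy):

  def __getattribute__(self, name):
    try:
      return super().__getattribute__(name)
    except TypeError as e:
      # CPython raises TypeError for the __dict__ descriptor in this case;
      # re-raise it as AttributeError so hasattr()/getattr() callers see a
      # missing attribute rather than an unexpected error.
      raise AttributeError(name) from e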

GrahamDumpleton commented 1 year ago

Sorry for the slow reply, have had a lot going on the last few weeks.

The __getattribute__() behaviour is definitely inscrutable at times and I can't even pretend to fully understand it. :-)

Using that try/except to raise it as AttributeError instead of TypeError seems, in that context, to be okay and easier than trying to have an explicit check of type(obj) == types.MappingProxyType and name == "__dict__" as I mentioned above. I guess we can go with that and hope all will be okay.

rainwoodman commented 1 year ago

Thanks. It turns out @k-w-w prepared a very simple fix in saved_model, exactly aligned with your suggestion, a while ago. The patch was cherry-picked to the 2.13 branch and is on its way to RC1. If all goes well, tensorflow 2.13 will work with wrapt 1.15.

AlexWaygood commented 1 year ago

Hi! I got here from investigating a bug report over at CPython:

The behaviour of raising TypeError when you try to access __dict__ on wrapt objects seems to be breaking an assumption in the logic of inspect.getattr_static in the standard library. This in turn breaks runtime-checkable protocols that use typing_extensions (and will break runtime-checkable protocols that use typing on Python 3.12+), as they use getattr_static in isinstance() checks:

Python 3.11.2 (tags/v3.11.2:878ead1, Feb  7 2023, 16:38:35) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import inspect, wrapt
>>> import typing_extensions as te
>>> class Foo: pass
...
>>> class Bar(Foo, wrapt.ObjectProxy): pass
...
>>> Bar({}).__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: this __dict__ descriptor does not support 'Bar' objects
>>> inspect.getattr_static(Bar({}), 'baz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\alexw\AppData\Local\Programs\Python\Python311\Lib\inspect.py", line 1825, in getattr_static
    instance_result = _check_instance(obj, attr)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alexw\AppData\Local\Programs\Python\Python311\Lib\inspect.py", line 1772, in _check_instance
    instance_dict = object.__getattribute__(obj, "__dict__")
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: this __dict__ descriptor does not support 'Bar' objects
>>> @te.runtime_checkable
... class Proto(te.Protocol):
...     def method(self) -> None: pass
...
>>> isinstance(Bar({}), Proto)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\alexw\coding\getattr-static-repro\.venv\Lib\site-packages\typing_extensions.py", line 604, in __instancecheck__
    val = inspect.getattr_static(instance, attr)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alexw\AppData\Local\Programs\Python\Python311\Lib\inspect.py", line 1825, in getattr_static
    instance_result = _check_instance(obj, attr)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\alexw\AppData\Local\Programs\Python\Python311\Lib\inspect.py", line 1772, in _check_instance
    instance_dict = object.__getattribute__(obj, "__dict__")
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: this __dict__ descriptor does not support 'Bar' objects

I'm not yet sure whether the possibility that accessing __dict__ could raise TypeError is something inspect.getattr_static should account for or not. I've never seen it happen anywhere else.

GrahamDumpleton commented 1 year ago

I will do some digging, but my first impression is that to resolve the resolution order when you have multiple inheritance like that with Bar, Bar should define a property for __dict__ which returns the correct one from the set of base classes. Python seems to do some strange pre-check based on looking at the inheritance graph and generates TypeError: this __dict__ descriptor does not support 'Bar' objects. If you have a __dict__ property which returns the desired one from a base class, then all is good. This I guess may depend on what Bar does. Resolution of __dict__ when you have multiple inheritance is non-obvious. :-(

GrahamDumpleton commented 1 year ago

As an example, if __dict__ should resolve to that of the wrapped object, you would have:

import inspect, wrapt

class Foo: pass

class Bar(Foo, wrapt.ObjectProxy):
  @property
  def __dict__(self):
    return self.__wrapped__.__dict__

For that you get:

>>> Bar({}).__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'dict' object has no attribute '__dict__'. Did you mean: '__dir__'?
>>> inspect.getattr_static(Bar({}), 'baz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/inspect.py", line 1851, in getattr_static
    raise AttributeError(attr)
AttributeError: baz

The first error mirrors what would occur if you attempted to access __dict__ on {}, i.e., what was wrapped in this case.

>>> {}.__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'dict' object has no attribute '__dict__'. Did you mean: '__dir__'?

And the second error similarly.

>>> inspect.getattr_static({}, 'baz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python@3.11/3.11.2_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/inspect.py", line 1851, in getattr_static
    raise AttributeError(attr)
AttributeError: baz

I don't have typing_extensions installed to test that case, but if it can get past the first errors, maybe it might work now.

The big question thus is: in the real use case, what does Foo do, and what is the purpose of Bar using multiple inheritance from it and a wrapped object? Which __dict__ should be used?

GrahamDumpleton commented 1 year ago

If you have issues with type checks, then you may also need to counter the fact that ObjectProxy contains:

    @property
    def __class__(self):
        return self.__wrapped__.__class__

    @__class__.setter
    def __class__(self, value):
        self.__wrapped__.__class__ = value

and provide properties in Bar to override those, returning what you really want.
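
For example, a sketch of Bar overriding both (what these should actually return depends entirely on what Bar is meant to represent):

import wrapt

class Foo: pass

class Bar(Foo, wrapt.ObjectProxy):

  @property
  def __dict__(self):
    return self.__wrapped__.__dict__

  @property
  def __class__(self):
    # Report Bar itself rather than the wrapped object's class; adjust this
    # to whatever type the isinstance() checks are meant to see.
    return Bar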