Shoobx / mypy-zope

Plugin for mypy to support zope.interface
MIT License
38 stars 11 forks source link

Do not add the same interface multiple times to the MRO, round 2 #80

Closed DMRobertson closed 1 year ago

DMRobertson commented 1 year ago

For your consideration: a patch which I believe fixes #76. This tackles the same kind of problem that #41 addresses. Seems to be a regression introduced in #75 as part of supporting mypy 0.970.

This turned out to be a bit of a rabbit hole. I've tried to thoroughly reproduce my working here because I really am not confident in my understanding of mypy and mypy-zope. Apologies for the verbosity! (I was very keen to nail this one down.)

Reproduction

I reproduced the problem as reported in #76:

# Setup: in a blank venv
pip install -e path/to/mypy/zope mypy==0.981 twisted[tls]==22.4.0

cat >good.py <<EOF
from twisted.web.client import HTTPClientFactory
from twisted.web.client import HTTPPageGetter

class Y(HTTPPageGetter):
    pass
EOF

cat >bad.py <<EOF
from twisted.web.client import HTTPClientFactory
from twisted.web.client import HTTPPageGetter

class Y(HTTPPageGetter, object):
    pass
EOF

# Test:
rm -r .mypy_cache/3.10/twisted/

mypy good.py
# Success: no issues found in 1 source file

mypy bad.py
# bad.py:4: error: Cannot determine consistent method resolution order (MRO) for "Y"
# Found 1 error in 1 file (checked 1 source file)

Investigation

To attempt to understand what was going on, I added a bunch of debug print statements to mypy and mypy-zope until I reached enlightenment. I don't fully grok mypy's machinery, but from what I can tell:

When I run mypy bad.py I see that linear_hierarchy is called for HTTPPageGetter with the following TypeInfo:

TypeInfo(
  Name(twisted.web.client.HTTPPageGetter)
  Bases(twisted.web.http.HTTPClient)
  Mro(twisted.web.client.HTTPPageGetter, twisted.web.http.HTTPClient, twisted.protocols.basic.LineReceiver, twisted.internet.protocol.Protocol, twisted.internet.protocol.BaseProtocol, twisted.protocols.basic._PauseableMixin, builtins.object, twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.ILoggingContext, twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.ILoggingContext)
  Names(
    _completelyDone (builtins.bool)
    _specialHeaders (builtins.set[builtins.bytes])
    connectionLost
    connectionMade
    failed (builtins.int)
    followRedirect (builtins.bool)
    handleEndHeaders
    handleHeader
    handleResponse
    handleStatus
    handleStatusDefault
    handleStatus_200
    handleStatus_201 (def (self: Any) -> Any)
    handleStatus_202 (def (self: Any) -> Any)
    handleStatus_301
    handleStatus_302
    handleStatus_303
    headers (Any)
    message (Any)
    quietLoss (builtins.int)
    status (Any)
    timeout
    version (Any)))

In particular the MRO contains two copies of IProtocol and ILoggingContext, which doesn't seem right (c.f. #41). As I understand it, mypy bad.py is loading the TypeInfo for HttpPageGetter that we persisted at the end of mypy good.py. So what was going on when we ran mypy the first time?

I will try to summarise a shedload of debug information. At some point mypy computes the MRO for twisted.internet.protocol.Protocol. The argument to linearize_hierarchy is

TypeInfo(
    Name(twisted.internet.protocol.Protocol)
    Bases(twisted.internet.protocol.BaseProtocol)
    Mro()
    Names())

Later, mypy computes the MRO for a subclass twisted.protocols.policies.ProtocolWrapper. Linearize hierarchy receives

TypeInfo(
    Name(twisted.protocols.policies.ProtocolWrapper)
    Bases(twisted.internet.protocol.Protocol)
    Mro()
    Names())

and recursively calls itself with the twisted.internet.protocol.Protocol TypeInfo. This time the MRO and Names have already been computed!


TypeInfo(
    Name(twisted.internet.protocol.Protocol)
    Bases(twisted.internet.protocol.BaseProtocol)
    Mro(twisted.internet.protocol.Protocol, twisted.internet.protocol.BaseProtocol, builtins.object, twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.ILoggingContext)
    Names(
      connectionLost
      dataReceived
      factory (Union[twisted.internet.protocol.Factory, None])
      logPrefix))

The MRO looks sensible there.

By the time mypy comes to compute the MRO for twisted.spread.banana.Banana (yeah, no idea there) we again lookup the MRO for twisted.internet.protocol.Protocol. But the MRO now has duplicate interfaces.

TypeInfo(
    Name(twisted.spread.banana.Banana)
    Bases(twisted.internet.protocol.Protocol, twisted.persisted.styles.Ephemeral)
    Mro()
    Names())
TypeInfo(
   Name(twisted.internet.protocol.Protocol)
    Bases(twisted.internet.protocol.BaseProtocol)
    Mro(twisted.internet.protocol.Protocol, twisted.internet.protocol.BaseProtocol, builtins.object, twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.ILoggingContext, twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.ILoggingContext)

Explanation

The following dirty patch adds useful logging.

Dirty patch ```diff diff --git a/src/mypy_zope/plugin.py b/src/mypy_zope/plugin.py index 312c0b5..17fa127 100644 --- a/src/mypy_zope/plugin.py +++ b/src/mypy_zope/plugin.py @@ -364,7 +364,7 @@ def analyze_subinterface(classdef_ctx: ClassDefContext) -> None: def get_customize_class_mro_hook( self, fullname: str ) -> Optional[Callable[[ClassDefContext], None]]: - # print(f"get_customize_class_mro_hook: {fullname}") + print(f"get_customize_class_mro_hook: {fullname}") def analyze_interface_base(classdef_ctx: ClassDefContext) -> None: # Create fake constructor to mimic adaptation signature @@ -406,6 +406,9 @@ def analyze(classdef_ctx: ClassDefContext) -> None: if directiface or subinterface: self._analyze_zope_interface(classdef_ctx.api, classdef_ctx.cls) + self.log(f"get_customize_class_mro_hook ending for {fullname}") + self.log(info.mro) + if fullname == "zope.interface.interface.Interface": return analyze_interface_base @@ -695,10 +698,19 @@ def _apply_interface(self, impl: TypeInfo, iface: TypeInfo) -> None: # there is a decorator for the class that will create a "type promotion", # but ensure this only gets applied a single time per interface. promote = Instance(iface, []) + if impl.fullname == "twisted.internet.protocol.Protocol": + for ti in impl.mro: + self.log(f"DMR HMM: {ti._promote}, {promote}, {ti.fullname}") + self.log(f"DMR HMM: {type(ti._promote)}, {type(promote)}, {ti.fullname}") if not any(ti._promote == promote for ti in impl.mro): faketi = TypeInfo(SymbolTable(), iface.defn, iface.module_name) faketi._promote = [promote] + if impl.fullname=="twisted.internet.protocol.Protocol": + self.log(f"============== APPENDING {iface.defn.fullname} =====") impl.mro.append(faketi) + self.log(f"mro of {impl.fullname}@{hex(id(impl))} ({hex(id(impl.mro))}) is now:") + self.log(impl.mro) + def plugin(version: str) -> PyType[Plugin]: ```

We can then see where mypy-zope is adding the duplicate interfaces.

  ZOPE: get_customize_class_mro_hook ending for twisted.internet.protocol.BaseProtocol
  ZOPE: [<TypeInfo twisted.internet.protocol.BaseProtocol>, <TypeInfo builtins.object>]
  ZOPE: get_customize_class_mro_hook ending for twisted.internet.protocol.Protocol
  ZOPE: [<TypeInfo twisted.internet.protocol.Protocol>, <TypeInfo twisted.internet.protocol.BaseProtocol>, <TypeInfo builtins.object>, <TypeInfo twisted.internet.interfaces.IProtocol>, <TypeInfo twisted.internet.interfaces.ILoggingContext>]
  ZOPE: Found implementation of twisted.internet.interfaces.IProtocol: twisted.internet.protocol.Protocol
* ZOPE: DMR HMM: [], twisted.internet.interfaces.IProtocol, twisted.internet.protocol.Protocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.protocol.Protocol
* ZOPE: DMR HMM: [], twisted.internet.interfaces.IProtocol, twisted.internet.protocol.BaseProtocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.protocol.BaseProtocol
* ZOPE: DMR HMM: [], twisted.internet.interfaces.IProtocol, builtins.object
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, builtins.object
* ZOPE: DMR HMM: [twisted.internet.interfaces.IProtocol], twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.IProtocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.interfaces.IProtocol
* ZOPE: DMR HMM: [twisted.internet.interfaces.ILoggingContext], twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.ILoggingContext
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.interfaces.ILoggingContext
  ZOPE: ============== APPENDING twisted.internet.interfaces.IProtocol =====
  ZOPE: mro of twisted.internet.protocol.Protocol@0x7f7a62f0a6c0 (0x7f7a66731640) is now:
  ZOPE: [<TypeInfo twisted.internet.protocol.Protocol>, <TypeInfo twisted.internet.protocol.BaseProtocol>, <TypeInfo builtins.object>, <TypeInfo twisted.internet.interfaces.IProtocol>, <TypeInfo twisted.internet.interfaces.ILoggingContext>, <TypeInfo twisted.internet.interfaces.IProt  ocol>]
  ZOPE: Found implementation of twisted.internet.interfaces.ILoggingContext: twisted.internet.protocol.Protocol
* ZOPE: DMR HMM: [], twisted.internet.interfaces.ILoggingContext, twisted.internet.protocol.Protocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.protocol.Protocol
* ZOPE: DMR HMM: [], twisted.internet.interfaces.ILoggingContext, twisted.internet.protocol.BaseProtocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.protocol.BaseProtocol
* ZOPE: DMR HMM: [], twisted.internet.interfaces.ILoggingContext, builtins.object
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, builtins.object
* ZOPE: DMR HMM: [twisted.internet.interfaces.IProtocol], twisted.internet.interfaces.ILoggingContext, twisted.internet.interfaces.IProtocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.interfaces.IProtocol
* ZOPE: DMR HMM: [twisted.internet.interfaces.ILoggingContext], twisted.internet.interfaces.ILoggingContext, twisted.internet.interfaces.ILoggingContext
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.interfaces.ILoggingContext
* ZOPE: DMR HMM: [twisted.internet.interfaces.IProtocol], twisted.internet.interfaces.ILoggingContext, twisted.internet.interfaces.IProtocol
* ZOPE: DMR HMM: <class 'list'>, <class 'mypy.types.Instance'>, twisted.internet.interfaces.IProtocol
  ZOPE: ============== APPENDING twisted.internet.interfaces.ILoggingContext =====

In particular, notice the line:

 ZOPE: DMR HMM: [twisted.internet.interfaces.IProtocol], twisted.internet.interfaces.IProtocol, twisted.internet.interfaces.IProtocol

(which comes from self.log(f"DMR HMM: {ti._promote}, {promote}, {ti.fullname}") in the patch). The log lines shows we're comparing ti._promote = [Instance(IProtocol)] with promote Instance(IProtocol). These two things are not equal[^1] therefore the any(...) expression here

https://github.com/Shoobx/mypy-zope/blob/5139181a07febfde688af3035d00319942df4dcf/src/mypy_zope/plugin.py#L698

is always False.[^2] So we add a duplicate IProtocol to the end of the MRO.

A similar pattern proceeds for ILoggingContext.

Fix

Check if promote in ti._promote for any ti in impl.mro, now that ti._promote is a list.

I didn't notice a huge performance cost (rm -r .mypy_cache/3.10/twisted/; mypy good.py; mypy bad.py takes 2 seconds on my machine with the patch). I haven' tried this on a bigger project like matrix-org/synapse yet. (I'm happy to, but I doubt that anyone is implementing that many interfaces for this to be a problem.)

Regression

I think this was introduced in #75. Presumably mypy made _promote a list of types rather than just a single type?

[^1]: I even checked that Instance doesn't define a weird __req__. [^2]: I'm slightly surprised that mypy --strict-equality doesn't catch this!

Tests

Works For Me :tm: after applying the change:

 $ rm -r .mypy_cache/3.10/twisted/; mypy good.py; mypy bad.py
Success: no issues found in 1 source file
Success: no issues found in 1 source file

make tests also passes on my machine.

I wasn't sure how to add a test for this (apologies). I was uncertain about the following:

DMRobertson commented 1 year ago

cc @clokep for interest!

kedder commented 1 year ago

@DMRobertson thanks for digging into it. That's a research well done! Given the complexity of the issue, I'd really like to have a regression test for it. I realize it is not easy though. Obviously we don't want to add twisted as a dependency for this.

Something I don't fully understand is why do we need to run mypy on two files to repro this. Is it somehow related to mypy cache?

DMRobertson commented 1 year ago

Something I don't fully understand is why do we need to run mypy on two files to repro this. Is it somehow related to mypy cache?

Good question. I'm not completely sure what's going on: my current hypothesis is that we:

That is: I think the cache exposes the bad MRO but doesn't cause it.

I'll see if I can confirm or deny that hypothesis.

(If it's possible to reproduce this with just one file then that'd make it much easier to test!)

kedder commented 1 year ago

Of course it wouldn't be that difficult to write a test that runs mypy two times on two files (just outside of that "check all files in that directory" mini-framework. The tricky thing here (I think) is to find that minimal test case.

clokep commented 1 year ago

Great work! 👍 I'm not sure why running it on two files is the issue, when I was previously testing #41 I could run it on the same file twice in a row to reproduce...

DMRobertson commented 1 year ago

I'm not sure why running it on two files is the issue, when I was previously testing #41 I could run it on the same file twice in a row to reproduce...

I was building off the report in #76 here. But it seems like just bad.py is enough to illustrate the problem from a clean cache:

Edit: whoops, I didn't correctly clear my cache or was somehow otherwise confused. I'll investigate further.

DMRobertson commented 1 year ago

Good news: I have a minimal example:

from zope.interface import implementer, Interface

class IProtocol(Interface):
    pass

class Factory:
    # It seems important for "Protocol" show show up as an attribute annotation to
    # trigger the bug(!?)
    protocol: "Protocol"

@implementer(IProtocol)
class Protocol:
    pass

To detect the bug, one can call result = mypy.build.build(...) with a single BuildSource representing this file. By inspecting result we can get our hands on a mypy TypeInfo object, which will let us inspect the MRO that mypy has created. This means we don't have to worry about caches, re-running mypy or investigating multiple files.

I hope to get a test with this together soon.

DMRobertson commented 1 year ago

I hope to get a test with this together soon.

See f341c60. It's a bit rough-and-ready, but hopefully it's enough to serve as a proof of concept.

I'm not sure if this ought to be part of test_samples.py rather than in its own test file? There's a fair bit of duplication here. I'm happy to clean things up as you see fit. Let me know what you think!

kedder commented 1 year ago

@DMRobertson looks great, thanks! A little duplication in a test is not a big deal, much more important that we have test and won't lose the fix. Will release a new version soon.

DMRobertson commented 1 year ago

Fantastic---many thanks!