epics-base / p4p

Python bindings for the PVAccess network client and server.
BSD 3-Clause "New" or "Revised" License
27 stars 38 forks source link

Context Cleanup Errors - Python 2.7 #55

Closed hhslepicka closed 4 years ago

hhslepicka commented 4 years ago

Folks at SLAC reported to me that when exiting the interpreter they are getting the following error message:

Exception TypeError: 'must be type, not None' in <bound method Context.__del__ of <p4p.client.thread.Context object at 0x7ff6affa9bd0>> ignored

I tested it with Python 3.7 and 2.7 but I could only get this when using the later. P4P version is the latest tagged one.

Have you seen this kind of error before with p4p?

mdavidsaver commented 4 years ago

This is a new one to me. Given the lack of a traceback, I guess this must be coming from my extension code somewhere?

I'm not sure how to read this. Is this an error in the body of __del__ or an error in calling __del__?

hhslepicka commented 4 years ago

It seems to be an error while running __del__. I am trying to add some print statements into p4p to check what is going on since it happens at exit of my ipython session:

Context in the case below is p4p.client.thread.Context.

In [2]:
Do you really want to exit ([y]/n)?
Invoked close at Context
Called close at raw Context
disconnect at Context
__del__ called at raw Context
Invoked close at Context
Exception TypeError: 'super() argument 1 must be type, not None' in <bound method Context.__del__ of <p4p.client.thread.Context object at 0x7fb7505b3ed0>> ignored
hhslepicka commented 4 years ago

It is an error while __del__ is calling the Context.close() and it is already gone. I am trying to find a possible fix for it.

mdavidsaver commented 4 years ago

"super", I guess that would be line 218. I don't see how argument 1 (Context) could be None though. Some GC cycle breaking perhaps?

https://github.com/mdavidsaver/p4p/blob/c149b3b8d8a6182e24f214794dbb0befb0d42006/src/p4p/client/thread.py#L207-L218

hhslepicka commented 4 years ago

Yes! Total GC issue. Here is what I added for debug at the close method:

Closing all Client contexts - _cleanup_contexts
Invoking close for  <p4p.client.thread.Context object at 0x7fd1b859cf90>
***** DEBUG:  <class 'p4p.client.thread.Context'>
***** DEBUG:  <p4p.client.thread.Context object at 0x7fd1b859cf90>
Called close at raw Context
disconnect at Context
__del__ called at raw Context
***** DEBUG:  None
***** DEBUG:  <p4p.client.thread.Context object at 0x7fd1b859cf90>

So Context class is now None. I wrapped the call to super with a try/except TypeError and it solved the issue. I could also check for if Container at the top of close() if you prefer.

~Let me know and I can put up a PR.~ I decided to put the PR checking if not Container.

mdavidsaver commented 4 years ago

If reasonable, can you attach an example script which triggered this error/warning? I'm wondering if there was a global variable involved?

FYI. From poking around with gdb and reading gcmodule.c I suspect this is not the only way that GC might break a loop containing both a p4p.client.Context and the p4p.client.thread module. It may just be chance that py3 breaks such a loop differently.

A better solution (for py>=3.4 anyway) might be to close() from tp_finalize instead of, or in addition to, tp_dealloc. tp_finalize is called before a cycle is broken.

I'm not going to do this just now as the in progress switch to cython would reintroduce the problem since cython does not (yet) hook into tp_finalize (finalizer code is called from tp_dealloc).

mdavidsaver commented 4 years ago

A better solution (for py>=3.4 anyway)

Oh, never mind. One of the py3 changes is that __del__() becomes a finalizer. Which explains why this issue is only seen with py2.

hhslepicka commented 4 years ago

Yes! That seems to be the main cause of the issue. The script to test it is a bit weird... and I could not easily reproduce it without the instructions below:

With 2.x you will see the problem, with 3.x it won't happen.

Note: I even tried to make a tiny bit package with the two modules importing and creating the Context but no deal. Maybe I was missing something.

mdavidsaver commented 4 years ago

I guess I shouldn't be surprised that this is hard to reproduce. ipython tends to keep any value which it sees in an interpreter history buffer for some arbitrary time.

For that matter, would it be better to move the call to self.close() into the conditional? If your users haven't been seeing the warning, then the Context has already been closed, and the failing self.close() call would be a no-op anyway.

https://github.com/mdavidsaver/p4p/blob/f65fbfe9699e3b779cc72f47645198eeff93e413/src/p4p/client/raw.py#L204-L207

hhslepicka commented 4 years ago

It was also observed outside of ipython in the wild, using regular Python at an application. I don't recall seeing the message during my tests so probably moving self.close() inside the conditional will be a better idea than my poor check there and also address that at the proper level for all clients.

mdavidsaver commented 4 years ago

moving self.close() inside the conditional will be a better idea

Done with c4ddf5caebbd12aeba13ea4e61704320904c928b

hhslepicka commented 4 years ago

Thank you! 👍

mdavidsaver commented 4 years ago

Released with 3.5.2