chrysn / aiocoap

The Python CoAP library

OSCORE Plugtest 5 misbehaves on GitLab #111

Closed by chrysn 6 years ago

chrysn commented 6 years ago

For reasons I could not find, OSCORE plug test 5 times out on GitLab runners on Python 3.6 only. It passes on the runners on 3.5, passes on 3.6 locally, and passes when I run it in a local Docker container following the .gitlab-ci.yml instructions.

It appears that the test passes but the process just does not terminate.

I've added to the debug output but still could not drill down to the cause of the problem; any assistance is appreciated.

chrysn commented 6 years ago

I've found a test system where I can reproduce the issue; it's a long-not-updated Debian sid with python3 3.6.4-1. The issue went away after upgrading to 3.6.5-3. Digging down by going back to the affected version…

chrysn commented 6 years ago

Gathering observations here as I'm debugging: The test passed once after downgrading back, but since I cleaned the aiocoap git repository (removing .eggs et al), the test fails reproducibly again.

Weirdly, adding some debug prints makes the issue go away. While debugging, I changed the capture mechanism (which was a crude workaround for .communicate() not being resumable after cancellation), and finally found the culprit somewhere else:

Those particular Python versions' GC reaps the OSCORE request tasks before they've completed, so the unprotected response message never reaches the application. A workaround which keeps the tasks around explicitly is being documented and tested.