SomberNight opened this issue 3 years ago
Ok, I'll look into it. Thanks for opening the issue.
Pretty sure there is some strangeness in one of the Python frameworks you use. I say this because, while investigating, my Google searches turned up other unrelated projects bitten by the same thing.
Here is one workaround (linked in your chat log):
And here is a known, still-unfixed Python issue that may be related: https://bugs.python.org/issue39758
But anyway thanks for the report.
Ok, so I am using Wireshark to do a packet capture. Here is a screenshot.
My localhost Electrum is 192.168.0.101. The remote server is 192.168.0.15 (running Fulcrum on mainnet). I initiated the server switch at Wireshark time ~323.85. I believe Python initiated the close at that point by sending some TLS-level data and then waiting.
However, Python did not send a TCP FIN until ~20 seconds later, at time ~353, at which point the Fulcrum side immediately replied with its own TCP FIN. (The 20 seconds is, I presume, an internal timeout in Python or in Electrum.)
I believe this issue likely lives at the SSL layer -- it appears the Python side is "waiting for something" which never arrives. You can't see it here, but immediately above the top-most visible packet, when I initiated the close, some TLS packets were exchanged, sent from the Python side.
I will investigate further whether there is anything I can do on my end, but this looks a lot like the Python-asyncio-specific "quirks" linked to in the issues above... perhaps.
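For reference, the delay is easy to measure from the client side with a few lines of asyncio. This is only a rough sketch: the host and port are placeholders, and certificate verification is disabled purely to keep the snippet short.

```python
import asyncio
import ssl
import time

HOST, PORT = "192.168.0.15", 50002   # placeholders: your server's address and SSL port

async def main():
    # Many Electrum servers use self-signed certificates; verification is
    # turned off here only so the sketch stays short.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    reader, writer = await asyncio.open_connection(HOST, PORT, ssl=ctx)

    t0 = time.monotonic()
    writer.close()               # asks the transport to close; the TLS shutdown starts here
    await writer.wait_closed()   # returns only once the transport is really gone
    print(f"close took {time.monotonic() - t0:.1f}s")

asyncio.run(main())
```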
I believe this is what's happening:
Python is following the "old spec", which was changed in TLS 1.1: it is no longer strictly required to wait for the close_notify reply at the TLS layer. But the Python side is waiting. And I guess many TLS stacks do behave the way Python expects (including the Rust-based stack used in electrs).
The Qt-based TLS stack used in Fulcrum (which ultimately uses OpenSSL) behaves differently here and confuses Python. I believe it does not respond to the close_notify in all cases, hence the delay before the timeout? You'll notice it's considered widespread practice not to respond to that message immediately... and the spec was in fact changed to reflect what many implementations do.
From the above-mentioned Stack Exchange thread:
OpenSSL, and thus implementations based on OpenSSL (clients and servers), tend to no longer send close_notify messages; they just drop the connection. The main reason is that in existing Web servers and clients, connections are managed in a pool that closes them after some inactivity delay; that pool manages low-level sockets and has no notion of what SSL could be; thus, the SSL layer has no way to send (or wait for) an explicit close_notify.
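The same asymmetry is visible with the blocking ssl module from the standard library. This is a sketch only, with a placeholder host/port and certificate checks disabled just to keep it short: unwrap() is the "send close_notify and wait for the peer's reply" path, while closing without unwrap() is the "just drop the connection" behaviour the quote describes.

```python
import socket
import ssl

HOST, PORT = "server.example.invalid", 50002   # placeholders

ctx = ssl.create_default_context()
ctx.check_hostname = False        # self-signed certs are common on Electrum servers;
ctx.verify_mode = ssl.CERT_NONE   # checks are disabled only to keep this example short

raw = socket.create_connection((HOST, PORT))
tls = ctx.wrap_socket(raw, server_hostname=HOST)

# Full two-way shutdown per the original spec: send our close_notify and
# block until the peer's close_notify comes back.  This is the step that
# stalls if the peer never answers.
#     tls.unwrap()

# What many OpenSSL-based implementations do instead: skip the closing
# handshake and just drop the connection.
tls.close()
```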
I really do think this is just an unfortunate mix of software following specs differently, specs changing, and some person on the Python asyncio team not fixing bugs as they should.
I do believe the workaround you have developed in Electrum is sufficient and good. You can't hang the app waiting for TLS packets. Isn't Electrum trustless :) ? It's weird if a server can hang you like that anyway -- regardless of Fulcrum's TLS stack.
I'd like to fix this, but unfortunately I am not going to rip apart the C code this is all based on to change how TLS is implemented on my side -- as much as I'd like to -- it's outside the scope of my application and not worth it. Especially since the workaround you have developed is something you should have in your app anyway.
I am going to conclude that the workaround you have developed in Electrum is a good idea regardless, and recommend that you continue to do it.
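For what it's worth, at the asyncio streams level the close-then-abort pattern looks roughly like this. The helper name and the 2-second timeout are made up for illustration; Electrum's actual code will differ.

```python
import asyncio

async def close_or_abort(writer: asyncio.StreamWriter, timeout: float = 2.0) -> None:
    # Hypothetical helper (not Electrum's actual code): try a graceful close
    # first, and give up if the peer never lets the TLS/TCP shutdown finish.
    writer.close()
    try:
        await asyncio.wait_for(writer.wait_closed(), timeout)
    except asyncio.TimeoutError:
        # Stop waiting for the close_notify exchange and drop the connection.
        writer.transport.abort()
```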
Leaving this issue open though, just in case anybody has some bright ideas here.
An asyncio Python client using asyncio.BaseTransport.close() never manages to close a transport (with a Fulcrum server); it must abort it using asyncio.WriteTransport.abort(). We have already talked about this on IRC some time ago; I just wanted to log the issue so it can easily be referenced.
See IRC log: