Closed OTP-Maintainer closed 3 years ago
JIRAUSER12907
said:
After some more investigation, we got a little bit further. We took a packet capture on the VM where the Erlang DTLS and OpenSSL server run, and we did some tracing on {{dtls_packet_demux}} and the process and port of the DTLS socket. The issue doesn't reproduce every time and we have seen some variations of the problem, but in this particular scenario, the Erlang DTLS server is eventually neither able to send nor receive, while the OpenSSL {{s_server}} keeps working.
It looks like Kubernetes changed the source IP of the client, which is visible in the capture (the port of the Erlang DTLS server is 49002). The source IP changed from {{10.240.0.5}} to {{10.244.0.1}}.
!2019-12-12-153121_1748x779_scrot.png|width=1104,height=492!
In the trace, the packet can be seen coming in, but it is dropped in {{handle_datagram/3}} because the source IP and port cannot be found in the set of clients in the {{dtls_packet_demux}} state.
{code}
13:25:27.891966 {:trace, #Port<0.6>, :send, {:udp, #Port<0.6>, {10, 244, 0, 1}, 55764, <<23, 254, 253, 0, 1, 0, 0, 0, 0, 0, 11, 0, 28, 55, 62, 95, 167, 248, 9, 18, 86, 218, 4, 44, 135, 148, 147, 109, 0, 76, 24, 57, 138, 65, 212, 70, 70, 191, 128, 95, 66>>}, #PID<0.135.0>}
13:25:27.896477 {:trace, #PID<0.135.0>, :receive, {:udp, #Port<0.6>, {10, 244, 0, 1}, 55764, <<23, 254, 253, 0, 1, 0, 0, 0, 0, 0, 11, 0, 28, 55, 62, 95, 167, 248, 9, 18, 86, 218, 4, 44, 135, 148, 147, 109, 0, 76, 24, 57, 138, 65, 212, 70, 70, 191, 128, 95, 66>>}}
13:25:27.896632 {:trace, #PID<0.135.0>, :call, {:dtls_packet_demux, :handle_info, [{:udp, #Port<0.6>, {10, 244, 0, 1}, 55764, <<23, 254, 253, 0, 1, 0, 0, 0, 0, 0, 11, 0, 28, 55, 62, 95, 167, 248, 9, 18, 86, 218, 4, 44, 135, 148, 147, 109, 0, 76, 24, 57, 138, 65, 212, 70, 70, ...>>}, {:state, 49002, #Port<0.6>, {:gen_udp, :udp, :udp_closed, :udp_error}, {:ssl_options, :dtls, [{254, 253}], :verify_none, {#Function<8.45162026/3 in :ssl.handle_verify_options/2>, []}, #Function<9.45162026/1 in :ssl.handle_verify_options/2>, false, false, :undefined, 1, "/cert/ssl.crt", :undefined, "/cert/ssl.key", :undefined, [], :undefined, "/cert/ssl.crt", :undefined, :undefined, :undefined, :undefined, :undefined, [<<192, 44>>, <<192, 48>>, <<192, 36>>, <<192, 40>>, <<192, 46>>, <<192, 50>>, <<192, 38>>, <<192, 42>>, <<0, 159>>, <<0, 163>>, <<0, 107>>, <<0, ...>>, <<...>>, ...], #Function<4.45162026/4 in :ssl.handle_reuse_session_option/3>, true, 268435456, true, true, :infinity, false, :undefined, :undefined, :undefined, :undefined, true, :undefined, ...}, {:socket_options, :binary, 0, 0, 0, false}, {1, {{{10, 240, 0, 5}, 55764}, {[#PID<0.141.0>], []}, nil, nil}}, {1, {{{10, 240, 0, 5}, 55764}, nil, nil}}, {1, {#PID<0.141.0>, {{10, 240, 0, 5}, 55764}, nil, nil}}, {[], []}, false, false}]}, {:gen_server, :try_dispatch, 4}}
{code}
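As a cross-check on the trace above, the first 13 bytes of the datagram can be read as a DTLS record header. The sketch below is plain Python, not anything from OTP, and the function name {{parse_dtls_record_header}} is made up for illustration; the field layout is the DTLS 1.2 record layout from RFC 6347. It suggests the packet is an application data record (type 23) in epoch 1, i.e. traffic on an already established session rather than a new handshake.

```python
import struct

# Full datagram as it appears in the first two trace lines above.
DATAGRAM = bytes([
    23, 254, 253, 0, 1, 0, 0, 0, 0, 0, 11, 0, 28,
    55, 62, 95, 167, 248, 9, 18, 86, 218, 4, 44, 135, 148, 147, 109,
    0, 76, 24, 57, 138, 65, 212, 70, 70, 191, 128, 95, 66,
])

HEADER_LEN = 13  # type (1) + version (2) + epoch (2) + sequence (6) + length (2)


def parse_dtls_record_header(data):
    """Split a DTLS 1.2 record header into its fields (hypothetical helper)."""
    if len(data) < HEADER_LEN:
        raise ValueError("datagram shorter than a DTLS record header")
    content_type, major, minor, epoch = struct.unpack("!BBBH", data[:5])
    sequence_number = int.from_bytes(data[5:11], "big")  # 48-bit counter
    (length,) = struct.unpack("!H", data[11:13])
    return {
        "content_type": content_type,  # 23 = application_data
        "version": (major, minor),     # (254, 253) = DTLS 1.2 on the wire
        "epoch": epoch,
        "sequence_number": sequence_number,
        "length": length,              # payload bytes following the header
    }


header = parse_dtls_record_header(DATAGRAM)
print(header)
```

The version pair {{(254, 253)}} matches the {{[{254, 253}]}} entry visible in the {{ssl_options}} of the traced state, and the declared length equals the number of bytes that follow the header.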
ingela
said:
Hmm ... so I guess there must be a way to detect that the client has changed its source IP but is still considered to be the same virtual connection?! Any insights are welcome; I am currently working mostly with TLS-1.3.
ingela
said:
I was thinking maybe there is some kind of timing problem where the old connection is not quite closed when the client tries to start a new connection ... as far as I can remember, there is no way for the client to change its IP.
ingela
said:
I just fixed a DTLS "listen socket" emulation in ERL-1118 and I am curious whether it could improve this issue too! See PR 2504
https://github.com/erlang/otp/pull/2504
JIRAUSER12907
said:
After some further investigation, I don't really think this should be fixed at OTP level. It seems like this is expected behavior of Kubernetes, which is not really compatible with DTLS.
From https://kubernetes.io/docs/tutorials/services/source-ip/#source-ip-for-services-with-type-loadbalancer :
bq. As of Kubernetes 1.5, packets sent to Services with Type=LoadBalancer are source NAT’d by default, because all schedulable Kubernetes nodes in the Ready state are eligible for loadbalanced traffic. So if packets arrive at a node without an endpoint, the system proxies it to a node with an endpoint, replacing the source IP on the packet with the IP of the node (as described in the previous section).
I doubt the fix to ERL-1118 will make a difference, but we appreciate the fix.
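To make the interaction concrete, here is a minimal sketch of why source NAT breaks a demultiplexer that identifies clients by source address. It is a toy model in Python under stated assumptions, not the {{dtls_packet_demux}} implementation: the class {{PacketDemux}}, its methods and the {{snat}} helper are all hypothetical, and the only thing taken from the trace is that clients are looked up by {{{IP, port}}}.

```python
class PacketDemux:
    """Toy demultiplexer keyed on (source IP, source port). Hypothetical."""

    def __init__(self):
        self.clients = {}  # (ip, port) -> list of records for that session

    def accept(self, addr):
        """Register a client address once its session is set up."""
        self.clients[addr] = []

    def handle_datagram(self, addr, record):
        """Deliver a record to the session for addr, or drop it if unknown."""
        session = self.clients.get(addr)
        if session is None:
            return "dropped"
        session.append(record)
        return "delivered"


def snat(addr, node_ip):
    """Model source NAT: the source IP is replaced, the port is kept."""
    _ip, port = addr
    return (node_ip, port)


demux = PacketDemux()
client = ("10.240.0.5", 55764)  # address the session was established from
demux.accept(client)

print(demux.handle_datagram(client, b"hello"))       # known address
rewritten = snat(client, "10.244.0.1")               # address seen after SNAT
print(demux.handle_datagram(rewritten, b"hello"))    # no matching client
```

In this model nothing on the server side can tie the rewritten address back to the original session, which is consistent with the view that the behavior comes from the network path rather than from the DTLS implementation.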
ingela
said:
So this particular issue does not seem to be a problem with the OTP implementation, so I will close it for now. If you find a related problem, you are of course welcome to reopen this or create a new issue, whichever seems most appropriate.
Original reporter:
JIRAUSER12907
Affected version:OTP-22.1
Component:ssl
Migrated from: https://bugs.erlang.org/browse/ERL-1112