OpenFastPath / ofp

OpenFastPath project
BSD 3-Clause "New" or "Revised" License

OFP TCP Keepalive Timer is not working as expected due to keepalive count (t_keepcnt) is not incremented and validated while processing Keepalive Timer #280

Open manishmatey opened 1 year ago

manishmatey commented 1 year ago

Hi Team,

When a TCP connection is established, ideally it should be terminated if there is no data exchange between client and server for some time. TCP starts the keepalive timer, and if no data is exchanged between client and server for a few keepalive packets (ideally 8 to 10), the connection is dropped.

I could not find anywhere that the TCP keepalive count is incremented and checked while the TCP keepalive timer is processed. Can somebody confirm whether this is a bug in the OFP code itself? Or where is the TCP keepalive count incremented in the OFP TCP code?

Any help will be appreciated.

bogdanPricope commented 1 year ago

Hi,

My understanding is that it is validating the time spent (in ticks) versus the max to wait:

ofp_tcp_timer.c, ofp_tcp_timer_keep():

    {
        ......
        if ((always_keepalive || (inp->inp_socket->so_options & OFP_SO_KEEPALIVE)) &&
            tp->t_state <= TCPS_CLOSING) {
            if ((int)(ofp_timer_ticks(0) - tp->t_rcvtime) >= TP_KEEPIDLE(tp) + TP_MAXIDLE(tp))
                goto dropit;
            .........................
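To make the quoted check easier to reason about, here is a minimal standalone sketch of the same comparison. The struct `keep_state` and the function `keepalive_should_drop` are hypothetical names for illustration only and are not part of OFP. The sketch assumes a 32-bit tick counter and shows why the difference is cast to a signed type: the result stays correct when the counter wraps.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields the real check reads from the tcpcb. */
struct keep_state {
    uint32_t t_rcvtime;  /* tick at which the last segment was received */
    int32_t  keepidle;   /* idle ticks before probing starts */
    int32_t  maxidle;    /* extra ticks allowed once probing has started */
};

/*
 * Returns true when the connection has been silent for at least
 * keepidle + maxidle ticks. The subtraction is done in unsigned
 * arithmetic and then reinterpreted as signed, so the result stays
 * correct when the 32-bit tick counter wraps around.
 */
static bool keepalive_should_drop(const struct keep_state *ks, uint32_t now)
{
    int32_t idle = (int32_t)(now - ks->t_rcvtime);
    return idle >= ks->keepidle + ks->maxidle;
}
```

Note that the decision depends only on elapsed time since the last received segment; no probe counter appears in it.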

I don't use the same codebase ... but what are the values of always_keepalive and inp->inp_socket->so_options for you?

Best regards, Bogdan

P.S. Please have a look at my work on NFP (my version of ofp): http://www.netinosoft.org Feedback will be much appreciated.

manishmatey commented 1 year ago

Hi @bogdanPricope

Please check the below values for variables (as requested)
always_keepalive = 1 and inp->inp_socket->so_options =0

In my case, the if condition below never becomes true, and TCP keeps resetting the keepalive timer forever.

if ((int)(ofp_timer_ticks(0) - tp->t_rcvtime) >= TP_KEEPIDLE(tp) + TP_MAXIDLE(tp))

I have printed all 4 variables below:

[Ticks=775071425 - Recv Time=775065425] = 6000
[Keepidle=720000 + Maxidle=60000] = 780000

A condition like 6000 >= 780000 will not become true any time soon, because the Ticks value and the Recv Time both keep incrementing while the comparison is against the large constant 780000.

Regards, Manish
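For scale, the tick values in the comment above can be converted to wall-clock time. The sketch below assumes 100 ticks per second; that rate is an inference from the figures quoted elsewhere in this thread (6000 ticks described as one minute, a 120-minute idle time matching 720000 ticks) and is not confirmed from the OFP source. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed tick rate: 100 ticks per second (inferred from the thread, not verified). */
#define ASSUMED_TICKS_PER_SEC 100u

/* Converts a tick count to whole seconds under the assumed tick rate. */
static uint32_t ticks_to_seconds(uint32_t ticks)
{
    return ticks / ASSUMED_TICKS_PER_SEC;
}

/* Ticks of silence still needed before the drop condition becomes true. */
static uint32_t ticks_until_drop(uint32_t idle, uint32_t keepidle, uint32_t maxidle)
{
    uint32_t limit = keepidle + maxidle;
    return idle >= limit ? 0u : limit - idle;
}
```

Under that assumption, the 780000-tick limit corresponds to 7800 seconds (130 minutes) of silence from the peer, and an idle time of 6000 ticks is one minute.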

bogdanPricope commented 1 year ago

Hi @manishmatey

I did a little experiment: I changed:

I am getting this:

    I 1007 0:788517888 httpd.c:173] accept fd=1
    Ticks: 6971, t_rcvtime: 1007 -> 5964 vs 46000
    Ticks: 8991, t_rcvtime: 1007 -> 7984 vs 46000
    Ticks: 11011, t_rcvtime: 1007 -> 10004 vs 46000
    Ticks: 13031, t_rcvtime: 1007 -> 12024 vs 46000
    Ticks: 15051, t_rcvtime: 1007 -> 14044 vs 46000
    Ticks: 17071, t_rcvtime: 1007 -> 16064 vs 46000
    Ticks: 19091, t_rcvtime: 1007 -> 18084 vs 46000
    Ticks: 21111, t_rcvtime: 1007 -> 20104 vs 46000
    Ticks: 23131, t_rcvtime: 1007 -> 22124 vs 46000
    Ticks: 25151, t_rcvtime: 1007 -> 24144 vs 46000
    Ticks: 27171, t_rcvtime: 1007 -> 26164 vs 46000
    Ticks: 29191, t_rcvtime: 1007 -> 28184 vs 46000
    Ticks: 31211, t_rcvtime: 1007 -> 30204 vs 46000
    Ticks: 33231, t_rcvtime: 1007 -> 32224 vs 46000
    Ticks: 35251, t_rcvtime: 1007 -> 34244 vs 46000
    Ticks: 37271, t_rcvtime: 1007 -> 36264 vs 46000
    Ticks: 39291, t_rcvtime: 1007 -> 38284 vs 46000
    Ticks: 41311, t_rcvtime: 1007 -> 40304 vs 46000
    Ticks: 43331, t_rcvtime: 1007 -> 42324 vs 46000
    Ticks: 45351, t_rcvtime: 1007 -> 44344 vs 46000
    Ticks: 47371, t_rcvtime: 1007 -> 46364 vs 46000
    tcp drop returned: 0x41c07048!!!

That is, the connection was dropped after the specified time. Note that t_rcvtime remains constant as there is no traffic from the other side....
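The log can be replayed with a few lines of C. This is only a toy model, not OFP code: `first_drop_tick` is a hypothetical helper, and the 2020-tick period is simply the spacing observed between consecutive log lines above. It assumes the peer stays silent, so t_rcvtime never changes.

```c
#include <stdint.h>

/*
 * Replays a keepalive timer that fires every `period` ticks, starting at
 * `first_fire`, while t_rcvtime never changes (the peer is silent).
 * Returns the tick at which the idle time first reaches `limit`.
 * `period` must be non-zero, otherwise the loop never advances.
 */
static uint32_t first_drop_tick(uint32_t first_fire, uint32_t period,
                                uint32_t t_rcvtime, int32_t limit)
{
    uint32_t now = first_fire;
    while ((int32_t)(now - t_rcvtime) < limit)
        now += period;
    return now;
}
```

Called as `first_drop_tick(6971, 2020, 1007, 46000)`, it returns 47371, the tick of the last log line before the drop.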

Now, you may add similar log messages and check if the connection is dropped in your case.

manishmatey commented 1 year ago

Hi @bogdanPricope

I have a few questions about the above test:

1) Will t_rcvtime not be updated when a keepalive response packet is received?

2) Are you dropping keepalive response packets in the above test?

Regards, Manish

bogdanPricope commented 1 year ago

Hi @manishmatey

My understanding is that this is the case where the remote device becomes inaccessible due to a connectivity issue, or where the remote has crashed. To simulate this case I shut down the network interface of the remote device. That is, there are NO keepalive responses (received or sent).

I don't understand your point: you are actively using the keepalive mechanism to keep the connection up. If you get responses, it means the connection is up and should not be terminated.

manishmatey commented 1 year ago

Hi @bogdanPricope,

Thanks for the reply. Below is my understanding: when there is no reply to the TCP keepalive packets, the TCP connection will be dropped based on the if condition below:

if ((int)(ofp_timer_ticks(0) - tp->t_rcvtime) >= TP_KEEPIDLE(tp) + TP_MAXIDLE(tp))

and the keepalive count variable t_keepcnt is not used anywhere to drop the TCP connection. Currently I am seeing TCP keepalive packets exchanged every 1 minute because of the condition [if (delta > 6000) delta = 6000;]. Please check the code below:

File: ofp_tcp_timer.c
Function: ofp_tcp_timer_activate(struct tcpcb *tp, int timer_type, uint32_t delta)

    case TT_KEEP:
        if (delta > 6000)
            delta = 6000;
        t_callout = &tp->t_timers->tt_keep;
        f_callout = ofp_tcp_timer_keep;
        break;

If I comment out the above if condition [if (delta > 6000) delta = 6000;], the keepalive probes start after 2 hours. Does this condition have an issue? For now I have commented out this code as a workaround.

Regards, Manish Tiwari
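The effect of the quoted cap can be shown in isolation. The sketch below is a standalone illustration with hypothetical names (`KEEP_DELTA_CAP`, `clamp_keep_delta`, `firings_to_cover`); it only models the clamp on the requested delay and makes no claim about what the timer handler does each time it fires.

```c
#include <stdint.h>

#define KEEP_DELTA_CAP 6000u  /* the cap quoted in the comment above */

/* Mirrors the quoted clamp: any requested delay above the cap is shortened. */
static uint32_t clamp_keep_delta(uint32_t delta)
{
    return delta > KEEP_DELTA_CAP ? KEEP_DELTA_CAP : delta;
}

/* How many times a clamped timer must fire to cover `requested` ticks. */
static uint32_t firings_to_cover(uint32_t requested)
{
    uint32_t step = clamp_keep_delta(requested);
    if (step == 0)
        return 0;
    return (requested + step - 1) / step;  /* ceiling division */
}
```

With a requested delay of 720000 ticks, the clamped timer fires after 6000 ticks instead, so the handler runs 120 times over the period in which it would otherwise have run once.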

bogdanPricope commented 1 year ago

Hi @manishmatey

I get that in your case the remote side is still accessible ... it just stopped sending "payload" data.

What you are seeing is that keepalive messages are sent every minute despite having a TCP_KEEPIDLE ("The time (in seconds) the connection needs to remain idle before TCP starts sending keepalive probes") of 120 minutes. This is (as far as I understand) a bug, probably caused by a workaround for the fact that OFP did not support long timers. This 'if (delta > 6000) delta = 6000;' is not in FreeBSD, so it is probably an OFP addition.

However, even without the unwanted keepalives the behavior will not change: after 120 minutes (TCP_KEEPIDLE) OFP will send a keepalive and the remote side will answer, so t_rcvtime will be updated and the connection will not be dropped.

I am not a TCP expert, but my understanding is that this is the expected behavior for the keepalive mechanism: it is not meant to monitor TCP payload traffic, only to check that the remote is alive and that the connection is still active on the remote side (was not closed, etc.).
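The difference between a silent peer and a peer that answers probes can be sketched as a toy model. This is not OFP code: `connection_dropped` is a hypothetical function, and it assumes that any reply to a probe refreshes t_rcvtime, as described above.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * Toy model, not OFP code: the keepalive timer fires every `interval`
 * ticks for `rounds` rounds. If the peer answers a probe, t_rcvtime is
 * refreshed, so the idle time seen at the next firing is only one interval.
 * Returns true if the drop condition (idle >= limit) is ever met.
 */
static bool connection_dropped(uint32_t interval, uint32_t limit,
                               uint32_t rounds, bool peer_answers)
{
    uint32_t now = 0;
    uint32_t t_rcvtime = 0;

    for (uint32_t i = 0; i < rounds; i++) {
        now += interval;
        if ((int32_t)(now - t_rcvtime) >= (int32_t)limit)
            return true;          /* silent for too long: drop */
        if (peer_answers)
            t_rcvtime = now;      /* the reply counts as received traffic */
    }
    return false;
}
```

With an interval of 6000 ticks and a limit of 780000 ticks, a silent peer is dropped at the 130th firing, while a peer that answers every probe is never dropped, however many rounds are simulated.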

Regards, Bogdan