UlricE / pen

Possible pending_queue leak #43

Closed ifduyue closed 5 years ago

ifduyue commented 6 years ago

Hello,

While using pen as a proxy, I noticed that after a while it stopped accepting new connections, got stuck in a busy loop, and consumed a lot of CPU. Restarting pen fixes the issue temporarily, but after a while it gets stuck and stops working again. The debug log repeats the same sequence:

2018-06-25 19:01:03: epoll_event_fd(revents=0x7ffc8b1bc464)                     
2018-06-25 19:01:03: epoll_event_wait()                                        
2018-06-25 19:01:03: epoll_wait returns 1                                       
2018-06-25 19:01:03: After event_wait()                                         
2018-06-25 19:01:03: epoll_event_fd(revents=0x7ffc8b1bc464)                     
2018-06-25 19:01:03:    epoll_ev[0] = {revents=1, data.fd=4}                    
2018-06-25 19:01:03: event_fd returns fd=4, events=65536                        
2018-06-25 19:01:03: check_listen_socket()                                      
2018-06-25 19:01:03: accepted 0 connections                                     
2018-06-25 19:01:03: epoll_event_fd(revents=0x7ffc8b1bc464)                     
2018-06-25 19:01:03: epoll_event_wait()                                         
2018-06-25 19:01:03: epoll_wait returns 1                                       
2018-06-25 19:01:03: After event_wait()                                         
2018-06-25 19:01:03: epoll_event_fd(revents=0x7ffc8b1bc464)                     
2018-06-25 19:01:03:    epoll_ev[0] = {revents=1, data.fd=4}                    
2018-06-25 19:01:03: event_fd returns fd=4, events=65536                        
2018-06-25 19:01:03: check_listen_socket()                                      
2018-06-25 19:01:03: accepted 0 connections                                     
2018-06-25 19:01:03: epoll_event_fd(revents=0x7ffc8b1bc464)                     
2018-06-25 19:01:03: epoll_event_wait()                                         
2018-06-25 19:01:03: epoll_wait returns 1 
[... loop ...]
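
One plausible reading of the log above, assuming pen registers its listening socket with epoll in the default level-triggered mode (not verified against the source here): if the loop declines to call accept() while a connection is waiting, epoll_wait keeps reporting the socket as readable and returns immediately every time. The sketch below reproduces that behaviour with a pipe instead of a listening socket; `count_wakeups_without_drain` is a hypothetical helper, not part of pen.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical demo, not pen code. Makes one fd readable, never drains it,
 * and counts how many of `rounds` epoll_wait() calls report it as ready.
 * Returns -1 if setup fails.
 */
int count_wakeups_without_drain(int rounds)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    int ep = epoll_create1(0);
    if (ep < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* EPOLLIN without EPOLLET means level-triggered notification. */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[0] };
    int woke = -1;

    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
        write(fds[1], "x", 1) == 1) {
        woke = 0;
        for (int i = 0; i < rounds; i++) {
            struct epoll_event out;
            /* Timeout 0: return at once; the unread byte keeps the fd ready. */
            if (epoll_wait(ep, &out, 1, 0) == 1 && out.data.fd == fds[0])
                woke++;
        }
    }

    close(ep);
    close(fds[0]);
    close(fds[1]);
    return woke;
}
```

Calling `count_wakeups_without_drain(5)` should report a wakeup on every round, which matches the "epoll_wait returns 1" followed by "accepted 0 connections" pattern in the log.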

After reading the source code, I think I've found a possible pending_queue leak:

-> mainloop()
   -> event_wait()
   -> handle_events(...)
      -> check_listen_socket(...)
         -> downfd = accept_nb(listenfd, (struct sockaddr *)&cli_addr, &clilen)
         -> add_client(downfd, &cli_addr)
            -> client = store_client(cli_addr)
            -> conn = store_conn(downfd, client)
            -> try_server(conns[conn].initial, conn) = 1
               -> upfd = socket_nb(...)
               -> err == CONNECT_IN_PROGRESS
               -> pending_queue++

   [...]

   -> event_wait()
   -> handle_events(...)
      -> events & EVENT_ERR = 1
      -> conns[conn].state = CS_CLOSED      # here conns[conn].upfd is the upfd created above
      -> pending_close[npc++] = conn
   -> pending_and_closing(...)
      -> closing_time(conn) = 1
      -> close_conn(conn)
         -> if (conns[i].state == CS_IN_PROGRESS) {
                pending_queue--
            }
         -> ### but here conns[i].state is already CS_CLOSED, so the decrement is skipped
         -> ### pending_queue leaks
         -> ### once pending_queue >= pending_max, pen stops accepting new connections
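
The trace above can be condensed into a small model. This is a sketch under stated assumptions, not pen's actual code: the struct, the `counted_pending` flag, and every function name below are hypothetical, and only the `CS_IN_PROGRESS` / `CS_CLOSED` state names and the `pending_queue` counter come from the trace. It contrasts the state-based decrement with one possible alternative that remembers, per connection, whether it was counted.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model; pen's real conn struct has many more fields. */
enum conn_state { CS_UNUSED, CS_IN_PROGRESS, CS_CLOSED };

struct conn {
    enum conn_state state;
    bool counted_pending;   /* assumption: extra flag, not present in pen */
};

int pending_queue = 0;

/* Models try_server() when the nonblocking connect is still in progress. */
void start_connect(struct conn *c)
{
    c->state = CS_IN_PROGRESS;
    c->counted_pending = true;
    pending_queue++;
}

/* Models the EVENT_ERR branch of handle_events(). */
void mark_error(struct conn *c)
{
    c->state = CS_CLOSED;
}

/* Models close_conn() as described: decrement depends on the current state. */
void close_conn_leaky(struct conn *c)
{
    if (c->state == CS_IN_PROGRESS)
        pending_queue--;
    c->state = CS_UNUSED;
}

/* Alternative: decrement depends on whether this conn was ever counted. */
void close_conn_fixed(struct conn *c)
{
    if (c->counted_pending) {
        assert(pending_queue > 0);
        pending_queue--;
        c->counted_pending = false;
    }
    c->state = CS_UNUSED;
}

/* Runs connect -> error -> close and returns the leftover counter value. */
int pending_after_failed_connect(bool use_fixed)
{
    struct conn c = { CS_UNUSED, false };
    pending_queue = 0;
    start_connect(&c);
    mark_error(&c);
    if (use_fixed)
        close_conn_fixed(&c);
    else
        close_conn_leaky(&c);
    return pending_queue;
}
```

In this model the state-based version leaves the counter at 1 after a failed connect, while the flag-based version returns it to 0. An equally valid fix would be to decrement at the point where the state changes away from `CS_IN_PROGRESS`.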

ifduyue commented 6 years ago

Another possible pending_queue leak is in the failover_server function in server.c: https://github.com/UlricE/pen/blob/master/server.c#L241-L244. If conns[conn].state is CS_IN_PROGRESS, failover_server just closes upfd without decrementing pending_queue.
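
One way to cover this path as well would be a single helper that every code path calls before giving up on an upstream socket, so the counter is adjusted in exactly one place. The following is only a sketch of that idea: `release_upstream` and `pending_after_failover` are hypothetical names, the struct is a reduced model rather than pen's real one, and a pipe end stands in for the upstream socket.

```c
#include <assert.h>
#include <unistd.h>

/* Reduced model of a connection; not pen's real definitions. */
enum conn_state { CS_IN_PROGRESS, CS_CONNECTED, CS_CLOSED };

struct conn {
    enum conn_state state;
    int upfd;               /* upstream fd, -1 when none */
};

int pending_queue = 0;

/*
 * Hypothetical helper: give up on the upstream side of a connection.
 * Decrements pending_queue only if the connect was still in progress,
 * then closes the fd. Safe to call twice on the same conn.
 */
void release_upstream(struct conn *c)
{
    if (c->state == CS_IN_PROGRESS) {
        assert(pending_queue > 0);
        pending_queue--;
    }
    if (c->upfd >= 0) {
        close(c->upfd);
        c->upfd = -1;
    }
    /* Model only: real failover code would go on to try another server. */
    c->state = CS_CLOSED;
}

/* Simulates a failover on a half-open connection; returns the counter. */
int pending_after_failover(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    close(fds[1]);

    struct conn c = { CS_IN_PROGRESS, fds[0] };
    pending_queue = 1;      /* as if try_server() had counted this conn */

    release_upstream(&c);
    release_upstream(&c);   /* second call must not decrement again */
    return pending_queue;
}
```

Here `pending_after_failover()` returns 0, since the helper decrements once and the repeated call is a no-op.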