eddic / fastcgipp

fastcgi++: A C++ FastCGI and Web development platform:
https://fastcgipp.isatec.ca
GNU Lesser General Public License v3.0
310 stars 94 forks source link

Poll gave fd xx which isn't in m_sockets #28

Closed ZhangzheBJUT closed 6 years ago

ZhangzheBJUT commented 7 years ago

I came across this problem sometimes. Jul 10 00:00:03 machinename /fastcgidemo [1452] [error]: Poll gave fd 26 which isn't in m_sockets. It was likely that socket gourps was in disorder.i want to know what can lead to this error and how to solve.

Erroneous1 commented 6 years ago

Maybe this will help a bit with the problem. I added debug statements to see if this error happens in the middle of a request or after and it looks like it only happens after returning true from response().

May 18 15:59:49 localhost iw_work[5679]: iw_libsrv/src/request.cpp@74: response() start
May 18 15:59:49 localhost iw_work[5679]: iw_libsrv/src/request.cpp@78: response() returning false
May 18 15:59:49 localhost iw_work[5679]: iw_libsrv/src/request.cpp@64: response() finished
May 18 15:59:49 localhost iw_work[5679]: iw_libsrv/src/request.cpp@66: response() returning true
May 18 15:59:49 localhost iw_work[5679]: May 18 15:59:49 localhost /usr/bin/iw_work[5679] [error]: Poll gave fd 7 which isn't in m_sockets.

The last line is repeated a total of 963 times.

I have an atomic enum that I set to either start/working/finished that gets set to finished before calling callback(). As you can see, it doesn't receive any new calls to response() after it returns true, and is then outputting all of those "Poll gave fd X" lines.

My guess is that something isn't quite closed.

eddic commented 6 years ago

Yeah. This is a tough one. I'd guess that the OS just isn't cleaning things up as quickly as fastcgi++ is expecting it to. I myself have never run into this issue but I suspect different system configurations have different socket cleanup strategies. In all fairness it's probably not worth even throwing an error message since the code tries again to fully clean things up. It's more just intended to notify you that, although everything is okay, something happened that was unexpected.

Are your systems Linux epoll() or posix select()?

Erroneous1 commented 6 years ago

I'm in Linux. My guess is that it's not sending the data, or not all of it. I don't know enough about how it works or about epoll but could it be that we still have data to send when we call Socket::close() and epoll_wait is reporting that data is still pending being sent or that the server is acknowledging the data being sent? When it happens the browser gets a 500 error and Apache gives me the following in error_log:

[Fri May 18 17:09:33.146196 2018] [fastcgi:error] [pid 30954:tid 140414342387456] [client 10.7.1.9:55502] FastCGI: incomplete headers (0 bytes) received from server "/srv/http/w", referer: https://10.7.1.10/w

My setup is Fastcgipp is listening on a TCP port and Apache w/ fastcgi_module is on a remote server on same lan. I'm running the Manager on a single thread and pushing all response handling to a thread pool.

eddic commented 6 years ago

Again, thanks @Erroneous1 for solving this one.

Erroneous1 commented 6 years ago

No problem. I'm in the process of updating my 2.1 code to 3.0. So far I am loving the improvements!

eddic commented 6 years ago

Ah glad to hear it! Let me know once you feel it is running smoothly and we'll do a 3.1 "Aaron" release with all your fixes.

ZhangzheBJUT commented 6 years ago

Excellent! i expect your good news! @Erroneous1