Open dotsimon opened 6 years ago
Thanks, I'll take a look.
How do we duplicate the issue? How do we test the suggested fix?
Hi Steve. Sorry for not including test cases, but this was found in a regression test where yaws is only a small part. But consider this case where an embedded server is started with one appmod: a. The gserv process starts an acceptor (acc1), therefore in gserv_loop Last = acc1 and Ready = [] A request to /a/... is made. acc1 sends next to gserv which starts a new acceptor (acc2) such that now Last = acc1 and Ready = [] acc1 calls a:out/1 which in turn updates the sconf to add appmod b gserv calls stop_ready which kills acc2 and then starts a new acceptor acc3 acc1 eventually finishes the request and sends done_client to gserv which adds the pid to Ready. A request now to /b/... gets handled by acc3 ok The next request to /b/ gets 404:ed by acc1 because it still has the old sconf
A variation on this is that accX has already served a request and is in gen_tcp:accept and in gserv's Ready, So in the original code accX still has trap_exit true making the exit in stop_ready ineffective and accX can therefore serve one more request (with the old sconf) before actually dying.
A worse case is like the above but accX is Last eventually dying due to a previous queued EXIT leaving gserv in a state where it thinks accX is waiting in accept but actually there are no processes to serve any requests!
My fixes were:
In doing this, I did consider if having a client list instead of relying on process_info(links) would be better. And also if there was a point to sending Pid ! {Top, stop} when the exit(shutdown, Pid) had to be handled in the same way anyway in acceptor0 and aloop.
But, since I don't have a standalone yaws or (sadly) the time to develop test cases I did not want to make any significant reworking. For the same reasons, I did not make a PR from the sconf_race branch.
I hope this has been somewhat helpful. Please let me know if I can help clarify anything.
/Simon
Thanks for the details, Simon. This information looks like it will be helpful.
When changing sconf, acceptors that are neither Ready nor Last can continue to receive and process requests using the old sconf.
If appmods and/or opaque are changed then this could lead to quite incorrect responses. On busy servers, these old acceptors might never get killed by the periodic cleanup.
https://github.com/dotsimon/yaws/tree/sconf_race suggests a solution that preserves the "soft-close" but should ultimately terminate those old acceptors.