amphp / process

An async process dispatcher for Amp.

Moderate to high concurrency load causes "Failed to launch process" errors on Windows #21

Closed: Bilge closed this issue 6 years ago

Bilge commented 6 years ago

Using Artax (composer require amphp/artax) and running the following code on Windows:

    // $this->client is the Artax HTTP client mentioned above
    // (presumably an instance of Amp\Artax\DefaultClient).
    Loop::run(function () {
        foreach (range(1, 1000) as $id) {
            $this->client->request('http://example.com')->onResolve(function ($error, $response) {
                if ($error) {
                    echo "$error\n";

                    return;
                }
            });
        }
    });

Generates the following error:

Amp\Process\ProcessException: Process did not connect to server before timeout elapsed in vendor\amphp\process\lib\Internal\Windows\SocketConnector.php:344

The key is to have a high concurrency limit (1000 in this case). Lower limits can trigger it too, but on my Windows 7 system they need to be at least in the region of ~500.

The error is strange because only a couple of seconds pass, at most, before the errors start streaming in. The same code works (successfully downloads responses) with lower concurrency limits (around 200 or so).


Edit: After further testing, I can trigger this with a limit as low as 50. It seems to depend on how long the destination server takes to respond: loading larger pages is more likely to cause it to fail. Again, only a second or so of real time passes, but even trying to load 50 heavy URLs at once fails the majority of the time, and there seems to be no way to override the timeout.
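
For reference, a minimal sketch of capping how many requests are in flight at once, using amphp/sync's LocalSemaphore; the cap of 100, the $client variable and the Amp\call / Amp\Promise\all wiring are illustrative assumptions rather than part of the script above:

    use Amp\Loop;
    use Amp\Sync\LocalSemaphore;

    Loop::run(function () use ($client) {
        // Illustrative cap; ~200 concurrent requests reportedly still work.
        $semaphore = new LocalSemaphore(100);
        $promises = [];

        foreach (range(1, 1000) as $id) {
            $promises[] = \Amp\call(function () use ($client, $semaphore) {
                $lock = yield $semaphore->acquire(); // wait for a free slot

                try {
                    return yield $client->request('http://example.com');
                } finally {
                    $lock->release(); // free the slot for the next request
                }
            });
        }

        yield \Amp\Promise\all($promises);
    });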

kelunik commented 6 years ago

The number of concurrent requests shouldn't affect this, nor should the size of the responses. Another process should only be used for reading the hosts file on Windows and for reading several registry entries for the DNS configuration.

Bilge commented 6 years ago

The number of concurrent requests shouldn't affect this

I'm sure you're correct about what the code should do, but empirically that's not what happens. If $limit is the upper bound of the range(), then I can tell you I've clearly observed that:

So whether it betrays your expectations or not, the number of concurrent requests directly affects this. Moreover, changing the timeout from one second to 30 seconds, as in the linked PR, causes all three cases to pass without timeout errors.

kelunik commented 6 years ago

I'm just referring to what the code should do, not what it will do; I didn't say I'm sure it's bug-free. It may well be that too many processes are spawned, for example, and in that case such a timeout is pretty much expected.

Could you put some logging code into the runner and see which commands are launched?

Bilge commented 6 years ago

Could you put some logging code into the runner and see which commands are launched?

I'm not sure what you're asking me to do. Could you be more specific?

kelunik commented 6 years ago

Put e.g. fwrite(STDERR, "command: $command\r\n"); on the line after the one I linked and run the reproduction script.
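
In other words, a sketch of what that looks like in place (the surrounding runner code is assumed rather than quoted; only the fwrite() line is the actual suggestion):

    // ... wherever the Windows runner has just built the command string it is
    // about to launch (the line linked above), log it before spawning:
    fwrite(STDERR, "command: $command\r\n");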

Bilge commented 6 years ago

One important thing I should point out is that I just tried running the script with Xdebug disabled and it passes for $limit circa 500. It is possible that Xdebug is the cause of this issue.


Edit: Never mind. $limit == 1000 still times out, even with Xdebug turned off. Disabling Xdebug seems to afford it more "bandwidth" but does not fix the issue.
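
For completeness, a trivial check (my own addition, assuming the script is run from the CLI) to confirm Xdebug really is off for the run being measured:

    // Sanity check: warn if Xdebug is still loaded in this PHP process.
    if (extension_loaded('xdebug')) {
        fwrite(STDERR, "Warning: Xdebug is still loaded\n");
    }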

Bilge commented 6 years ago

@kelunik The only thing that shows up when it fails (with every request timing out) is the following, just after the first request is made:

command: reg query "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

limit108.txt


In the case that the $limit is low enough for the requests to pass and not time out, the following additional output is shown, after all the requests have been queued and before the first response is received:

command: reg query "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
command: "C:\PHP 7.1\php.exe" -dhtml_errors=0 -ddisplay_errors=0 -dlog_errors=1 "vendor\amphp\parallel\lib\Context/Internal/process-runner.php" "vendor\amphp\parallel\lib\Worker/Internal/worker-process.php" "Amp\Parallel\Worker\BasicEnvironment"

limit54.txt

kelunik commented 6 years ago

@Bilge Does increasing the timeout to 3 seconds already work for very high concurrency loads?

Bilge commented 6 years ago

Yes, but only up to a point. I can get ~500 concurrent with a 3 second timeout, but not ~1000.

kelunik commented 6 years ago

@Bilge Do you have a profiler like Blackfire available? If so, could you please share a profile with 1 and 1000 requests?

Bilge commented 6 years ago

Happy to help in any way I can but I don't know how to do that. I understand most cool hackers don't use Windows, but there's only so much I can do via GitHub comments. Ultimately an expert such as yourself is probably going to need direct access to Windows to get to the bottom of a fundamentally complex or esoteric issue such as this.

kelunik commented 6 years ago

See https://blackfire.io/docs/introduction for an introduction. If you can't or don't want to do that, that's entirely fine; just be aware that it will delay this issue's resolution. I think you can do similar things with Xdebug.

It depends a lot on the root cause of the issue. If it's an issue in the process wrapper, @DaveRandom will indeed need to look into this. If it's not, a profile might be enough to tell where the issue is.

kelunik commented 6 years ago

@Bilge Don't worry, @pcrov was kind enough to provide a profile: https://blackfire.io/profiles/ac370ddb-e112-4941-bfe5-bb8171c968cc/graph