amphp / http-client

An advanced async HTTP client library for PHP, enabling efficient, non-blocking, and concurrent requests and responses.
https://amphp.org/http-client
MIT License
702 stars 67 forks source link

Segmentation fault (11) when opcache is activated #365

Open unglaublicherdude opened 1 month ago

unglaublicherdude commented 1 month ago

Hi,

tbh, I don't know if I am correct here, but my research showed me earlier issues with interactions with opcache and revolt/event-loop so I hope you might be able to help us figure out this issue.

We have developed an AV SDK and at one point we decided to go async because of connection issues (basically using two connection one websocket and an https connection for an upload). So the first step we did is replacing our old Guzzle http-client with the amphp one replacing our upload feature.

With this SDK we build a nextcloud-app and ran into an issue that (for now, because its not released with that feature) only exists in our pipeline.

The setup locally and in the CI are almost the same, the tests that fail are run in a container but the behaviour differs. Locally everything works fine, but in the CI we always get the "Empty reply from server" error, what basically means the server did not answer. My assumption was that something (like a repeat or a delay) might sill be active in the eventloop but that would also have caused the issue on our local setup.

We could mitigate the issue by disabling the opcache via opcache.blacklist_filename and adding the whole amphp path to the blacklist, but that is almost not an option for the nextcloud app, because the blacklist has to have full-pathes and the customer would have to write it by semself.

I am basically helpless right now. Do you have any Idea, maybe at the SDK code, where use the http-client or some runtime-trick to prevent opcache from caching the amphp directory completely?

I also opened this issue on the revolt repository.

unglaublicherdude commented 1 month ago

The image we are using has php version 8.2.21 installed.

unglaublicherdude commented 1 month ago

Ok. To further explain. This CI runs a nextcloud container with an App that we developed.

The test does a put against the nextcloud instance to upload a file. While the upload our App checks against our backend if the file contains a virus, this is done via http PUT. Previously we used Guzzle for the http PUT but we exchanged it for amphp which has revolt as a dependency.

The code in question is this

                $cancellation = new DeferredCancellation();
        $connection = $this->_vaasConnection->GetAuthenticatedWebsocket();
        $pingTimer = EventLoop::repeat(5, function () use ($connection) {
            if ($connection->isConnected()) {
                $connection->ping();
            }
        });

        try {
            $httpClient = (new HttpClientBuilder())
                ->skipAutomaticCompression()
                ->skipDefaultAcceptHeader()
                ->skipDefaultUserAgent()
                ->build();

            $request = new Request($url, 'PUT');
            $request->setProtocolVersions(["1.1"]);
            $request->setTransferTimeout($this->_uploadTimeoutInSeconds);
            $request->setInactivityTimeout($this->_uploadTimeoutInSeconds);
            $request->setBody(StreamedContent::fromStream($fileStream, $fileSize));
            $request->addHeader("Authorization", $uploadToken);

            $response = $httpClient->request(
                $request, new TimeoutCancellation($this->_uploadTimeoutInSeconds), $cancellation->getCancellation());
            if ($response->getStatus() > 399) {
                $reason = $response->getBody()->buffer($cancellation->getCancellation());
                throw new UploadFailedException($reason, $response->getStatus());
            }
        } catch (\Exception $e) {
            if ($e instanceof HttpException) {
                throw new UploadFailedException($e->getMessage(), $e->getCode());
            }
            throw new VaasClientException($e->getMessage());
        } finally {
            if (EventLoop::isEnabled($pingTimer)) {
                EventLoop::cancel($pingTimer);
            }
            if (!$cancellation->isCancelled()) {
                $cancellation->cancel();
            }
        }

Looking at the CI we can see, that the request runs into a segmentation fault: image

This only happens, when opcache is activated and does not happen, when using Guzzle as http-client. But I also did not manage to reproduce this bug locally.

I did another test, removing the EventLoop::repeat. The segmentation fault keeps coming .

unglaublicherdude commented 4 weeks ago

Here you can find the coredumps from the apache child processes: https://github.com/GDATASoftwareAG/nextcloud-gdata-antivirus/pull/98#issuecomment-2291117317

And in our Dockerfile we have an automated way to get the apache2 debug-symbols for our version: https://github.com/GDATASoftwareAG/nextcloud-gdata-antivirus/blob/renovate/major-all-major-patch/Dockerfile.Nextcloud#L8-L15

unglaublicherdude commented 4 weeks ago

With dump_bt we get this

(gdb) dump_bt executor_globals.current_execute_data
[0x7efca1750180] version_compare(false, "0.3.0", "<") [internal function]
[0x7ffd3f49eeb0] Amp\File\Driver\UvFilesystemDriver::__construct() 
/var/www/html/apps/gdatavaas/vendor/amphp/file/src/Driver/UvFilesystemDriver.php:0
trowski commented 4 weeks ago

Hi @unglaublicherdude!

Thanks for the additional information. I saw this issue when you originally posted it, but was left scratching my head, as there isn't really enough in your initial post for me to have any idea what was happening. I believe "Empty reply from server" is from curl, not amphp/http-client, so I'm not sure what's going on there.

[0x7efca1750180] version_compare(false, "0.3.0", "<") [internal function]

I find this line unusual, since this shouldn't be happening. This could be the result of a bug in opcache optimizing out something which it should not have. false should never be passed to version_compare. This is in amphp/file. I pushed branch, https://github.com/amphp/file/tree/opcache-bug, which changes the code a bit to potentially avoid a bug, if there is one. Could you change your dependency in composer.json for amphp/file to dev-opcache-bug and see if the fix works?

unglaublicherdude commented 4 weeks ago

Thank you for the reply. And sorry for my initial post. It was a bit of braindumping.

Thank you for the offered solution. It still does not work, but at least now it seems to be an issue with our app or nextcloud so I can go on debugging that.

(gdb) dump_bt executor_globals.current_execute_data
[0x7f445741c2d0] OC\AppFramework\Utility\SimpleContainer->query("OCA\Files_Trashbin\Trash\ITrashManager", false) 
/var/www/html/lib/private/AppFramework/Utility/SimpleContainer.php:153 
[0x7f445741c200] OC\ServerContainer->query("OCA\Files_Trashbin\Trash\ITrashManager") 
/var/www/html/lib/private/ServerContainer.php:145 
[0x7f445741c0d0] OCA\GDataVaas\AvirWrapper->OCA\GDataVaas\{closure}() /var/www/html/apps/gdatavaas/lib/AvirWrapper.php:177 
[0x7f445741c060] call_user_func(object[0x7f445741c0b0]) [internal function]
[0x7f445741bfe0] Icewind\Streams\CallbackWrapper->stream_close() /var/www/html/3rdparty/icewind/streams/src/CallbackWrapper.php:119 
[0x7f445741bf70] fclose(resource(#1357862240)) [internal function]
[0x7f445741beb0] OC\Files\Storage\Local->writeStream("files/functionality-parallel.eicar.com.txt.ocTransferId1605683835.part", 
resource(#1357862240), NULL) /var/www/html/lib/private/Files/Storage/Local.php:652 
[0x7f445741bdd0] OC\Files\Storage\Wrapper\Wrapper->writeStream("files/functionality-parallel.eicar.com.txt.ocTransferId1605683835.part
", resource(#1357862240), NULL) /var/www/html/lib/private/Files/Storage/Wrapper/Wrapper.php:653 
[0x7f445741bcf0] OC\Files\Storage\Wrapper\Wrapper->writeStream("files/functionality-parallel.eicar.com.txt.ocTransferId1605683835.part
", resource(#1357862240), NULL) /var/www/html/lib/private/Files/Storage/Wrapper/Wrapper.php:653 
[0x7f445741bc50] OCA\GDataVaas\AvirWrapper->writeStream("files/functionality-parallel.eicar.com.txt.ocTransferId1605683835.part", 
resource(#1357862240), NULL) /var/www/html/apps/gdatavaas/lib/AvirWrapper.php:108 
[0x7f445741bb70] OC\Files\Storage\Wrapper\Wrapper->writeStream("files/functionality-parallel.eicar.com.txt.ocTransferId1605683835.part
", resource(#1465903520)) /var/www/html/lib/private/Files/Storage/Wrapper/Wrapper.php:653 
--Type <RET> for more, q to quit, c to continue without paging--
[0x7f445741b900] OCA\DAV\Connector\Sabre\File->put(resource(#1465559264)) /var/www/html/apps/dav/lib/Connector/Sabre/File.php:250 
[0x7f445741b7e0] OCA\DAV\Connector\Sabre\Directory->createFile("functionality-parallel.eicar.com.txt", resource(#1465559264)) 
/var/www/html/apps/dav/lib/Connector/Sabre/Directory.php:148 
[0x7f445741b6e0] Sabre\DAV\Server->createFile("files/admin/functionality-parallel.eicar.com.txt", reference, reference) 
/var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php:1098 
[0x7f445741b5b0] Sabre\DAV\CorePlugin->httpPut(object[0x7f445741b600], object[0x7f445741b610]) 
/var/www/html/3rdparty/sabre/dav/lib/DAV/CorePlugin.php:504 
[0x7f445741b4c0] Sabre\DAV\Server->emit("method:PUT", array(2)[0x7f445741b520]) 
/var/www/html/3rdparty/sabre/event/lib/WildcardEmitterTrait.php:89 
[0x7f445741b3f0] Sabre\DAV\Server->invokeMethod(object[0x7f445741b440], object[0x7f445741b450]) 
/var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php:472 
[0x7f445741b2e0] Sabre\DAV\Server->start() /var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php:253 
[0x7f445741b280] Sabre\DAV\Server->exec() /var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php:321 
[0x7f445741b1f0] OCA\DAV\Server->exec() /var/www/html/apps/dav/lib/Server.php:383 
[0x7f445741b150] (main) /var/www/html/apps/dav/appinfo/v2/remote.php:35 
[0x7f445741b020] (main) /var/www/html/remote.php:172 
trowski commented 3 weeks ago

Look like you managed to find another bug in PHP! As you said, at least this one seems to be happening in your code so hopefully you have an easier time diagnosing it.

I've tagged 3.1.1 of amphp/file, so you should be able to revert your change to composer.json and get the new version with the fix.