aerospike / php-client

Aerospike Client for PHP 8
https://aerospike.github.io/php-client/

Why C library wrapper was replaced by grpc service and connection manager approach? #46

Closed RokasBalevicius closed 2 weeks ago

RokasBalevicius commented 1 month ago

Hi,

Context: At the moment we are upgrading some of our PHP apps to the latest version of PHP and also moving them to Swoole and/or ReactPHP. One of the teams tried to use the new driver but ran into some stability issues. It might be a user error and I have not clarified the details, but because of that I was given the task of figuring out how best to use Aerospike from modern long-running PHP. I'm not a PHP developer, but I work on high-ish performance stuff at my company in other languages, hence I got the task.

New client vs old client: I can see that the old driver was a wrapper around a C library (which is an obvious solution) and the new one is connection manager + thin grpc client. I was not able to find any additional detailed info on why such a switch was made. Can you explain the rationale in a few lines?

Long-running process and connection pooling: I assume that this switch was done not only because maintaining a wrapper is a pain, but also because the new approach allows for proper connection pooling under PHP-FPM. As we will be using a persistent long-running PHP process, it seems like a C wrapper would be the best solution for us. A C client would pool the connections (?), and a direct wrapper would avoid GRPC overhead. Is such an assumption correct?

Why not call grpc directly from PHP? Why does the new client not call GRPC directly from PHP via the generated client code? Connection manager would still be in place, but there would be no Rust layer in between and no need to maintain a plugin. I mainly assume that this is because Rust makes the calls faster than the normal PHP grpc clients. Is this correct, or is it only done for easier consumption of the library?

Essentially I'm trying to figure out how to proceed. We have a few options floating around:

  1. Use this client and figure out why we had the issues (and ofc contribute to Go code if need be). This leaves GRPC overhead and extra moving parts, which we are not too happy about.
  2. Make our own wrapper around C and expose the methods we need (only a few CRUD methods). I assume this is somehow a bad idea, but I do not see exactly why at the moment. I assume that bridging is not trivial and we could not easily carry over the v7 client code (so lots of work?). Maybe you can add some remarks about why it is a stupid idea?

Any input is appreciated.

khaf commented 1 month ago

Hi, thank you for your feedback.

Can you explain the rationale in a few lines?

You say it yourself in the next paragraph. The new client was born out of our frustrations with PHP's API changes and lack of documentation. We initially implemented it in Rust, but then we found out that, because of php-fpm, the clients would exhaust the connections on the server side. The connection manager pools and shares these connections, which resolves that issue.

a direct wrapper would avoid GRPC overhead. Is such an assumption correct?

No. php-fpm runs multiple processes in the background and keeps them around, each with their own connection pools. Most of these will be idle most of the time wasting precious resources, especially connections.
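
To make the connection arithmetic behind this answer concrete, here is a rough sketch. The worker count, pool size, and node count are hypothetical values chosen only for illustration, and the function names are not part of any Aerospike API. It compares the number of server-side connections when every worker process keeps its own pool with the number when a single connection manager holds one shared pool.

```python
# Rough model of server-side connection counts.
# All numbers and names here are assumptions for illustration only.

def per_process_pools(workers: int, pool_size: int, nodes: int) -> int:
    """Each worker process keeps its own pool to every cluster node."""
    return workers * pool_size * nodes


def shared_pool(pool_size: int, nodes: int) -> int:
    """One connection manager holds a single pool to every cluster node."""
    return pool_size * nodes


if __name__ == "__main__":
    workers, pool_size, nodes = 64, 8, 4  # assumed values
    print(per_process_pools(workers, pool_size, nodes))  # 2048
    print(shared_pool(pool_size, nodes))                 # 32
```

In the shared case the workers still need a local channel to the connection manager, but those channels never reach the database servers.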

Why does the new client not call GRPC directly from PHP via the generated client code?

It is slower. You can still do it yourself if you so prefer. The manifest is in the source code, and writing a thin grpc wrapper is trivial.

  1. Use this client and figure out why we had the issues (and ofc contribute to Go code if need be). This leaves GRPC overhead and extra moving parts, which we are not too happy about.

The GRPC overhead is less than that of the PHP language itself. In our benchmarks, even on slow cloud servers, the response time from the PHP side is reliably under 1ms. The current architecture was requested by our customers, and the client was redone from pure Rust to this architecture due to those requests. It also has the benefit that new features are available immediately after they are released in the Go client, which is a huge win.

  2. Make our own wrapper around C and expose the methods we need

PHP is barely documented and the devs are not a friendly bunch. Maintaining the C wrapper code is costly and subject to PHP's changes.

You started with stability issues (without providing any details) and pivoted into performance. I can help you more if you provide concrete examples.

RokasBalevicius commented 1 month ago

You started with stability issues (without providing any details) and pivoted into performance. I can help you more if you provide concrete examples.

I guess I got a little ahead of myself here. It just so happens that we are in the very early phase of that project, and there are a few other early-stage projects where we are looking into solving issues with custom wrappers around native code for PHP. So I figured that "these guys decided to stop using a wrapper, maybe I can get some insights why".

I will get back to you with the stability issues once we do a proper investigation, so as to avoid wasting your time. I still suspect it's an issue on our side.

No. php-fpm runs multiple processes in the background and keeps them around, each with their own connection pools. Most of these will be idle most of the time wasting precious resources, especially connections.

I was more interested in a use case where we do not use php-fpm, but rather something like Swoole. In the case of Swoole, we have a long-running process with persistent memory and an event loop (think node.js). My assumption is that in that case C library connection pooling would work as expected (?), as there is one process and memory is persistent.

Thank you for your input, it adds some extra perspective on the whole "just solve it via C wrapper" conundrum.

khaf commented 2 weeks ago

Sorry for my late reply, this comment got lost in my bursting mailbox. In theory, a C wrapper client should work in an architecture that runs a single instance of the PHP VM. I do not know if that's the case with Swoole, or whether there's a sizable customer base for it.

The issue with a single PHP server like Swoole is that the existing PHP code is written with the assumption that the VM is refreshed after the request cycle is over. I don't know if that's something that is easily workable for a lot of folks.
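
The point about code assuming a refreshed VM can be illustrated with a small sketch. It is a simplified model rather than actual PHP or Swoole behaviour: the handler and the `_seen_users` global are hypothetical, and clearing the list stands in for the VM being reset between requests.

```python
# Simplified model of request-scoped vs. persistent global state.
# The handler and global below are hypothetical, for illustration only.

_seen_users: list = []  # global state the handler silently relies on


def handle_request(user: str) -> int:
    """Handler written on the assumption that globals start empty."""
    _seen_users.append(user)
    return len(_seen_users)


def run_fresh_vm_per_request(users):
    """Model a runtime that resets global state before every request."""
    results = []
    for user in users:
        _seen_users.clear()  # stands in for the VM being refreshed
        results.append(handle_request(user))
    return results


def run_long_running(users):
    """Model a single persistent process: globals survive across requests."""
    _seen_users.clear()  # state is reset only once, at process start
    return [handle_request(user) for user in users]
```

With three requests, the first runner returns the same value each time while the second returns a growing count, which is the kind of accumulated state that existing request-scoped code may not expect.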

Please don't hesitate to reach out if you have any further questions. I'm usually more responsive :)