cloudflare / pingora

A library for building fast, reliable and evolvable network services.
Apache License 2.0

Implement the FastCGI protocol #133

Open devhaozi opened 3 months ago

devhaozi commented 3 months ago

What is the problem your feature solves, or the need it fulfills?

Implement the FastCGI protocol.

Describe the solution you'd like

Implement FastCGI protocol for PHP so that pingora can use PHP-FPM as backend.

Describe alternatives you've considered

Currently, Nginx and its derivatives support the FastCGI protocol; Nginx + FastCGI + PHP-FPM is the mainstream PHP serving stack.

Additional context

https://en.wikipedia.org/wiki/FastCGI https://www.php.net

plugd-in commented 2 months ago

I was able to do this by implementing ServeHttp and creating a listening service for my new struct.

I used the fastcgi_client crate, passed along the headers over FastCGI, parsed the headers from the response (using httparse), and then sent over the rest of the response body in the Response<Vec<u8>>.
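The header-parsing step described above can be sketched in plain Rust. split_cgi_response is a hypothetical helper (the commenter's implementation used httparse); the only assumption is the CGI convention that PHP-FPM separates headers from body with a blank line:

```rust
/// Hypothetical helper: split a CGI-style upstream response into its
/// header block and body. CGI responses separate the two with a blank
/// line (\r\n\r\n, or \n\n from some scripts).
fn split_cgi_response(raw: &[u8]) -> Option<(&[u8], &[u8])> {
    if let Some(pos) = raw.windows(4).position(|w| w == b"\r\n\r\n") {
        return Some((&raw[..pos], &raw[pos + 4..]));
    }
    if let Some(pos) = raw.windows(2).position(|w| w == b"\n\n") {
        return Some((&raw[..pos], &raw[pos + 2..]));
    }
    None
}
```

The header block would then be parsed (e.g. with httparse) to recover the status and response headers, and the remaining bytes forwarded as the body of the Response<Vec<u8>>.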

For reference, I ran a test using "siege" and outperformed Nginx by 33% (23,388 vs 17,454) based on transactions/sec, although I spent a lot more effort trying to optimize my implementation for my hardware, so the test was biased.

I suspect there is a better way to do this than implementing ServeHttp; this way was simply the most straightforward.

plugd-in commented 2 months ago

I was also able to get streaming responses and keep-alive connections to work, although the performance took a big hit.

Previously, I avoided establishing new FastCGI connections on most requests by looping over a Vec<OnceCell<Arc<Mutex<FCGIClient>>>> in round-robin fashion using an AtomicU16. Once a connection was established by the first request to hit a given cell, subsequent requests to that cell avoided the connection-setup overhead.
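The round-robin cursor described above can be sketched with std atomics alone. RoundRobin is an illustrative name, not the commenter's code; note that fetch_add wraps at u16::MAX, which briefly skews the rotation unless the pool size divides 65536:

```rust
use std::sync::atomic::{AtomicU16, Ordering};

/// Illustrative round-robin cursor over a fixed-size connection pool.
struct RoundRobin {
    next: AtomicU16,
    len: u16,
}

impl RoundRobin {
    fn new(len: u16) -> Self {
        Self { next: AtomicU16::new(0), len }
    }

    /// Returns the next pool index; fetch_add makes this lock-free,
    /// so many request handlers can pick concurrently.
    fn pick(&self) -> usize {
        (self.next.fetch_add(1, Ordering::Relaxed) % self.len) as usize
    }
}
```

Each handler would then lock the Mutex at the picked index, lazily connecting via the OnceCell on first use.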

For streaming, long-lived responses, however, there is a glaring issue: the round-robin ordering might park a request behind a connection that is busy serving some huge media file, so the round-robin song and dance doesn't work. I could check the status of each Mutex and try the next connection if it's locked, but there's always the risk of spinning and eating CPU cycles; it's also hard to know which connection to try next while looping, and this behavior would be blocking.

I was able to solve this with a queue plus a semaphore: the semaphore counts the available connections, and a connection is popped once one is returned to the pool. But streaming combined with this extra synchronization massively tanked performance, to the point that I was doing about 10% worse than Nginx. If I actually needed this for something performance-critical, I'd probably write my own FastCGI protocol implementation as well, since some areas of fastcgi_client didn't suit my needs/taste.
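The queue-plus-semaphore pool can be sketched with std primitives; a Condvar stands in for the semaphore in this blocking version (an async variant would use something like tokio::sync::Semaphore), and Pool is an illustrative name:

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};

/// Illustrative blocking connection pool: a queue of idle connections
/// plus a condition variable that signals when one is returned.
struct Pool<T> {
    items: Mutex<VecDeque<T>>,
    available: Condvar,
}

impl<T> Pool<T> {
    fn new(items: impl IntoIterator<Item = T>) -> Self {
        Self {
            items: Mutex::new(items.into_iter().collect()),
            available: Condvar::new(),
        }
    }

    /// Pop an idle connection, blocking until one is released.
    fn acquire(&self) -> T {
        let mut guard = self.items.lock().unwrap();
        loop {
            if let Some(item) = guard.pop_front() {
                return item;
            }
            // Releases the lock while waiting; re-acquires on wakeup.
            guard = self.available.wait(guard).unwrap();
        }
    }

    /// Return a connection to the pool and wake one waiter.
    fn release(&self, item: T) {
        self.items.lock().unwrap().push_back(item);
        self.available.notify_one();
    }
}
```

Unlike the round-robin scheme, a request here never waits behind one specific busy connection; it takes whichever connection frees up first.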

As a side note, I kind of wish there were a better way to initialize the FCGI connections than on demand. That, however, would require calling some initialization method that might be added to the ServeHttp or HttpServerApp traits, since those don't have a start_service method like the Service trait does. In theory, I could create blocking std TCP/Unix connections when I create the service and wrap them in their tokio equivalents, but I'd have to wait for a runtime to be available, I think.
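The on-demand initialization being described can be sketched with std's OnceLock (the async analogue would be tokio::sync::OnceCell); LazyConn and the String standing in for a real connection handle are illustrative assumptions:

```rust
use std::sync::OnceLock;

/// Illustrative lazy slot: the first request to hit this cell pays the
/// connection-setup cost; all later requests reuse the stored value.
struct LazyConn {
    conn: OnceLock<String>, // stand-in for a real FastCGI connection handle
}

impl LazyConn {
    fn new() -> Self {
        Self { conn: OnceLock::new() }
    }

    fn get(&self) -> &str {
        self.conn
            .get_or_init(|| {
                // A real service would dial PHP-FPM over TCP or a Unix
                // socket here; the closure runs at most once per cell.
                String::from("connected")
            })
            .as_str()
    }
}
```

Eager initialization would instead fill every cell up front in a service start hook, which is exactly the hook the ServeHttp/HttpServerApp traits currently lack.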