Closed geoidesic closed 1 month ago
Hi, this is likely caused by an extension that doesn't block the SIGPIPE signal. We fixed the amqp extension (https://github.com/php-amqp/php-amqp/pull/550, https://github.com/dunglas/frankenphp/issues/682) as well as FrankenPHP itself (https://github.com/dunglas/frankenphp/pull/651) in the past.
Could you try to gather a stack trace, wo we can identify (and hopefully fix) the offender?
The easiest way is to use a debug binary with GDB: https://frankenphp.dev/docs/contributing/#debugging-segmentation-faults-with-static-builds
But in this case using strace
to trace the I/O may be enough to identify the broken connection.
Thanks.
Um... I don't really understand any of that?
strace
, nor when to do it?Ok a bit of research later and I've set up strace (I think):
nohup strace -e trace=signal,read,write -p 111840 -o strace_output.txt > /dev/null 2>&1 &
So now I'll wait for a day or so and see if we catch any fish...
@geoidesic, any fish?
Nope. The strace output file is empty but the frankenphp pid has changed. So I guess don't know how to capture that. I can see that it's only been up and running for 2 days, which means it's still falling over periodically:
[root@dev ~]# systemctl status frankenphp
— frankenphp.service - FrankenPHP Server
Loaded: loaded (/etc/systemd/system/frankenphp.service; enabled; preset: disabled)
Active: active (running) since Tue 2024-09-17 10:22:27 UTC; 2 days ago
Anything in journalctl
related to frankenphp (I think it is the -x
flag, but can't remember off the top of my head)?
A lot of the TLS handshake errors are originating from the server itself e.g.:
Sep 26 23:00:08 app.sportch.co.uk frankenphp[44270]: {"level":"debug","ts":1727391608.0926747,"logger":"http.stdlib","msg":"http: TLS handshake error from 82.180.154.41:42652: local error: tls: bad record MAC"}
This is the IP address of the server itself: 82.180.154.41
Here's some more logs.
Note this is a standard request made to the app. After the request completes successfully there seems to be a number of requests from 127.0.0.1 to 127.0.0.1 that fail the TLS lookup. These are not generated by our application code. Something – presumably frankenphp – is causing those spurious requests. I don't understand it. This is happening frequently. Very frequently.
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.4948027,"logger":"http.handlers.rewrite","msg":"rewrote request","request":{"remote_ip":"193.42.53.172","remote_port":"62437","client_ip":"193.42.53.172","proto":"HTTP/2.0","method":"POST","host":"app.sportch.co.uk","uri":"/api/v1/adverts","headers":{"Content-Length":["83"],"X-App-Revision":["1.9.0.r1.web"],"Sec-Ch-Ua-Mobile":["?0"],"Sec-Fetch-Site":["same-origin"],"Accept-Language":["en-GB,en-US;q=0.9,en;q=0.8"],"Cookie":["REDACTED"],"X-Api-Token":["REDACTED"],"Crossorigin":["anonymous"],"X-App-Version":["10900"],"Accept":["application/json, text/javascript, */*; q=0.01"],"X-Api-Public":["REDACTED"],"Origin":["https://app.sportch.co.uk"],"Sec-Fetch-Dest":["empty"],"Sec-Ch-Ua-Platform":["\"Windows\""],"Sec-Fetch-Mode":["cors"],"Accept-Encoding":["gzip, deflate, br, zstd"],"Priority":["u=1, i"],"X-Api-Private":["REDACTED"],"Sec-Ch-Ua":["\"Google Chrome\";v=\"129\", \"Not=A?Brand\";v=\"8\", \"Chromium\";v=\"129\""],"X-Timezone":["3600"],"User-Agent":["Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36"],"Content-Type":["application/json"],"Referer":["https://app.sportch.co.uk/web/"]},"tls":{"resumed":true,"version":772,"cipher_suite":4865,"proto":"h2","server_name":"app.sportch.co.uk"}},"method":"POST","uri":"/index.php"}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.4953215,"logger":"http.handlers.file_server","msg":"sanitized path join","site_root":"/var/www/html/SportchLegacyApp","fs":"","request_path":"/web/images/match-single.png","result":"/var/www/html/SportchLegacyApp/web/images/match-single.png"}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.495429,"logger":"http.handlers.file_server","msg":"opening file","filename":"/var/www/html/SportchLegacyApp/web/images/match-single.png"}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.4958642,"logger":"http.handlers.file_server","msg":"sanitized path join","site_root":"/var/www/html/SportchLegacyApp","fs":"","request_path":"/web/images/icon-clap-white.png","result":"/var/www/html/SportchLegacyApp/web/images/icon-clap-white.png"}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.4958966,"logger":"http.handlers.file_server","msg":"opening file","filename":"/var/www/html/SportchLegacyApp/web/images/icon-clap-white.png"}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.4960034,"logger":"http.handlers.file_server","msg":"sanitized path join","site_root":"/var/www/html/SportchLegacyApp","fs":"","request_path":"/web/images/icon-coach-white.png","result":"/var/www/html/SportchLegacyApp/web/images/icon-coach-white.png"}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.496093,"logger":"http.handlers.file_server","msg":"opening file","filename":"/var/www/html/SportchLegacyApp/web/images/icon-coach-white.png"}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.4963937,"logger":"http.handlers.file_server","msg":"sanitized path join","site_root":"/var/www/html/SportchLegacyApp","fs":"","request_path":"/web/images/level-1.png","result":"/var/www/html/SportchLegacyApp/web/images/level-1.png"}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.4970129,"logger":"http.handlers.file_server","msg":"opening file","filename":"/var/www/html/SportchLegacyApp/web/images/level-1.png"}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.4964833,"logger":"http.handlers.file_server","msg":"sanitized path join","site_root":"/var/www/html/SportchLegacyApp","fs":"","request_path":"/web/images/level-4.png","result":"/var/www/html/SportchLegacyApp/web/images/level-4.png"}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.4973123,"logger":"http.handlers.file_server","msg":"opening file","filename":"/var/www/html/SportchLegacyApp/web/images/level-4.png"}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.5686858,"logger":"http.handlers.file_server","msg":"sanitized path join","site_root":"/var/www/html/SportchLegacyApp","fs":"","request_path":"/web/images/no-image.png","result":"/var/www/html/SportchLegacyApp/web/images/no-image.png"}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.5687237,"logger":"http.handlers.file_server","msg":"opening file","filename":"/var/www/html/SportchLegacyApp/web/images/no-image.png"}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.611154,"logger":"events","msg":"event","name":"tls_get_certificate","id":"ef5ed62f-545c-4d56-af02-c1fa8631cb56","origin":"tls","data":{"client_hello":{"CipherSuites":[4866,4867,4865,49199,49195,49200,49196,158,162,163,159,49191,49187,49171,49161,49192,49188,49172,49162,103,51,64,107,56,57,156,157,49324,49310,50,49308,60,47,49325,49311,106,49309,61,53,52393,52392,52394,49245,49249,49239,49235,49244,49248,49238,49234,49267,49271,196,195,49266,49270,190,189,136,135,69,68,49233,49232,192,186,132,65,255],"ServerName":"app.sportch.co.uk","SupportedCurves":[29,23,30,25,24,256,257,258,259,260],"SupportedPoints":"AAEC","SignatureSchemes":[1027,1283,1539,2055,2056,2074,2075,2076,2057,2058,2059,2052,2053,2054,1025,1281,1537,771,769,770,1026,1282,1538],"SupportedProtos":null,"SupportedVersions":[772,771],"RemoteAddr":{"IP":"127.0.0.1","Port":40968,"Zone":""},"LocalAddr":{"IP":"127.0.0.1","Port":443,"Zone":""}}}}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.6111825,"logger":"tls.handshake","msg":"choosing certificate","identifier":"app.sportch.co.uk","num_choices":1}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.6111953,"logger":"tls.handshake","msg":"default certificate selection results","identifier":"app.sportch.co.uk","subjects":["app.sportch.co.uk"],"managed":true,"issuer_key":"acme-v02.api.letsencrypt.org-directory","hash":"REDACTED"}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.6112022,"logger":"tls.handshake","msg":"matched certificate in cache","remote_ip":"127.0.0.1","remote_port":"40968","subjects":["app.sportch.co.uk"],"managed":true,"expiration":1734939626,"hash":"REDACTED"}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.6118376,"logger":"http.stdlib","msg":"http: TLS handshake error from 127.0.0.1:40968: local error: tls: bad record MAC"}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.6133301,"logger":"events","msg":"event","name":"tls_get_certificate","id":"7f5e7696-5aa1-4f82-a4ee-353cd0ab452d","origin":"tls","data":{"client_hello":{"CipherSuites":[4866,4867,4865,49199,49195,49200,49196,158,162,163,159,49191,49187,49171,49161,49192,49188,49172,49162,103,51,64,107,56,57,156,157,49324,49310,50,49308,60,47,49325,49311,106,49309,61,53,52393,52392,52394,49245,49249,49239,49235,49244,49248,49238,49234,49267,49271,196,195,49266,49270,190,189,136,135,69,68,49233,49232,192,186,132,65,255],"ServerName":"app.sportch.co.uk","SupportedCurves":[29,23,30,25,24,256,257,258,259,260],"SupportedPoints":"AAEC","SignatureSchemes":[1027,1283,1539,2055,2056,2074,2075,2076,2057,2058,2059,2052,2053,2054,1025,1281,1537,771,769,770,1026,1282,1538],"SupportedProtos":null,"SupportedVersions":[772,771],"RemoteAddr":{"IP":"127.0.0.1","Port":40984,"Zone":""},"LocalAddr":{"IP":"127.0.0.1","Port":443,"Zone":""}}}}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.6133478,"logger":"tls.handshake","msg":"choosing certificate","identifier":"app.sportch.co.uk","num_choices":1}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.6133573,"logger":"tls.handshake","msg":"default certificate selection results","identifier":"app.sportch.co.uk","subjects":["app.sportch.co.uk"],"managed":true,"issuer_key":"acme-v02.api.letsencrypt.org-directory","hash":"REDACTED"}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.613364,"logger":"tls.handshake","msg":"matched certificate in cache","remote_ip":"127.0.0.1","remote_port":"40984","subjects":["app.sportch.co.uk"],"managed":true,"expiration":1734939626,"hash":"REDACTED"}
Sep 29 13:24:46 app frankenphp[2917]: {"level":"debug","ts":1727616286.6138408,"logger":"http.stdlib","msg":"http: TLS handshake error from 127.0.0.1:40984: local error: tls: bad record MAC"}
FrankenPhp doesn't make any requests by itself. To be honest, if your application is doing nonsensical things, you might want to check if you have been hacked.
I remember a client once where I saw stuff like this and discovered a whole cialis spam website in a hidden folder.
But to reiterate, FrankenPhp doesn't make any requests. It forwards request from caddy to php. So, if you see a request in your logs, that request was made by a client.
Yeah. Fair comment really. The legacy code is a bit of a custom job which is hard to evaluate fully because of how it's been written. Definitely not hacked but quite possibly some rogue legacy code doing weird things. Also maybe a red-herring because it's not related to the SIGPIPE fault, it's just been getting in the way as I'm studying the logs.
I've been through the logs again today. I can't see any SIGPIPE failures. My logs go back to 8 September; nothing. So maybe that's no longer an issue. Frankenphp is still stopping and restarting often but it seems to be a deactivation, e.g.:
Oct 3 14:59:48 app systemd[1]: frankenphp.service: Deactivated successfully.
It's not clear why this might be happening. Any suggestions on how to monitor this?
In the next release, there will be some metrics: https://github.com/dunglas/frankenphp/blob/main/docs/metrics.md
That might be helpful.
I think we are observing the very same issue with the static build 1.2.5. Regardless if run by systemd or not, frankenphp is killed after a few minutes of runtime by a SIGPIPE. Absolut no journal entry when frankenphp gets cut off:
Oct 15 16:39:17 v2 systemd[1]: Started FrankenPHP Server.
Oct 15 16:39:17 v2 frankenphp-linux-x86_64[122325]: {"level":"info","ts":1729003157.986765,"logger":"tls","msg":"storage cleaning happened too recently; skipping for now","storage":"FileStorage:/var/lib/caddy","instance":"XX-XX-XX-XX-XXX","try_again":1729089557.9867628,"try_again_in":86399.999999616}
Oct 15 16:39:17 v2 frankenphp-linux-x86_64[122325]: {"level":"info","ts":1729003157.986842,"logger":"tls","msg":"finished cleaning storage units"}
Oct 15 16:41:18 v2 systemd[1]: frankenphp.service: Deactivated successfully.
Oct 15 16:41:18 v2 systemd[1]: frankenphp.service: Consumed 3.342s CPU time.
Oct 15 16:41:18 v2 systemd[1]: frankenphp.service: Scheduled restart job, restart counter is at 131.
Oct 15 16:41:18 v2 systemd[1]: Stopped FrankenPHP Server.
Oct 15 16:41:18 v2 systemd[1]: frankenphp.service: Consumed 3.342s CPU time.
Oct 15 16:41:18 v2 systemd[1]: Starting FrankenPHP Server...
Oct 15 16:41:19 v2 frankenphp-linux-x86_64[122418]: {"level":"info","ts":1729003279.4957032,"msg":"using config from file","file":"/etc/caddy/Caddyfile"}
Attached strace output. sigpipe1.txt
@Eckieck could you try to replace your static build with a debug version (the binary with the -debug
suffix in the latest release) and follow these steps to get a back trace, please? This would be very helpful: https://frankenphp.dev/docs/contributing/#debugging-segmentation-faults-with-static-builds
Could you run the bt
command in the GDB shell after the crash. There is something weird happening.
Thank you very much. It looks like this happen in the Go runtime itself, not in FrankenPHP or PHP. This is very weird.
This seems to occur when Go is doing some crypto things to handle a TLS connection.
Is there anything special on your system (processor for instance) that may be uncommon?
According to the manual, this should not happen: https://pkg.go.dev/os/signal#hdr-SIGPIPE
Maybe similar:
There is something very weird in the logs. The thread is named frankenphp-linu
. New versions should use php-<int>
. Could you copy the output of frankenphp build-info
please?
Is there anything special on your system (processor for instance) that may be uncommon?
The system is Rocky Linux 9.4 in a QEMU-VM on Proxmox 8.2. CPU is Xeon Silver 4416+. Crash happens after between a few seconds and three minutes. Barly usable.
Ok, systemd seems to do weird things: https://github.com/moby/moby/blob/1b7c209de1f158937480f675e85d075fda1c9743/cmd/dockerd/docker.go#L87-L90
Let's try to totally ignore SIGPIPE Go-side (we already ignore it C-side) to see if this fixes the issue: https://github.com/dunglas/frankenphp/pull/1101
Could you try this patch @Eckieck? The CI will build a test binary.
Ok, systemd seems to do weird things: https://github.com/moby/moby/blob/1b7c209de1f158937480f675e85d075fda1c9743/cmd/dockerd/docker.go#L87-L90
But: while debugging what is happening i started frankenphp without systemd and it got killed, too. Should i re-try this with the old build?
Let's try to totally ignore SIGPIPE Go-side (we already ignore it C-side) to see if this fixes the issue: #1101
Could you try this patch @Eckieck? The CI will build a test binary.
Using the generated debug-build I would say it is now fixed, new runtime record:
Active: active (running) since Thu 2024-10-17 09:29:11 CEST; 40min ago
Thank you!
Let's try to totally ignore SIGPIPE Go-side (we already ignore it C-side) to see if this fixes the issue: #1101 Could you try this patch @Eckieck? The CI will build a test binary.
@dunglas How can I get hold of that binary for my environment? That way I can also test it. (I don't know what you mean by The CI will build a test binary
)
Tx. Noel
The fix-build is still working for me. It is segfaulting sometimes after a few hours, but i have to investigate this further. The sigpipe-problem is gone.
@geoidesic Have look here: https://github.com/dunglas/frankenphp/actions/runs/11374351086
@geoidesic Have look here: https://github.com/dunglas/frankenphp/actions/runs/11374351086
@Eckieck @dunglas That looks like a docker build. I'm using static binaries. Maybe I just don't understand the UI? I can't find where to download the binary. [Update: nvm. they are linked at the bottom of that page you linked.]
FYI @geoidesic (only seeing this now)
A lot of the TLS handshake errors are originating from the server itself e.g.:
Sep 26 23:00:08 app.sportch.co.uk frankenphp[44270]: {"level":"debug","ts":1727391608.0926747,"logger":"http.stdlib","msg":"http: TLS handshake error from 82.180.154.41:42652: local error: tls: bad record MAC"}
This is the IP address of the server itself: 82.180.154.41
That looks like normal bot traffic. To explain, Caddy needs to choose a certificate to perform the TLS handshake. Many bots don't bother to send TLS SNI to declare a domain to try to just get a default certificate from the server when it scans the open internet. Caddy will look for a certificate matching the RemoteAddr of the connection when TLS SNI is empty, but doesn't find one (because you didn't configure it to generate one) so the handshake fails. Totally normal.
What happened?
The problem we’re experiencing now with this new setup is that the frankenphp service keeps shutting down without any reason that I can see. The systemd unit file is configured to restart on failure, but I guess this shutdown is not seen as a failure because it’s not restarting.
Build Type
Official static build
Worker Mode
No
Operating System
GNU/Linux
CPU Architecture
x86_64
PHP configuration
Relevant log output
From
journalctl -xeu frankenphp.service