Closed Alliballibaba2 closed 2 weeks ago
It looks like it's not possible to fix this issue :/ https://www.evanjones.ca/setenv-is-not-thread-safe.html
Related issue: https://github.com/golang/go/issues/63567
Thanks to @withinboredom patch, I think that extensions could be patched to call the PHP setenv()
and getenv()
functions (instead of the C ones). When using FrankenPHP, this will likely fix the issue because we delegate to thread-safe versions provided by Go.
What are the use cases of letting PHP access the env directly? In the documentation of putenv it also states that the variable will only persist for the duration of the current request. I guess with FPM this isn't as big an issue since every process has its own environment, but with threads this becomes (as you mentioned) very unsafe. Not sure what the right answer is.
@Alliballibaba2 I guess that it's to allow setting environment variables for subprocesses started with exec()
and the like.
However, we may have introduced a bug/inconsistency in #1086, because I don't think we restore the previous environment at the end of the request.
I think the environment was never restored at the end of a request, even before #1086
So there is likely an issue in docs, or other SAPI have specific env var handling because we didn't have any specific code for that.
I dug into the PHP code, and here are my findings:
getenv()
to delegate to Go, https://github.com/dunglas/frankenphp/blob/main/frankenphp.c#L768, this will fix extensions using sapi.getenv()
(I'm on it)We should try to catch all calls to putenv()
and setenv()
in extensions and delegate to Go.
I think the fundamental problem here is that multiple threads can handle requests simultaneously, but they share the same environment. That might lead to potential race conditions (unsetting an env in one request while another request is still running).
I think we need to store the stuff coming from putenv on a per-thread basis
My understanding is that Go's os.Setenv()
and os.Getenv()
are thread-safe (they have their own locking mechanism) as long as no direct call to C's putenv()
or setenv()
is made.
The best patch would be to ensure that Go's os.Setenv()
is always used (even by extensions). To do so we could either:
os.Setenv()
when using FrankenPHP (the cleanest solution IMHO)putenv()
function, that we override@withinboredom I wonder if we shouldn't partially revert #1086 before tagging a new release, as we introduce a behavior change that is inconsistent with other SAPIs and previous behaviors, and that doesn't fix all segfaults (yet)?
Just catching up. To be honest, #1086 isn't the end of the road (getting the env overridden was the main goal there).
This patch looks good to me, but it just covers the underlying problem. If there are segfaults due to environment access after my patch, it means that even if this patch fixes the race conditions, there is "something" that will get an inconsistent read on the environment.
What we need to figure out is if it is an extension/core function calling libc's getenv/setenv or if it is calling php's. The former means it is inconsistent anyway, and the latter needs changes to TSRM.
In any case, I think this patch is a step in the right direction.
Yeah this PR just blocks PHP from modifying the actual environment at runtime. I can adjust it so the 'reset after request' behavior is also there (on a per-thread basis). I think this is the safer solution. Multiple threads reading and writing to the environment concurrently probably isn't really the expected behavior even if it was made safe.
If we don't want to go this route, I can also close this PR and we can revert #1086 in the meantime
As the segfault you're tracking isn't related to env vars anyway, I suggest closing this one.
I would keep #1086 as it is safer than the current implementations, but it would be nice to fix its behavior to be consistent with what PHP does (restoring the environment after the request).
The segfault from go test -run TestEnv -count 100
is still real, I just don't know if people are experiencing it.
The segfault from Laravel startup is probably related to opcache. Worker mode doesn't really need opcache, it would still be nice to resolve this before another release though.
Thinking about it, is it possible that cli php and frankenphp share memory through opcache?
Interesting article by the Steam team about this same issue: https://ttimo.typepad.com/blog/2024/11/the-steam-client-update-earlier-this-week-mentions-fixed-some-miscellaneous-common-crashes-in-the-linux-notes-which-i-wante.html
This may be fixed at some point in glibc: https://inbox.sourceware.org/libc-alpha/cover.1722193092.git.fweimer@redhat.com/
Comments on HN: https://news.ycombinator.com/item?id=42110677
While doing some extensive testing on #1137, I noticed that the
TestEnv
test will sporadically still segfault. This can be reproduced by doing the following multiple times:go test -run TestEnv -count 100
This might be related to some Laravel segfaults people have reporting (that are hard to track down definitively)
This led me to believe that reading the env from go at runtime also isn't fully safe since extensions (and cli scripts?) might potentially access it, which circumvents the go locks. The changes in this branch seem to fix the segfault by just loading the env once at startup and writing changes from putenv and getenv into this 'cached environment'.
I don't know if this is the best solution, but I think FPM wouldn't even persist changes from putenv across requests.