dunglas / frankenphp

🧟 The modern PHP app server
https://frankenphp.dev
MIT License
6.96k stars 244 forks source link

Fix: Load the os env once at startup. #1141

Closed Alliballibaba2 closed 2 weeks ago

Alliballibaba2 commented 2 weeks ago

While doing some extensive testing on #1137, I noticed that the TestEnv test will sporadically still segfault. This can be reproduced by doing the following multiple times:

go test -run TestEnv -count 100

This might be related to some Laravel segfaults people have reporting (that are hard to track down definitively)

This led me to believe that reading the env from go at runtime also isn't fully safe since extensions (and cli scripts?) might potentially access it, which circumvents the go locks. The changes in this branch seem to fix the segfault by just loading the env once at startup and writing changes from putenv and getenv into this 'cached environment'.

I don't know if this is the best solution, but I think FPM wouldn't even persist changes from putenv across requests.

dunglas commented 2 weeks ago

It looks like it's not possible to fix this issue :/ https://www.evanjones.ca/setenv-is-not-thread-safe.html

dunglas commented 2 weeks ago

Related issue: https://github.com/golang/go/issues/63567

dunglas commented 2 weeks ago

Thanks to @withinboredom patch, I think that extensions could be patched to call the PHP setenv() and getenv() functions (instead of the C ones). When using FrankenPHP, this will likely fix the issue because we delegate to thread-safe versions provided by Go.

Alliballibaba2 commented 2 weeks ago

What are the use cases of letting PHP access the env directly? In the documentation of putenv it also states that the variable will only persist for the duration of the current request. I guess with FPM this isn't as big an issue since every process has its own environment, but with threads this becomes (as you mentioned) very unsafe. Not sure what the right answer is.

dunglas commented 2 weeks ago

@Alliballibaba2 I guess that it's to allow setting environment variables for subprocesses started with exec() and the like.

However, we may have introduced a bug/inconsistency in #1086, because I don't think we restore the previous environment at the end of the request.

Alliballibaba2 commented 2 weeks ago

I think the environment was never restored at the end of a request, even before #1086

dunglas commented 2 weeks ago

So there is likely an issue in docs, or other SAPI have specific env var handling because we didn't have any specific code for that.

dunglas commented 2 weeks ago

I dug into the PHP code, and here are my findings:

  1. PHP indeed restores the previous environment at the end of the request (and I didn't test it, but I think it was working as expected before #1086): https://github.com/php/php-src/blob/master/ext/standard/basic_functions.c#L420C1-L420C68 We need to fix this.
  2. We should replace our SAPI getenv() to delegate to Go, https://github.com/dunglas/frankenphp/blob/main/frankenphp.c#L768, this will fix extensions using sapi.getenv() (I'm on it)

We should try to catch all calls to putenv() and setenv() in extensions and delegate to Go.

Alliballibaba2 commented 2 weeks ago

I think the fundamental problem here is that multiple threads can handle requests simultaneously, but they share the same environment. That might lead to potential race conditions (unsetting an env in one request while another request is still running).

I think we need to store the stuff coming from putenv on a per-thread basis

dunglas commented 2 weeks ago

My understanding is that Go's os.Setenv() and os.Getenv() are thread-safe (they have their own locking mechanism) as long as no direct call to C's putenv() or setenv() is made.

The best patch would be to ensure that Go's os.Setenv() is always used (even by extensions). To do so we could either:

  1. Provide a hook in PHP itself, that extensions will use to set an environment variable, and that will delegate to os.Setenv() when using FrankenPHP (the cleanest solution IMHO)
  2. Ensure that extensions use PHP's putenv() function, that we override
dunglas commented 2 weeks ago

@withinboredom I wonder if we shouldn't partially revert #1086 before tagging a new release, as we introduce a behavior change that is inconsistent with other SAPIs and previous behaviors, and that doesn't fix all segfaults (yet)?

withinboredom commented 2 weeks ago

Just catching up. To be honest, #1086 isn't the end of the road (getting the env overridden was the main goal there).

This patch looks good to me, but it just covers the underlying problem. If there are segfaults due to environment access after my patch, it means that even if this patch fixes the race conditions, there is "something" that will get an inconsistent read on the environment.

What we need to figure out is if it is an extension/core function calling libc's getenv/setenv or if it is calling php's. The former means it is inconsistent anyway, and the latter needs changes to TSRM.

In any case, I think this patch is a step in the right direction.

AlliBalliBaba commented 2 weeks ago

Yeah this PR just blocks PHP from modifying the actual environment at runtime. I can adjust it so the 'reset after request' behavior is also there (on a per-thread basis). I think this is the safer solution. Multiple threads reading and writing to the environment concurrently probably isn't really the expected behavior even if it was made safe.

If we don't want to go this route, I can also close this PR and we can revert #1086 in the meantime

dunglas commented 2 weeks ago

As the segfault you're tracking isn't related to env vars anyway, I suggest closing this one.

I would keep #1086 as it is safer than the current implementations, but it would be nice to fix its behavior to be consistent with what PHP does (restoring the environment after the request).

AlliBalliBaba commented 2 weeks ago

The segfault from go test -run TestEnv -count 100 is still real, I just don't know if people are experiencing it.

The segfault from Laravel startup is probably related to opcache. Worker mode doesn't really need opcache, it would still be nice to resolve this before another release though.

AlliBalliBaba commented 2 weeks ago

Thinking about it, is it possible that cli php and frankenphp share memory through opcache?

dunglas commented 1 week ago

Interesting article by the Steam team about this same issue: https://ttimo.typepad.com/blog/2024/11/the-steam-client-update-earlier-this-week-mentions-fixed-some-miscellaneous-common-crashes-in-the-linux-notes-which-i-wante.html

This may be fixed at some point in glibc: https://inbox.sourceware.org/libc-alpha/cover.1722193092.git.fweimer@redhat.com/

Comments on HN: https://news.ycombinator.com/item?id=42110677