KevinWang15 closed this issue 2 years ago.
This use case sounds like the Kubernetes API server should add some basic rate limiting.
Right, there is some rate limiting in place, and I have also worked on tuning the algorithms there.
But if we could have this API, it would be invaluable and would take the whole thing to the next level.
I would rather say you need a better JSON parser that is interruptible through a context.
Well, a better JSON parser would certainly help. But apart from json.Marshal, there are other CPU-intensive function calls that are not context-aware, and it would be a lot of work to replace every one of them with a context-aware alternative.
And is it good to make everything context-aware? Is it going to add a lot of overhead? Quoting this comment: https://github.com/golang/go/issues/50422#issuecomment-1004768507
mvdan commented: Usually, a context or timeout/deadline is required for asynchronous work where it's practically impossible to know how long an operation will take. I also imagine it would be useful for libraries that interpret or execute code, such as interpreting a script that might loop forever. My first impression is that encoding/json falls under neither of those categories.
When all of these problems could be solved with one single API added to the Go runtime, and it isn't that hard to implement, I am tempted not to do things the roundabout way.
I guess it all boils down to: is it so abominably against the philosophies and principles of Go's design, or is it negotiable?
Please fill out https://github.com/golang/proposal/blob/master/go2-language-changes.md when proposing language changes
ACK, I will fill it out, thanks
I consider myself quite experienced.
Java, JavaScript
I think it should make no difference. It is only an extra API that a developer can choose to use. They can learn it when they want to use it.
The closest question is https://github.com/golang/go/issues/32610. We are asking for the same feature, but the previous discussion was abandoned and didn't offer a use case, so I decided to bring it up again.
Everyone who uses Golang to write a server, and potentially some other use cases. It will enable developers to make sure that when the server has finished sending a response for a request, every goroutine that was used to serve the request can be aborted immediately. No lingering goroutines will get stuck in expensive function calls that are not context-aware, and waste CPU / memory.
Add a runtime.GetGoroutineHandle() function that returns a *GoroutineHandle for the caller goroutine, and an Abort() method on *GoroutineHandle that makes the related goroutine panic with "aborted" at the earliest convenience (when it is safe to do so). All defers will be executed, in order not to break invariants. Nothing would change in the language spec.
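To make the shape of the proposal concrete, here is a minimal sketch of what the hypothetical API surface could look like. Nothing here exists in the runtime today; it only restates the names proposed above.

// GoroutineHandle would identify a specific goroutine (hypothetical type).
type GoroutineHandle struct {
	// unexported fields, e.g. the goroutine's goid
}

// GetGoroutineHandle would return a handle for the calling goroutine
// (hypothetical; implemented inside the runtime).
func GetGoroutineHandle() *GoroutineHandle

// Abort would make the goroutine identified by h panic with "aborted"
// at the earliest point the runtime considers safe; its defers still run.
func (h *GoroutineHandle) Abort()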
"Now it's possible to kill a goroutine just like you can kill a thread. Of course, it is dangerous and not encouraged; Context and other cooperative methods are still your go to solution, but it is a possibility if you know what you are doing"
Yes.
func main() {
	// Buffered so the goroutine doesn't block on the send before it
	// even reaches json.Marshal.
	goroutineHandle := make(chan *runtime.GoroutineHandle, 1)
	go func() {
		goroutineHandle <- runtime.GetGoroutineHandle()
		json.Marshal(aVeryBigObject)
	}()
	// Give the marshal 10 seconds, then abort the goroutine.
	timer := time.NewTimer(10 * time.Second)
	<-timer.C
	(<-goroutineHandle).Abort()
}
It will not affect tools or add compile time / run time cost.
Put a goid into the ToBeAborted set and wait for the goroutine to get descheduled naturally (either by Gosched or by preemptive scheduling). Then, when it is about to resume, change its PC to a function that will panic immediately; or, add such a check in Gosched.
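As a purely illustrative, user-level sketch of that check: the real change would live inside the scheduler in src/runtime, none of these names exist there, and locking around the set is omitted.

// toBeAborted holds the goids of goroutines that should panic the next
// time they pass through a scheduling point (illustrative only).
var toBeAborted = make(map[uint64]struct{})

// checkAbort is what the proposal imagines running when a goroutine is
// about to resume (or inside Gosched): if its goid has been marked,
// it panics instead of continuing, and its defers run as usual.
func checkAbort(goid uint64) {
	if _, marked := toBeAborted[goid]; marked {
		delete(toBeAborted, goid)
		panic("aborted")
	}
}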
The language spec wouldn't change.
No, it's just another API, just like Thread.stop provided by Java.
Yes. We can expect server programs to waste less CPU and memory. See my analysis and experiments in https://github.com/golang/go/issues/50678
No
PTAL @seankhliao
(emphasis mine)
make the related goroutine panic with "aborted" at the earliest convenience (when it is safe to do so)
That's the problem: it is never safe to make a goroutine panic unexpectedly.
Go programs can (and should!) be written and tested such that unexpected panics cannot and do not occur — especially given Go's new fuzzing support set to be released in Go 1.18. In particular, panics should not cross package boundaries except in the case of programmer error: if package A calls into package B, package B should only panic if A has violated some documented or obvious invariant of its API.
If a particular package knows that it is safe to panic at some particular point, that package could just as easily check a Context for cancellation at that point too.
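For illustration only, this is the kind of cooperative check being suggested; the package name, encodeItems, and Item are made up, since encoding/json does not take a Context today.

package encoder // hypothetical example package

import (
	"context"
	"encoding/json"
	"io"
)

// Item is a made-up payload type for this sketch.
type Item struct {
	Name string
}

// encodeItems stands in for a long-running encode loop. At each point where
// the package knows its own invariants hold, it can check for cancellation
// and return early instead of being made to panic from the outside.
func encodeItems(ctx context.Context, items []Item, w io.Writer) error {
	enc := json.NewEncoder(w)
	for _, it := range items {
		if err := ctx.Err(); err != nil {
			return err // safe point: nothing half-written for this item
		}
		if err := enc.Encode(it); err != nil {
			return err
		}
	}
	return nil
}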
Right, it is never safe to make a goroutine panic unexpectedly, just as it is never safe to kill a thread in Java.
Maybe it should be put into the unsafe package, and people should never use it unless they know what they are doing.
Shall we give developers the power to do it if they really want to? (Java did). If the program crashes, then the developer is to blame because he played with fire and got burned.
By "when it is safe to do so" I meant from the Go runtime's point of view. It's indeed impossible to guarantee there will be no logic errors or deadlocks when a goroutine is aborted abruptly. I guess that responsibility falls on the developer.
Shall we give developers the power to do it if they really want to? (Java did).
Go is not Java, nor should it be: Java is already a pretty good Java, so Go making different decisions adds diversity to the language ecosystem.
If the program crashes, then the developer is to blame because he played with fire and got burned.
Go consistently chooses not to take that approach. (That's why we have automatic bounds checks and a fairly permissive memory model, and also why the language discourages pointer arithmetic and subtle memory-management patterns.)
Besides, the failure mode of an unexpected panic isn't always a crash. Sometimes it's much more difficult to diagnose, like a deadlock in a production server.
Okay, I see your point, I can understand. Thanks
If you use a context-aware io.Reader like the one here: https://pace.dev/blog/2020/02/03/context-aware-ioreader-for-golang-by-mat-ryer.html, together with a JSON parser that can unmarshal from an io.Reader, and there are several like that, then your problem should be solved.
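A minimal version of that pattern might look like this (the names are illustrative, not taken from the linked post): an io.Reader wrapper that starts failing as soon as the context is cancelled, so a streaming decoder reading from it stops early.

package ctxio // illustrative package name

import (
	"context"
	"io"
)

// ctxReader returns the context's error once the context is cancelled,
// so anything consuming it (e.g. a streaming JSON decoder) gives up
// instead of running to completion.
type ctxReader struct {
	ctx context.Context
	r   io.Reader
}

func (cr ctxReader) Read(p []byte) (int, error) {
	if err := cr.ctx.Err(); err != nil {
		return 0, err
	}
	return cr.r.Read(p)
}

// NewReader ties a reader to a context.
func NewReader(ctx context.Context, r io.Reader) io.Reader {
	return ctxReader{ctx: ctx, r: r}
}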
Thanks for the suggestion. The current encoding/json has func NewEncoder(w io.Writer), which can take in an io.Writer, but its behavior isn't ideal for this use case (https://github.com/golang/go/issues/33714; basically it completes the whole marshaling in a buffer and writes to the io.Writer in one shot).
Maybe there are open-source alternatives to encoding/json that behave differently (they need to work with the json.Marshaler interface or we will have to rewrite a lot of code); I will take a look.
Our solution for now is to hack src/runtime and src/encoding/json and make the goroutine panic when the request is cancelled. This solution has the upside of being very easy to implement (100 lines of code), but the downside is that we have to use a forked (hacked) Go toolchain. I guess when https://github.com/golang/go/issues/33714 is solved upstream we can try that approach too.
I agree with @beoran that a context-aware reader with a streaming decoder (or a context-aware writer with a streaming encoder for marshals) seems like a good approach. encoding/json doesn't currently support streaming encodes (https://github.com/golang/go/issues/33714), but that's certainly something we want to fix in the future.
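Sketching the marshal side of that suggestion (illustrative names again; this only pays off once encoding/json actually writes incrementally, i.e. once https://github.com/golang/go/issues/33714 is addressed): a context-aware io.Writer for the encoder to write through.

package ctxio

import (
	"context"
	"io"
)

// ctxWriter stops accepting writes once the context is cancelled, so a
// streaming encoder writing through it returns the context error instead
// of finishing a response nobody is waiting for.
type ctxWriter struct {
	ctx context.Context
	w   io.Writer
}

func (cw ctxWriter) Write(p []byte) (int, error) {
	if err := cw.ctx.Err(); err != nil {
		return 0, err
	}
	return cw.w.Write(p)
}

// NewWriter ties a writer to a context.
func NewWriter(ctx context.Context, w io.Writer) io.Writer {
	return ctxWriter{ctx: ctx, w: w}
}

Usage in a handler would then look like json.NewEncoder(ctxio.NewWriter(r.Context(), w)).Encode(response); today the whole document is still buffered before the first Write, which is exactly the limitation tracked in the issue above.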
If you use a context-aware io.Reader like the one here: https://pace.dev/blog/2020/02/03/context-aware-ioreader-for-golang-by-mat-ryer.html, together with a JSON parser that can unmarshal from an io.Reader, and there are several like that, then your problem should be solved.
Interesting. So this basically turns the early exits from error handling into preemption points. At least, most of the error handling, since the errors returned from a Reader are only a subset of all the errors returned by a JSON parser.
Based on the discussion above, this is a likely decline. Leaving open for four weeks for final comments.
Thank you, I'm fine with closing this
No change in consensus.
Hi,
Can we have the feature to kill one goroutine from another goroutine?
I have read https://github.com/golang/go/issues/32610 and several other discussions. I believe the answer was NO, but I still want to present my use cases and experiments for your consideration.
I was working on the Kubernetes APIServer. When the APIServer is OOMKilled and restarted, all clients reconnect to it and try to download all the data they need from the APIServer into their local cache. They do this by sending a LIST request.

After an APIServer restart there would be tons of concurrent LIST requests, so the CPU would be heavily overloaded. The HTTP responses were quite big; with every CPU core saturated, json.Marshal of a response would take on average 300 seconds to complete (whereas it would take only 10 seconds had the CPU not been so busy).

The interesting thing is, the APIServer returns a GatewayTimeout response after 60 seconds, and the client retries the request on receiving this error. So at the 60-second mark the APIServer and the client have both given up on the request and the request context has been cancelled, but the goroutine is still actively doing json.Marshal of the response for that request! (Because json.Marshal isn't context-aware.)

That's the last thing we want when the server is already overloaded. For the next 240 seconds that goroutine keeps wasting CPU for nothing, and it doesn't release its memory. Soon the clients retry the LIST request and another wave of goroutines is created to serve the requests - the overload just gets worse!
In my observation, due to the lingering json.Marshal goroutine, each client results in the creation of 5 goroutines and uses 5x the CPU and memory.

I actually did a little experiment and hacked src/runtime. Each goroutine has a goid, so I stored the goid of the serving goroutine in the request context, and I added an API to abort (panic) a goroutine by its goid. So when GatewayTimeout occurs, I could read the goroutine that was doing json.Marshal from the request context and make it panic.

The results are exactly what we want! A 60% memory usage reduction (and much less prone to OOMKills) and faster recovery.
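For concreteness, here is roughly how that experiment maps onto the API proposed here. GetGoroutineHandle and Abort are the hypothetical names from this proposal, and handleList / buildHugeListResponse are made-up stand-ins for the real APIServer code.

package main

import (
	"encoding/json"
	"net/http"
	"runtime"
)

// buildHugeListResponse stands in for assembling a large LIST response.
func buildHugeListResponse() interface{} { return struct{}{} }

// handleList serves a LIST request and arranges for the serving goroutine
// to be aborted if the request is abandoned (e.g. after GatewayTimeout).
func handleList(w http.ResponseWriter, r *http.Request) {
	handle := runtime.GetGoroutineHandle() // hypothetical API
	done := make(chan struct{})
	defer close(done)

	go func() {
		select {
		case <-r.Context().Done():
			handle.Abort() // stop the lingering json.Marshal below
		case <-done:
			// Request finished normally; nothing to abort.
		}
	}()

	data, _ := json.Marshal(buildHugeListResponse())
	w.Write(data)
}

With something like this, the watchdog goroutine aborts the marshal at the 60-second mark instead of letting it burn CPU for another four minutes.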
(P.S. We changed some other behaviors of the APIServer as well, e.g. returning a Retry-After header to disperse client-side retries and imposing a smaller MaxRequestsInflight when a LIST flood is detected. The above experiment already had those optimizations turned on.)

This issue is not limited to the Kubernetes APIServer. I guess every server powered by Go has more or less the same problem, and they would all benefit from an API for killing goroutines (non-cooperatively, because using contexts / channels to cooperatively make goroutines exit is sometimes not viable).
If this use case could be considered valid, I would like to humbly request such an API. It doesn't have to expose the goid to the user; just letting the user get a handle of the current goroutine, store it somewhere, and later call an Abort() method that makes the goroutine panic at the earliest convenience would be good enough.

Thanks!