
Memory Leak while using BES #21540

Open holtherndon-stripe opened 7 months ago

holtherndon-stripe commented 7 months ago

Description of the bug:

We make extensive use of BES, and we've noticed it uses a very large and increasing amount of memory, which causes our builds to crash frequently. Our builds pass without memory issues when we disable our --bes_backend, so we know the issue is directly related to BES.

In a recent crash I was able to capture a memory snapshot, and the BES service appears to be holding on to several million protobuf refs (image), which appears to eat into the direct buffers needed for general protobuf usage. (Sadly, the direct buffers don't seem to show up in heap dumps.) Typically the error is a kernel OOM; however, I'm able to reproduce the out-of-memory error consistently by restricting the maximum direct buffer memory.

(Image from YourKit in a build that failed and produced the stacktrace below.)


Peeking at the code, there are many append-only sets that begin to cause memory pressure in our larger builds.

Stacktrace:

java.lang.OutOfMemoryError: Cannot reserve 2097152 bytes of direct buffer memory (allocated: 2146624451, limit: 2147483648)
    at java.base/java.nio.Bits.reserveMemory(Unknown Source)
    at java.base/java.nio.DirectByteBuffer.<init>(Unknown Source)
    at java.base/java.nio.ByteBuffer.allocateDirect(Unknown Source)
    at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:701)
    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:676)
    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:215)
    at io.netty.buffer.PoolArena.tcacheAllocateSmall(PoolArena.java:180)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:137)
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:129)
    at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:396)
    at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
    at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:124)
    at io.grpc.netty.NettyWritableBufferAllocator.allocate(NettyWritableBufferAllocator.java:51)
    at io.grpc.internal.MessageFramer.writeKnownLengthUncompressed(MessageFramer.java:226)
    at io.grpc.internal.MessageFramer.writeUncompressed(MessageFramer.java:168)
    at io.grpc.internal.MessageFramer.writePayload(MessageFramer.java:141)
    at io.grpc.internal.AbstractStream.writeMessage(AbstractStream.java:65)
    at io.grpc.internal.DelayedStream.writeMessage(DelayedStream.java:278)
    at io.grpc.internal.ForwardingClientStream.writeMessage(ForwardingClientStream.java:37)
    at io.grpc.internal.RetriableStream.sendMessage(RetriableStream.java:545)
    at io.grpc.internal.ClientCallImpl.sendMessageInternal(ClientCallImpl.java:520)
    at io.grpc.internal.ClientCallImpl.sendMessage(ClientCallImpl.java:506)
    at io.grpc.ForwardingClientCall.sendMessage(ForwardingClientCall.java:37)
    at io.grpc.stub.ClientCalls$CallToStreamObserverAdapter.onNext(ClientCalls.java:374)
    at com.google.devtools.build.lib.buildeventservice.client.BuildEventServiceGrpcClient$BESGrpcStreamContext.sendOverStream(BuildEventServiceGrpcClient.java:166)
    at com.google.devtools.build.lib.buildeventservice.BuildEventServiceUploader.publishBuildEvents(BuildEventServiceUploader.java:453)
    at com.google.devtools.build.lib.buildeventservice.BuildEventServiceUploader.run(BuildEventServiceUploader.java:311)
    at java.base/java.lang.Thread.run(Unknown Source)

Which category does this issue belong to?

Core, Performance

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

The easiest way to reproduce is to restrict the direct buffer pool using --host_jvm_args=-XX:MaxDirectMemorySize=x, which exacerbates the memory constraints. I was able to get an error on the Bazel project itself using the following: bazelisk --host_jvm_args=-XX:MaxDirectMemorySize=128K build --build_event_json_file=/tmp/bes-out-new.json //.... The error is different from the one above, but roughly similar. Toying around with -XX:MaxDirectMemorySize in a build that uses --bes_backend yields a similar error.
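
For illustration, the failure mode itself (direct ByteBuffer allocations are capped by -XX:MaxDirectMemorySize rather than -Xmx) can be reproduced outside of Bazel with a few lines of plain Java. This is only a sketch of the JVM mechanism behind the stacktrace above; the class name and 2 MiB allocation size are arbitrary.

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    // Run with: java -XX:MaxDirectMemorySize=16m DirectMemoryDemo
    public class DirectMemoryDemo {
      public static void main(String[] args) {
        List<ByteBuffer> buffers = new ArrayList<>();
        try {
          while (true) {
            // Each direct buffer counts against -XX:MaxDirectMemorySize,
            // not against the regular Java heap (-Xmx).
            buffers.add(ByteBuffer.allocateDirect(2 * 1024 * 1024));
          }
        } catch (OutOfMemoryError e) {
          // "Cannot reserve ... bytes of direct buffer memory" -- the same
          // failure mode as the gRPC/Netty allocation in the stacktrace above.
          System.err.println("Direct memory exhausted after " + buffers.size() + " buffers: " + e);
        }
      }
    }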

Which operating system are you running Bazel on?

Ubuntu

What is the output of bazel info release?

release 7.0.1 (though we've had this issue for many past versions as well)

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

coeuvre commented 7 months ago

It looks like BuildEventStreamer keeps track of all events that have been announced and posted during a build, so memory grows as the events are posted.

Assigning to core team for further investigation.
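
A deliberately simplified sketch of the growth pattern described above (this is not the actual BuildEventStreamer code; the class and method names are invented, and only the announcedEvents/postedEvents field names mirror the real sets mentioned later in this thread):

    import java.util.HashSet;
    import java.util.Set;

    // Simplified sketch: a streamer that only ever adds to its tracking sets.
    // With millions of BEP events per build, both sets grow for the entire
    // lifetime of the stream and are never pruned.
    class SimplifiedStreamer<EventId> {
      private final Set<EventId> announcedEvents = new HashSet<>();
      private final Set<EventId> postedEvents = new HashSet<>();

      void announce(EventId id) {
        announcedEvents.add(id); // never removed
      }

      void post(EventId id) {
        postedEvents.add(id); // never removed, even after the event is sent and acked
      }

      int retainedIds() {
        return announcedEvents.size() + postedEvents.size();
      }
    }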

haxorz commented 7 months ago

Mike, assigning to you for triage.

@justinhorvitz is this something we've already diagnosed as part of reducing Bazel's memory usage? I don't recall seeing a heap dump like the one in the attached screenshot. Maybe because it's only a problem for peak heap, not retained heap?

justinhorvitz commented 7 months ago

I am curious what makes up those millions of events being retained. On an example large build, I only see thousands of events in those sets. Are there some flags that greatly affect the number of events posted/announced? One I found was --show_progress_rate_limit, since progress messages end up in there. One build I tried showed about a 10x difference in the sizes of those sets between --show_progress_rate_limit=60 and --show_progress_rate_limit=0.001.

justinhorvitz commented 7 months ago

Now I noticed that your screenshot shows progressCount over 1M, so I might be on the right track.

michaeledgar commented 7 months ago

We definitely aim to not retain events for longer than necessary -- holding onto them would be a proper memory leak.

We do currently retain all of the BuildEventId objects until the next BES stream begins, and it looks like just that is consuming >800MB of RAM. Retaining the BuildEventIds is done intentionally to make it easy to track the inter-event dependencies that we try to honor.

In general, we could improve on the existing implementations of Set<BuildEventId> announcedEvents and Set<BuildEventId> postedEvents. By carefully reworking the logic in BuildEventStreamer, we could actually remove events from these sets once we know we are done with them.

However, the 1M progress events are an interesting wrinkle! These events have IDs that are just monotonically increasing integers and we could very easily find a more efficient way to represent these events in announcedEvents and postedEvents. That is, if they are contributing to a large heap.
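
One possible shape for such a representation (purely illustrative; the ProgressIdTracker class below is not part of Bazel, and a real change would need to fit BuildEventStreamer's existing bookkeeping) is to track posted progress events by their integer count instead of retaining one BuildEventId proto per event:

    import java.util.BitSet;

    // Sketch: track posted progress events by their integer count instead of
    // retaining one BuildEventId proto per event. A BitSet costs roughly one
    // bit per progress count instead of tens of bytes per retained proto.
    class ProgressIdTracker {
      private final BitSet posted = new BitSet();

      void markPosted(int progressCount) {
        posted.set(progressCount);
      }

      boolean isPosted(int progressCount) {
        return posted.get(progressCount);
      }

      // 1M progress events ~= 1M bits ~= 128KB.
      long approximateBytes() {
        return posted.size() / 8;
      }
    }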

(I don't think we should try to judge whether progress is being emitted too frequently -- it seems like a matter of taste, and progress events are also emitted to BEP whenever we emit a new NamedSetOfFiles, so IIUC the number of progress events depends on more than just --show_progress_rate_limit.)

I'm going to see exactly how expensive it is to retain a BuildEventId whose id is a ProgressId. It looks like there are a handful of fields needed for each individual BuildEventId. Assuming the BuildEventId is myId:

Without recursing into the fields needed for UnknownFieldSet and ProgressId, I am ballparking that each one clocks in at around 64 bytes -- at 1M events that is only 64MB, which would suggest that the real issue here isn't the progress events but rather the other ~3M events.

I'll run a debugger and pin down exactly how costly these events are, to see how much of the 800MB they could be contributing.

chiragramani commented 7 months ago

[7.0.0...7.1.0rc2] We've been experiencing frequent build failures too. Disabling BES has resulted in successful builds. I'll attempt to capture a heap dump for further analysis. Meanwhile, are there any specific flags we could use to reduce the related memory footprint?

keithl-stripe commented 7 months ago

Thanks for looking into this!

> These events have IDs that are just monotonically increasing integers

For what it's worth, our application does not use these IDs, nor do we care about their monotonicity :) Do you imagine most BES users need that? If so, relaxing the constraint could make the implementation a lot simpler, right?

michaeledgar commented 7 months ago

Any heap dumps would be helpful!

The progress events are actually only 40 bytes on the heap. (The empty UnknownFieldSet is a singleton object shared across all protobufs with no unknown fields.)

I am unaware offhand of any changes in Bazel 7 that would have caused a great deal more BEP events to be emitted. In general, the volume of BEP events is proportional to the number of targets and artifacts being built.

If this issue is widespread I can look into optimizing the BuildEventStreamer to not hold onto event IDs after they are streamed. This wouldn't necessarily reduce the high-water-mark for memory usage, as the event IDs would be held until they are acknowledged by the BES server.

michaeledgar commented 7 months ago

A challenge with changing the current algorithm in BuildEventStreamer is that it accounts for events arriving out of order. In particular, it allows events to declare that they must be delivered after other events by ID - not just after their parent event, but after arbitrary other events. (This is used by, e.g., TargetSummary to ensure it is delivered after the TestSummary when testing is enabled.)

One workaround that might be helpful would be to change how we implement the announcedEvents and postedEvents sets. Right now, they retain parsed protocol buffer objects for the event IDs. Instead we could have those sets match the same equivalence class but store different objects. That would relieve pressure on the native memory shared by all protobufs.
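
A minimal sketch of what such a lighter-weight set element could look like (hypothetical; EventIdKey is not a Bazel class, and this assumes the ID protos serialize deterministically, which a real implementation would need to guarantee or normalize):

    import java.util.Arrays;

    // Sketch of a lightweight set key: instead of keeping the parsed
    // BuildEventId proto, keep only its serialized bytes. Two IDs are equal
    // iff their serialized forms are equal (assuming deterministic
    // serialization), so a HashSet<EventIdKey> matches the same equivalence
    // class as a HashSet<BuildEventId> while holding much simpler objects.
    final class EventIdKey {
      private final byte[] serializedId;
      private final int hash;

      EventIdKey(byte[] serializedId) {
        this.serializedId = serializedId;
        this.hash = Arrays.hashCode(serializedId);
      }

      @Override
      public boolean equals(Object o) {
        return o instanceof EventIdKey && Arrays.equals(serializedId, ((EventIdKey) o).serializedId);
      }

      @Override
      public int hashCode() {
        return hash;
      }
    }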

michaeledgar commented 7 months ago

> Thanks for looking into this!
>
> > These events have IDs that are just monotonically increasing integers
>
> For what it's worth, our application does not use these IDs, nor do we care about their monotonicity :) Do you imagine most BES users need that? If so, relaxing the constraint could make the implementation a lot simpler, right?

Forgot to address this - in general, we need to have IDs inside Bazel to keep track of which events have and have not been posted by the build tool. The business logic there depends on the type of the event. That said, this particular event type has simple business logic and might not actually need that tracking.

I'm going to first focus on the -XX:MaxDirectMemorySize exhaustion and get these retained IDs onto the regular Java heap. Then I'll see if I can reduce the actual retained size, which will take a bit of careful case analysis.

holtherndon-stripe commented 7 months ago

Hey @michaeledgar, thanks for helping look into this. A few things:

1. I do have a heap dump we can go over. I can't share it for many reasons, but I'd be more than happy to schedule some time to walk through it together, just let me know!
2. I have a back-of-mind fear that the large memory portion (the part I can't exactly account for) could be gRPC queuing messages while waiting for the server to accept them. I'm no gRPC expert, but I do know Netty can do things like that. Does that sound like a possible contributor to our larger ~10GB of memory usage with BES enabled?

joeljeske commented 4 months ago

I have captured many heap dumps where various BES allocations & queues consume ~40% of the available heap size.

For our setup, the BES server was another process, local to the Bazel server. So we ran an experiment: we removed --bes_backend, passed --build_event_binary_file instead, and reconfigured our BES process to tail that file and read delimited protos from it. This greatly reduced our Bazel OOM rate.

This leads me to believe that something specific to BES is responsible for the excessive memory usage, since Bazel is still emitting the same build event data as before, just writing it to a file instead of sending gRPC messages.
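
For reference, the file written by --build_event_binary_file contains length-delimited BuildEvent protos, so a sidecar consumer can read it roughly as sketched below. This assumes the BEP protos have been compiled with their usual Java package (com.google.devtools.build.lib.buildeventstream); a consumer tailing a live build would also need to wait for new data rather than stopping at the first EOF.

    import com.google.devtools.build.lib.buildeventstream.BuildEventStreamProtos.BuildEvent;

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Sketch: read length-delimited BuildEvent protos from the file written
    // by --build_event_binary_file.
    public class BepFileReader {
      public static void main(String[] args) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
          BuildEvent event;
          // parseDelimitedFrom returns null at end of stream.
          while ((event = BuildEvent.parseDelimitedFrom(in)) != null) {
            System.out.println(event.getId());
          }
        }
      }
    }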