PaperMC / Waterfall

BungeeCord fork that aims to improve performance and stability.
https://papermc.io
MIT License

memory allocation attack? #627

Open Sneakometer opened 3 years ago

Sneakometer commented 3 years ago

Hello Waterfall community, I run a small Minecraft server with about 50 max concurrent players. I'm currently facing bot attacks where multiple IPs (proxies?) connect to the Waterfall proxy, each allocating 16 MB of direct memory and thus rendering the server unusable within seconds.

I've allocated 512 MB of memory to Waterfall, which was plenty for the last 3 years. I've still doubled it to 1 GB for now, but the DoS "attack" still manages to fill the RAM in seconds.

This exception is spammed to the console during an attack:

[00:12:21] [Netty Worker IO Thread #9/ERROR]: [/36.94.40.114:53052] <-> InitialHandler - encountered exception
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 520093703, max: 536870912)
    at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:775) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:730) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:645) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:621) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:204) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:188) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:138) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.buffer.PoolArena.reallocate(PoolArena.java:288) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:118) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:307) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:282) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1105) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:99) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[Waterfall.jar:git:Waterfall-Bootstrap:1.16-R0.5-SNAPSHOT:c031df1:395]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_281]

You could argue that this is a DDoS attack and can't be fixed by BungeeCord/Waterfall. However, the host machine was only using about 30% CPU and 10% of its network resources; it's really only BungeeCord that is struggling to keep up with that many requests.

I do use iptables to rate limit new connections per IP to the proxy, but this does not really help as the connections come from too many different IPs (a proxy list?). I have now added a global rate limit for SYN packets to BungeeCord, which somewhat mitigates the attack by keeping the server from crashing. However, no new players can join while an attack is running, so this is not a permanent option :/
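
For reference, rules along the lines described above might look roughly like this; the port, rates and thresholds are placeholders for illustration, not the values actually used on this server:

    # Sketch only; 25565 and the numbers below are example values.
    # Per-IP rate limit for new connections to the proxy port.
    iptables -A INPUT -p tcp --dport 25565 --syn \
        -m hashlimit --hashlimit-name mc-per-ip --hashlimit-mode srcip \
        --hashlimit-above 10/min -j DROP
    # Global ceiling on SYN packets as a last resort during an attack.
    iptables -A INPUT -p tcp --dport 25565 --syn -m limit --limit 20/second --limit-burst 40 -j ACCEPT
    iptables -A INPUT -p tcp --dport 25565 --syn -j DROP

As noted above, the global ceiling also locks out legitimate players during an attack, so it only prevents the crash.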

I also don't make any profit from my server, so I can't afford professional layer 7 DDoS mitigation. Hoping to get help here is my only option. Any help is appreciated.

narumii commented 3 years ago

You could argue that this is a DDoS attack and can't be fixed by BungeeCord/Waterfall.

Yes, that's true. BungeeCord/Waterfall has had many exploits, but md5 claims that they don't even work xd. Waterfall does something, but their "anti-DoS" is still trash.

However, the host machine was only using about 30% CPU and 10% of its network resources; it's really only BungeeCord that is struggling to keep up with that many requests.

Imo BungeeCord's networking system is a mess; even vanilla's is better xd.

So it's probably an encryption response or some other packet that can carry very large data; Waterfall and BungeeCord don't have a good limiter for it.

antbig commented 3 years ago

Do you have the beginning of the attack? (The logs before the java.lang.OutOfMemoryError: Direct buffer memory?)

Janmm14 commented 3 years ago

@narumii This offensive language will not get you anywhere. You did not provide any useful information.

I suggest that the maintainers of waterfall delete your comment.

I suggest that you try out #609, there's also a link to a test Waterfall jar in there. That might help with your problem.

Sneakometer commented 3 years ago

Do you have the beginning of the attack? (The logs before the java.lang.OutOfMemoryError: Direct buffer memory?)

Here is the log from 5 minutes before the attack: https://pastebin.com/FrMPjp4G

In short, this: [22:30:00] [Netty Worker IO Thread #2/WARN]: [/CENSORED:57820] <-> InitialHandler - bad packet ID, are mods in use!? VarInt too big

Janmm14 commented 3 years ago

Do you have the beginning of the attack? (The logs before the java.lang.OutOfMemoryError: Direct buffer memory?)

Here is the log from 5 minutes before the attack: https://pastebin.com/FrMPjp4G

In short, this: [22:30:00] [Netty Worker IO Thread #2/WARN]: [/CENSORED:57820] <-> InitialHandler - bad packet ID, are mods in use!? VarInt too big

Is that amount of server list pings normal for your server?

narumii commented 3 years ago

@narumii This offensive language will not get you anywhere. You did not provide any useful information.

I suggest that the maintainers of waterfall delete your comment.

I suggest that you try out #609, there's also a link to a test Waterfall jar in there. That might help with your problem.

Yeah, telling the truth is offensive :( Big "DoS mitigations" that don't work; also the "DoS mitigations" from Velocity don't work properly, idk why /shrug

electronicboy commented 3 years ago

There is a difference between telling the truth and just being an ass about it

Each event pipeline thread, iirc, gets its own native buffer, so this would imply that too many event threads fired up or something. I don't think there is an actual leak here, but I have no means to reproduce this to investigate. Many of these issues can be mitigated with basic configuration of a firewall to throttle connections in the event of an attack.


Sneakometer commented 3 years ago

Is that amount of server list pings normal for your server?

Yeah, pretty much. 2-3 pings per second is what I would consider normal for the server.

Many of these issues can be mitigated with basic configuration of a firewall to throttle connections in the event of an attack

As already mentioned, I am rate limiting connections, filtering bad packets and limiting total connections per IP. Can you please tell me more about that "basic firewall" configuration so I can set up mine? Thanks

electronicboy commented 3 years ago

This specific case doesn't look like something a basic firewall setup will help with. I think I know what they're doing, and it's shamefully an artifact of a service exposed to the internet doing its job. I think I have a way to limit the damage, but it will impact performance for some people relying on certain aspects of how netty already works.

Sneakometer commented 3 years ago

I'm now running the test version from @Janmm14. I will let you know whether it helped, should the attackers do their thing again. I've noticed no issues so far.

electronicboy commented 3 years ago

#628

Sneakometer commented 2 years ago

@electronicboy Due to the recent log4j exploits I had to switch to the latest official build. Since then, our servers have been attacked again with the same result. Your fix in #628 doesn't seem to be working, or it broke at some point in between. The attacker(s) were able to OOM BungeeCord within seconds using only about 40 requests.

[20:40:58] [Netty Worker IO Thread #8/ERROR]: [/X.X.X.X:54102] <-> InitialHandler - encountered exception
java.lang.OutOfMemoryError: Cannot reserve 16777216 bytes of direct buffer memory (allocated: 1061287832, limit: 1073741824)
    at java.nio.Bits.reserveMemory(Bits.java:178) ~[?:?]
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:121) ~[?:?]
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:332) ~[?:?]
    at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:648) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:623) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:202) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:186) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.buffer.PoolArena.allocate(PoolArena.java:136) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.buffer.PoolArena.reallocate(PoolArena.java:286) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:118) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:305) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:280) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1103) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:99) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[waterfall.jar:git:Waterfall-Bootstrap:1.18-R0.1-SNAPSHOT:727780a:473]
    at java.lang.Thread.run(Thread.java:833) [?:?]

Running Waterfall build 473, Java 17, Debian

Janmm14 commented 2 years ago

@Sneakometer That fix in #628 was intentionally reverted here: https://github.com/PaperMC/Waterfall/commit/f17de7472d01af9cf94198e00386ed998558ea00

Xernium commented 2 years ago

That was intentional indeed; there are now some packets that can easily exceed that size (up to 16 MB), which is already a whole ton too much. Not sure how to fix this again.

Janmm14 commented 2 years ago

That was intentional indeed; there are now some packets that can easily exceed that size (up to 16 MB), which is already a whole ton too much. Not sure how to fix this again.

Is this problem happening because the client sends overly large packets, or because every packet, even a very small one, gets 16 MiB of RAM reserved?

Can a legit client even send such large packets (or only the server), or could we use a different memory pool with different settings for packets sent by the client?

antbig commented 2 years ago

The client claims that the incoming packet size is very large, forcing netty to allocate the maximum amount of RAM (16 MiB), but then sends the packet very, very slowly. This way, with a very small number of clients, you can create an OOM. Only the server sends very large packets (chunks).

Janmm14 commented 2 years ago

But we do not allocate a buffer of the declared size before the full packet has arrived in Varint21FrameDecoder.

So until then, the buffer we have is completely handled by netty, and it should only grow as more data arrives?
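
For readers following the discussion, the general shape of such a length-prefixed frame decoder is sketched below. This is a minimal illustration of the technique being discussed, not Waterfall's actual Varint21FrameDecoder; the class name and the limit passed to the constructor are made up for the example.

    import java.util.List;

    import io.netty.buffer.ByteBuf;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.handler.codec.ByteToMessageDecoder;
    import io.netty.handler.codec.CorruptedFrameException;

    // Sketch: split length-prefixed frames and reject oversized declared lengths
    // as soon as the VarInt prefix is complete, before waiting for any body bytes.
    public class BoundedVarIntFrameDecoder extends ByteToMessageDecoder {

        private final int maxFrameLength; // e.g. 2 * 1024 * 1024

        public BoundedVarIntFrameDecoder(int maxFrameLength) {
            this.maxFrameLength = maxFrameLength;
        }

        @Override
        protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) {
            in.markReaderIndex();
            int length = 0;
            for (int i = 0; i < 3; i++) { // 21-bit length prefix: at most 3 VarInt bytes
                if (!in.isReadable()) {
                    in.resetReaderIndex();
                    return; // prefix incomplete, wait for more data
                }
                byte b = in.readByte();
                length |= (b & 0x7F) << (7 * i);
                if ((b & 0x80) == 0) {
                    // Prefix complete: validate the declared length immediately.
                    if (length > maxFrameLength) {
                        throw new CorruptedFrameException(
                                "Declared frame length " + length + " > " + maxFrameLength);
                    }
                    if (in.readableBytes() < length) {
                        in.resetReaderIndex();
                        return; // body incomplete; netty keeps cumulating what has arrived
                    }
                    out.add(in.readRetainedSlice(length)); // emit one complete frame
                    return;
                }
            }
            throw new CorruptedFrameException("VarInt length prefix longer than 3 bytes");
        }
    }

Note that nothing in a decoder like this allocates a buffer of the declared size up front; the memory pressure in the stack traces above comes from netty's cumulation buffer and the 16 MiB pool chunks backing it.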

electronicboy commented 2 years ago

To me this looks akin to the slowloris attacks done against Apache. Reducing the native buffer size was NOT a fix, in any form, shape or capacity; it just did as much of a mitigation as possible against junk being allocated, which, pre-1.18, was a trivial way to at least mitigate this to some degree.

Netty has a buffer pool which allows these native, direct buffers to be reused rather than paying the expensive allocation cost every time. You can increase your direct memory limit, or use whatever system property it was, to mitigate allocating these directly into direct memory; but these buffers are slowly filled up and are shared across connections. Here there are just enough connections using those buffers that it tries to allocate a new one and fails.

The client isn't telling the thing to allocate a huge buffer; the buffer size is generally fixed (resizing these is expensive, so you want to avoid that).
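
For anyone who wants to experiment with those limits, the knobs being alluded to are, to my knowledge, roughly the following; the jar name and the values are placeholders, not recommendations:

    # Raise the JVM's direct memory ceiling (netty picks this limit up by default).
    java -Xmx512M -XX:MaxDirectMemorySize=1G -jar waterfall.jar
    # Or cap netty's own direct memory accounting explicitly (value in bytes).
    java -Dio.netty.maxDirectMemory=536870912 -jar waterfall.jar
    # Shrink the pooled allocator's chunk size (8 KiB page << maxOrder: 11 = 16 MiB, 9 = 4 MiB).
    java -Dio.netty.allocator.maxOrder=9 -jar waterfall.jar

None of these remove the underlying problem; they only change where the limit sits and how coarse the 16 MiB allocations seen in the stack traces are.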

Janmm14 commented 2 years ago

@electronicboy Since I moved the ReadTimeoutHandler after the Varint21FrameDecoder in Bungee to counter slowloris-style attacks, would this mean that the attacker is able to do this in just 30 seconds?
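
For context, the ordering being described is roughly the following; stock netty handlers stand in here for Bungee's actual Varint21FrameDecoder and downstream handlers, and the limits are illustrative only:

    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.socket.SocketChannel;
    import io.netty.handler.codec.LengthFieldBasedFrameDecoder;
    import io.netty.handler.timeout.ReadTimeoutHandler;

    // Sketch of the handler order: the frame splitter sits first, so the
    // ReadTimeoutHandler behind it only sees reads for *complete* frames.
    // A client trickling one byte every few seconds keeps the raw socket busy
    // but never completes a frame, and therefore still hits the 30s timeout.
    public class ProxyChannelInitializer extends ChannelInitializer<SocketChannel> {
        @Override
        protected void initChannel(SocketChannel ch) {
            ch.pipeline().addLast("frame-decoder",
                    new LengthFieldBasedFrameDecoder(2 * 1024 * 1024, 0, 4, 0, 4));
            ch.pipeline().addLast("read-timeout", new ReadTimeoutHandler(30));
            // ... packet decoders / the InitialHandler equivalent would follow here
        }
    }

With that placement the timeout bounds how long a half-finished frame can sit in the cumulation buffer, which is why the question above is whether 30 seconds is still enough of a window for the attack.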

electronicboy commented 2 years ago

the buffers are fixed size, so all you've gotta do is cause enough of them to be created

Janmm14 commented 2 years ago

the buffers are fixed size, so all you've gotta do is cause enough of them to be created

That really does not sound like the right thing to do from netty.

electronicboy commented 2 years ago

Resizing the buffers is stupidly expensive, so it is the right thing to do. The big issue here is that you need to drain them at a decent pace; this is basically, IMHO, a massive architecture issue across the board.

Janmm14 commented 2 years ago

@Sneakometer Did you make any changes to the connection throttle configuration?

Janmm14 commented 2 years ago

the buffers are fixed size, so all you've gotta do is cause enough of them to be created

This really does not sound like it could be true. It would mean that with 512 MiB of RAM there could only ever be 32 buffers allocated, for 32 connections.

electronicboy commented 2 years ago

It's not supposed to be one buffer per connection, basically; this all gets nuanced on the technicalities of netty.

Sneakometer commented 2 years ago

@Sneakometer Did you make any changes to the connection throttle configuration?

Not sure what the default is, but it's set to connection_throttle: 8000 on my server.
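
For reference, the throttle-related entries in BungeeCord/Waterfall's config.yml look roughly like this; the values shown are the commonly shipped defaults as far as I recall, so check your own file:

    # Window in milliseconds during which repeat connections from one address are counted.
    connection_throttle: 4000
    # How many connections a single address may open within that window.
    connection_throttle_limit: 3

Note that this throttle is per source address, so it does little against an attack spread across a large proxy list.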

Janmm14 commented 2 years ago

Netty changed its default to what we had, so apparently this change does not affect the maximum size of the buffers.

electronicboy commented 2 years ago

I do not recall seeing any resizing logic for the buffers. Afaik the idea is that they're fixed size to prevent constant rescaling; most apps using netty are designed to process the queue effectively so that backpressure doesn't occur, etc.

electronicboy commented 2 years ago

Ah, so we set the capacity of the buffers. Looking at it, the thing has logic to allocate less by default, but maybe it always tries to reserve the full capacity? Thus, too many buffers = hitting the limit, of course. The caveat here is that the entire system relies upon those buffers being drained effectively; this is basically an architectural issue.