corazawaf / coraza-spoa

EXPERIMENTAL: A wrapper around the OWASP Coraza WAF for HAProxy's SPOE filters
Apache License 2.0

Excessive memory and CPU usage #97

Open mmilitzer opened 7 months ago

mmilitzer commented 7 months ago

Hi,

We would like to use coraza-spoa in production, but unfortunately, as soon as we put production traffic on it, after a few hours the coraza-spoa daemon "goes nuts": it starts using up all available CPU resources and its memory consumption increases rapidly, up to tens of GB of RAM. The system then starts swapping and becomes entirely unresponsive until the OOM killer finally kills the coraza-spoa daemon. This happens quite frequently for us, several times a day.

We run multiple HAProxy load-balancing servers, and I have observed that when coraza-spoa "goes nuts" on one of them, the issue often propagates to further load balancers: the coraza daemon on other servers starts excessively using memory and CPU at almost the same time. My conclusion therefore is that the problem is externally triggered, i.e. caused by certain (malicious?) requests that come in. Only this would explain the time correlation of the OOM events across multiple load-balancer servers.

However, I went through our access logs at the respective times and couldn't find any obviously unusual or suspicious requests. There also weren't unusually many requests at those times, so it's not an issue of general load or request rate. Still, there must be certain problematic requests that trigger the issue and cause the excessive resource use. I also checked the coraza logs, but absolutely nothing is logged when it happens. So whatever it is, coraza apparently didn't attempt to block those requests (or never even got that far in processing them).

So I don't really have a handle on how to get to the root of the problem. I don't even know whether the issue is in coraza itself or unique to the HAProxy SPOE integration. If you have any ideas about what the problem might be or how we could debug it more effectively, please let me know.

Thanks! Michael

jcchavezs commented 7 months ago

Hi Michael,

Thanks for the detailed description. I think the first place to look is whether there is a memory leak somewhere. Emitting Go runtime metrics at this point is probably the right thing to do; then we can start looking deeper. Another option is to deploy coraza-spoa with Go profiling enabled.
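
For reference, Go profiling is usually exposed through the net/http/pprof package; the following is a minimal sketch of what that could look like in a daemon such as coraza-spoa (the listen address and the placeholder main loop are illustrative assumptions, not part of coraza-spoa's actual code or configuration):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Serve the profiling endpoints on a local-only port so heap, CPU and
	// goroutine profiles can be pulled while the daemon misbehaves, e.g.:
	//   go tool pprof http://127.0.0.1:6060/debug/pprof/heap
	//   go tool pprof "http://127.0.0.1:6060/debug/pprof/profile?seconds=30"
	go func() {
		log.Println(http.ListenAndServe("127.0.0.1:6060", nil))
	}()

	// ... start the SPOA agent here as usual ...
	select {} // placeholder so the sketch runs on its own
}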

mmilitzer commented 7 months ago

Hi José,

thanks for your reply. I don't think it is a "classic" memory leak, because the memory consumption does not gradually increase over time. Rather, it is a very sudden event: normally the coraza-spoa daemon consumes very little memory, but when it "goes nuts" it suddenly allocates GBs of memory within seconds, and all available system memory is quickly used up until the process gets killed by the OOM killer.

Also, as mentioned, those events seem to happen (when they happen) with high time correlation across multiple of our load balancers. So I suspect it may be some kind of regex denial of service, i.e. certain requests cause coraza to go into an endless loop. Or they cause a deadlock, so that request processing gets stuck and the queue of new incoming requests keeps filling up (but is never emptied), and memory consumption just keeps increasing without limit. (But I don't know the internal architecture of coraza and coraza-spoa, so would something like this be possible at all?)

I can try enabling Go profiling, but if it is the kind of issue I suspect, the profiling information may not help us much, right? Is there some way to debug which request coraza got stuck on (if that's what is happening)? Most of the time I can't witness the issue "live", because by the time I receive an alert from our monitoring about the excessive memory usage, the coraza-spoa daemon has already been killed before I can connect to the machine. So this really happens quite fast...
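
One way to see where request processing is stuck, without waiting for the OOM killer, is to capture a dump of all goroutine stacks while the daemon is misbehaving. Sending SIGQUIT to a Go process already prints every goroutine's stack (and then exits); the sketch below is a hypothetical, non-fatal variant that dumps the stacks on SIGUSR1 (the function name and signal choice are assumptions for illustration, not something coraza-spoa ships):

package main

import (
	"os"
	"os/signal"
	"runtime/pprof"
	"syscall"
)

// dumpGoroutinesOnSignal writes the stack of every goroutine to stderr
// whenever the process receives SIGUSR1, so a stuck or looping request
// handler can be spotted without killing the daemon.
func dumpGoroutinesOnSignal() {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGUSR1)
	go func() {
		for range ch {
			// debug=2 prints full, panic-style stack traces for all goroutines.
			pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
		}
	}()
}

func main() {
	dumpGoroutinesOnSignal()
	// ... start the agent here ...
	select {} // placeholder so the sketch runs on its own
}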

NullIsNot0 commented 6 months ago

From your description, the symptoms look exactly like a bug in HAProxy v2.9.1. If you are using HAProxy 2.9.1, please downgrade to 2.9.0 because of this issue: https://github.com/haproxy/haproxy/issues/2395. It will be fixed in the next 2.9.x version.

mmilitzer commented 6 months ago

Thanks for your suggestion. But we're still on HAProxy v2.0.34, not v2.9. Also, in our case it's not the haproxy process that uses 100% CPU; it's the coraza-spoa daemon process that "goes nuts" and consumes all available CPU and memory. So I don't think the issue is in HAProxy; it's rather either coraza itself or the coraza-spoa wrapper...

mmilitzer commented 5 months ago

I'm now also seeing the following stack traces when it crashes, and it always seems to be the same crash reason. So it looks like it's actually crashing inside corazawaf itself, right?

2024-02-04T01:46:26.969159+00:00 hostname.com coraza-spoa[26925]: panic: runtime error: slice bounds out of range [:-1]
2024-02-04T01:46:26.970724+00:00 hostname.com coraza-spoa[26925]: goroutine 347 [running]:
2024-02-04T01:46:26.970881+00:00 hostname.com coraza-spoa[26925]: github.com/corazawaf/coraza/v3/internal/corazawaf.(*Transaction).GetField(0xc0036894a0?, {0x0, 0x29, 0x0, {0x0, 0x0}, {0xc001c579e0, 0x2, 0x2}})
2024-02-04T01:46:26.971027+00:00 hostname.com coraza-spoa[26925]: /root/go/pkg/mod/github.com/corazawaf/coraza/v3@v3.0.1/internal/corazawaf/transaction.go:574 +0x41d
2024-02-04T01:46:26.971174+00:00 hostname.com coraza-spoa[26925]: github.com/corazawaf/coraza/v3/internal/corazawaf.(*Rule).doEvaluate(0xc00249c788, 0xc001c560c0?, 0xc001c54000, 0x0?, 0x0, 0x4e?)
2024-02-04T01:46:26.971324+00:00 hostname.com coraza-spoa[26925]: /root/go/pkg/mod/github.com/corazawaf/coraza/v3@v3.0.1/internal/corazawaf/rule.go:234 +0xd77
2024-02-04T01:46:26.971476+00:00 hostname.com coraza-spoa[26925]: github.com/corazawaf/coraza/v3/internal/corazawaf.(*Rule).Evaluate(...)
2024-02-04T01:46:26.971624+00:00 hostname.com coraza-spoa[26925]: /root/go/pkg/mod/github.com/corazawaf/coraza/v3@v3.0.1/internal/corazawaf/rule.go:171
2024-02-04T01:46:26.971789+00:00 hostname.com coraza-spoa[26925]: github.com/corazawaf/coraza/v3/internal/corazawaf.(*RuleGroup).Eval(0xc000154010, 0x2, 0xc001c54000)
2024-02-04T01:46:26.971938+00:00 hostname.com coraza-spoa[26925]: /root/go/pkg/mod/github.com/corazawaf/coraza/v3@v3.0.1/internal/corazawaf/rulegroup.go:219 +0x37e
2024-02-04T01:46:26.972086+00:00 hostname.com coraza-spoa[26925]: github.com/corazawaf/coraza/v3/internal/corazawaf.(*Transaction).ProcessRequestBody(0xc001c54000)
2024-02-04T01:46:26.972232+00:00 hostname.com coraza-spoa[26925]: /root/go/pkg/mod/github.com/corazawaf/coraza/v3@v3.0.1/internal/corazawaf/transaction.go:1003 +0x3b2
2024-02-04T01:46:26.972382+00:00 hostname.com coraza-spoa[26925]: github.com/corazawaf/coraza-spoa/internal.(*SPOA).processRequest(0xc0035f6c00?, 0x6dcde0?)
2024-02-04T01:46:26.972581+00:00 hostname.com coraza-spoa[26925]: /builds/user/coraza-spoa-build/coraza-spoa/internal/spoa.go:316 +0xe6e
2024-02-04T01:46:26.972754+00:00 hostname.com coraza-spoa[26925]: github.com/corazawaf/coraza-spoa/internal.(*SPOA).Start.func1(0xc0035f6c00)
2024-02-04T01:46:26.972907+00:00 hostname.com coraza-spoa[26925]: /builds/user/coraza-spoa-build/coraza-spoa/internal/spoa.go:49 +0xd8
2024-02-04T01:46:26.973056+00:00 hostname.com coraza-spoa[26925]: github.com/criteo/haproxy-spoe-go.(*conn).handleNotify(0xc00293efa0, {0x3, 0x1, 0x4e63081, 0x1, {0xc000d6800b, 0x1da, 0x3ff1}, {0xc000d68000, 0x3ffc, ...}}, ...)
2024-02-04T01:46:26.973206+00:00 hostname.com coraza-spoa[26925]: /root/go/pkg/mod/github.com/criteo/haproxy-spoe-go@v1.0.6/notify.go:109 +0xd9
2024-02-04T01:46:26.973352+00:00 hostname.com coraza-spoa[26925]: github.com/criteo/haproxy-spoe-go.(*conn).runWorker(0xc00293efa0, {0x3, 0x1, 0x4bed6e1, 0x1af, {0xc00294600c, 0x3bf, 0x3ff0}, {0xc002946000, 0x3ffc, ...}}, ...)
2024-02-04T01:46:26.973495+00:00 hostname.com coraza-spoa[26925]: /root/go/pkg/mod/github.com/criteo/haproxy-spoe-go@v1.0.6/conn.go:162 +0x1db
2024-02-04T01:46:26.973656+00:00 hostname.com coraza-spoa[26925]: created by github.com/criteo/haproxy-spoe-go.(*conn).run
2024-02-04T01:46:26.973852+00:00 hostname.com coraza-spoa[26925]: /root/go/pkg/mod/github.com/criteo/haproxy-spoe-go@v1.0.6/conn.go:136 +0xb50
2024-02-04T03:40:17.817002+00:00 hostname.com coraza-spoa[15038]: panic: runtime error: slice bounds out of range [:-1]
2024-02-04T03:40:17.818713+00:00 hostname.com coraza-spoa[15038]: goroutine 606 [running]:
2024-02-04T03:40:17.818921+00:00 hostname.com coraza-spoa[15038]: github.com/corazawaf/coraza/v3/internal/corazawaf.(*Transaction).GetField(0xc0033b1410?, {0x0, 0x29, 0x0, {0x0, 0x0}, {0xc0031038f0, 0x2, 0x2}})
2024-02-04T03:40:17.819076+00:00 hostname.com coraza-spoa[15038]: /root/go/pkg/mod/github.com/corazawaf/coraza/v3@v3.0.1/internal/corazawaf/transaction.go:574 +0x41d
2024-02-04T03:40:17.819218+00:00 hostname.com coraza-spoa[15038]: github.com/corazawaf/coraza/v3/internal/corazawaf.(*Rule).doEvaluate(0xc00251c788, 0xc001457d70?, 0xc003b9ac00, 0x0?, 0x0, 0x4e?)
2024-02-04T03:40:17.819361+00:00 hostname.com coraza-spoa[15038]: /root/go/pkg/mod/github.com/corazawaf/coraza/v3@v3.0.1/internal/corazawaf/rule.go:234 +0xd77
2024-02-04T03:40:17.819511+00:00 hostname.com coraza-spoa[15038]: github.com/corazawaf/coraza/v3/internal/corazawaf.(*Rule).Evaluate(...)
2024-02-04T03:40:17.819671+00:00 hostname.com coraza-spoa[15038]: /root/go/pkg/mod/github.com/corazawaf/coraza/v3@v3.0.1/internal/corazawaf/rule.go:171
2024-02-04T03:40:17.819817+00:00 hostname.com coraza-spoa[15038]: github.com/corazawaf/coraza/v3/internal/corazawaf.(*RuleGroup).Eval(0xc0001da010, 0x2, 0xc003b9ac00)
2024-02-04T03:40:17.819968+00:00 hostname.com coraza-spoa[15038]: /root/go/pkg/mod/github.com/corazawaf/coraza/v3@v3.0.1/internal/corazawaf/rulegroup.go:219 +0x37e
2024-02-04T03:40:17.820111+00:00 hostname.com coraza-spoa[15038]: github.com/corazawaf/coraza/v3/internal/corazawaf.(*Transaction).ProcessRequestBody(0xc003b9ac00)
2024-02-04T03:40:17.820255+00:00 hostname.com coraza-spoa[15038]: /root/go/pkg/mod/github.com/corazawaf/coraza/v3@v3.0.1/internal/corazawaf/transaction.go:1003 +0x3b2
2024-02-04T03:40:17.820405+00:00 hostname.com coraza-spoa[15038]: github.com/corazawaf/coraza-spoa/internal.(*SPOA).processRequest(0xc0038675c0?, 0x6dcde0?)
2024-02-04T03:40:17.820548+00:00 hostname.com coraza-spoa[15038]: /builds/user/coraza-spoa-build/coraza-spoa/internal/spoa.go:316 +0xe6e
2024-02-04T03:40:17.820712+00:00 hostname.com coraza-spoa[15038]: github.com/corazawaf/coraza-spoa/internal.(*SPOA).Start.func1(0xc0038675c0)
2024-02-04T03:40:17.820856+00:00 hostname.com coraza-spoa[15038]: /builds/user/coraza-spoa-build/coraza-spoa/internal/spoa.go:49 +0xd8
2024-02-04T03:40:17.821002+00:00 hostname.com coraza-spoa[15038]: github.com/criteo/haproxy-spoe-go.(*conn).handleNotify(0xc0036681e0, {0x3, 0x1, 0x4fc63ab, 0x1, {0xc00360c00b, 0x1da, 0x3ff1}, {0xc00360c000, 0x3ffc, ...}}, ...)
2024-02-04T03:40:17.821150+00:00 hostname.com coraza-spoa[15038]: /root/go/pkg/mod/github.com/criteo/haproxy-spoe-go@v1.0.6/notify.go:109 +0xd9
2024-02-04T03:40:17.821296+00:00 hostname.com coraza-spoa[15038]: github.com/criteo/haproxy-spoe-go.(*conn).runWorker(0xc0036681e0, {0x3, 0x1, 0x4f11a2b, 0x6, {0xc00414400b, 0x513, 0x3ff1}, {0xc004144000, 0x3ffc, ...}}, ...)
2024-02-04T03:40:17.821440+00:00 hostname.com coraza-spoa[15038]: /root/go/pkg/mod/github.com/criteo/haproxy-spoe-go@v1.0.6/conn.go:162 +0x1db
2024-02-04T03:40:17.821585+00:00 hostname.com coraza-spoa[15038]: created by github.com/criteo/haproxy-spoe-go.(*conn).run
2024-02-04T03:40:17.821746+00:00 hostname.com coraza-spoa[15038]: /root/go/pkg/mod/github.com/criteo/haproxy-spoe-go@v1.0.6/conn.go:136 +0xb50
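
"slice bounds out of range [:-1]" is the panic the Go runtime raises when a slice expression ends up with a negative bound, and since nothing recovers it inside the worker goroutine, the whole coraza-spoa process dies. The sketch below only illustrates that failure mode and how a recover-based guard could contain it to a single transaction; safeProcess and the inline handler are hypothetical stand-ins, not coraza-spoa's or coraza's actual API:

package main

import (
	"fmt"
	"log"
)

// safeProcess wraps a per-request handler so that a panic such as
// "slice bounds out of range [:-1]" fails only that one transaction
// instead of crashing the whole SPOA daemon.
func safeProcess(handleRequest func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("recovered from panic while processing request: %v", r)
			err = fmt.Errorf("request processing panicked: %v", r)
		}
	}()
	return handleRequest()
}

func main() {
	// A handler that panics the same way the stack traces above do.
	err := safeProcess(func() error {
		body := []byte("body")
		end := -1      // a computed offset gone negative...
		_ = body[:end] // ...panics with "slice bounds out of range [:-1]"
		return nil
	})
	log.Println("daemon still alive, error was:", err)
}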