M4tteoP opened this issue 1 year ago
Has there been any progress on this? The streaming body analysis sounds rather awesome and would probably address a whole other set of problems caused by buffering bigger files (e.g. timeouts).
We are changing how we deal with buffers. From now on we will only buffer up to the Coraza limit (around 100kb) and then let the bodies go straight to upstream. In terms of buffering that is good enough, as the buffer size is going to be deterministic. Is that sufficient for you, or are you looking to analyse huge payloads as well?
In any case it would be good to know whether you are already using this extension and what sort of issues you are experiencing besides the timeouts.
As for true streaming analysis, I think we are far away from that. Many rules will return false positives/negatives depending on where you split the body, e.g. sending "eva" in one chunk and "l(rm -rf)" in the next, with both analysed separately; the only way to get it right is still to evaluate the entire body. I think we just need to make a best effort to avoid huge buffering, and that would be good enough.
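The split-chunk false negative described above can be demonstrated with a small sketch (`scanChunk` is a hypothetical stand-in for a per-chunk rule check, not Coraza's API):

```go
package main

import (
	"fmt"
	"strings"
)

// scanChunk stands in for a per-chunk rule check: it only sees
// one body frame at a time.
func scanChunk(chunk string) bool {
	return strings.Contains(chunk, "eval(")
}

func main() {
	// The malicious payload is split across two body frames.
	chunks := []string{"eva", "l(rm -rf /)"}

	// Per-chunk scanning: neither frame alone contains the pattern,
	// so chunk-wise analysis produces a false negative.
	perChunk := false
	for _, c := range chunks {
		perChunk = perChunk || scanChunk(c)
	}
	fmt.Println("per-chunk match:", perChunk)

	// Whole-body scanning: the concatenated body does match.
	fmt.Println("whole-body match:", scanChunk(strings.Join(chunks, "")))
}
```

This is why a naive frame-by-frame evaluation cannot simply replace full-body buffering.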
I am currently working on a PoC, so I do not have any concrete data yet, but I will post an update in the future.
Regarding the payload size, we are currently discussing internally what the best approach would be. I have some payloads of up to 30 MB. Buffering them completely for analysis is what I am used to from ModSecurity, but it also has some rather big drawbacks.
Preferably I would use a streaming solution, as it would prevent downstream services from running into timeouts while waiting for the complete payload. Especially with slow upload speeds this tends to be a problem.
Could you go into some details of this sentence:
From now on we will only buffer up to Coraza limit (around 100kb) and then let the bodies go straight to upstream.
Does this mean Coraza repeatedly buffers 100 KB, analyses the data, and eventually sends it upstream until processing has finished?
Sorry, I meant 10mb. See https://github.com/corazawaf/coraza/blob/9d77c5a2ef50b45bf042ad249817b4c6d2272313/internal/corazawaf/waf.go#L274.
The directive SecRequestBodyLimit "Configures the maximum request body size Coraza will accept for buffering." Beyond that limit we won't do any buffering; you can either reject the request or process it partially up to that limit (see SecRequestBodyLimitAction). The rest of the payload should go straight to upstream without any more buffering.
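In SecLang terms, the behaviour described above maps to a directive pair like the following (a sketch; the 10 MB value matches the default mentioned above, and `ProcessPartial` vs `Reject` selects what happens past the limit):

```
SecRequestBodyAccess On
# Maximum request body size accepted for buffering (10 MB)
SecRequestBodyLimit 10485760
# Past the limit: analyse only the buffered prefix instead of rejecting
SecRequestBodyLimitAction ProcessPartial
```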
Alright, so basically the same handling as with ModSecurity.
Regarding a stream approach:
The naive approach would probably be to copy a few kB from the end of one data chunk and prepend them to the start of the next one. Would that be something worthwhile to evaluate? If so, I will check with my team and maybe we can look into contributing an initial PR.
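The overlap idea can be sketched as follows (`streamScan` is a hypothetical helper, not Coraza's API; note it only catches fixed substrings split across a boundary, while real rules apply transformations over the whole body, which is the caveat raised above):

```go
package main

import (
	"fmt"
	"strings"
)

// overlap is the number of trailing bytes carried over between chunks.
// To catch a pattern split across a boundary it must be at least
// len(pattern)-1 bytes.
const overlap = 8

// streamScan checks each chunk together with the carried-over tail
// of the previous one, so a match straddling a chunk boundary is
// still visible in one window.
func streamScan(chunks []string, pattern string) bool {
	tail := ""
	for _, c := range chunks {
		window := tail + c
		if strings.Contains(window, pattern) {
			return true
		}
		if len(window) > overlap {
			tail = window[len(window)-overlap:]
		} else {
			tail = window
		}
	}
	return false
}

func main() {
	chunks := []string{"harmless eva", "l(rm -rf /) data"}
	// Caught despite the split, because the tail of chunk 1 is
	// re-scanned together with chunk 2.
	fmt.Println(streamScan(chunks, "eval("))
}
```

Memory use stays bounded by chunk size plus the overlap, but this only helps substring-style detections, not rules that need the full body.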
Shall we keep this issue open, @M4tteoP? I think that with the current tuning, having two buffers does not seem to be a big deal; I am also not sure whether this is feasible from the proxy-wasm side.
Currently, for both request and response bodies, we are relying on two different buffers:

- Returning `types.ActionPause`, we buffer the body and stop sending it upstream. This is intended: before sending data upstream, Coraza has to verify that no malicious patterns are found.
- With `tx.RequestBodyWriter().Write(body)`, the body is added to the Coraza buffer that will later process it.

The biggest downside of this solution is Envoy's default buffer limit of 1 MiB, which results in a `413: Payload Too Large` if exceeded. Example request:

```
base64 /dev/urandom | head -c 1200000 > /tmp/file.txt && curl -i 'localhost:8080/anything' -d @/tmp/file.txt
```

Possible solutions:

- Rely only on the Coraza `RequestBodyReader` and append the body to the Envoy buffer only once it is okay to send it upstream (`AppendHttpRequestBody`). In order to implement this we have to `GetHttpRequestBody` for each body frame and add it to the BodyWriter (returning `types.ActionPause` but without host-side buffering).

The same considerations apply to the response body.
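The two-buffer flow and its 1 MiB failure mode can be modeled in plain Go (a simulation with made-up types, not the actual proxy-wasm or Coraza APIs):

```go
package main

import (
	"bytes"
	"fmt"
)

type action int

const (
	actionPause    action = iota // keep buffering in the host, don't send upstream
	actionContinue               // release the buffered body upstream
)

// Envoy's default per-connection buffer limit: 1 MiB.
const envoyBufferLimit = 1 << 20

type filter struct {
	hostBuffer bytes.Buffer // what Envoy accumulates while we pause
	corazaBody bytes.Buffer // stand-in for tx.RequestBodyWriter().Write(body)
}

// onRequestBodyFrame mirrors the handler described above: copy each frame
// into both buffers, pause until end of stream, then continue upstream.
func (f *filter) onRequestBodyFrame(frame []byte, endOfStream bool) (action, error) {
	if f.hostBuffer.Len()+len(frame) > envoyBufferLimit {
		// Exceeding the host buffer is what surfaces as 413 Payload Too Large.
		return actionPause, fmt.Errorf("413: Payload Too Large")
	}
	f.hostBuffer.Write(frame)
	f.corazaBody.Write(frame)
	if !endOfStream {
		return actionPause, nil
	}
	return actionContinue, nil
}

func main() {
	f := &filter{}
	a, err := f.onRequestBodyFrame(make([]byte, 600_000), false)
	fmt.Println(a == actionPause, err) // first frame fits, so we pause

	// Second frame pushes the total to 1.2 MB, over the 1 MiB limit,
	// matching the curl example above.
	_, err = f.onRequestBodyFrame(make([]byte, 600_000), true)
	fmt.Println(err)
}
```

Dropping the host-side copy (the proposed `AppendHttpRequestBody`-based approach) would remove the `hostBuffer` limit check from this path and leave only Coraza's own body limit in play.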