davidmarkclements / fast-redact

very fast object redaction
MIT License
278 stars 30 forks source link

Potential memory leak when using wildcard patterns #63

Closed crossleydominic closed 6 months ago

crossleydominic commented 12 months ago

We've been using fast-redactto redact PII data from an event stream we get from a partner. These events are high volume and quite large (on the order of a few kilobytes). This service has been experiencing memory issues that (I think) I've traced back to fast-redact. Here is a minimum reproducible example:

import fastRedact from 'fast-redact';

const redactPaths = [ 'a', '*.a' ]

const event: Record<string, unknown> = {
  id: 1,
  a: {
    id: 2,
  },
};

const eventString = JSON.stringify(event)

const fr = fastRedact({
  paths: redactPaths,
  serialize: false,
});

fr({
  // MUST use a new object each time!
  payload: JSON.parse(eventString),
})
fr({
  // MUST use a new object each time!
  payload: JSON.parse(eventString),
})
fr({
  // MUST use a new object each time!
  payload: JSON.parse(eventString),
})
fr({
  // MUST use a new object each time!
  payload: JSON.parse(eventString),
})

Every call to the fr function accumulates a copy of the input object inside the this.secret object in the compiled Function against a key of '', the empty string. You can observe this by applying the following diff to this library at tag v3.3.0:

diff --git a/lib/redactor.js b/lib/redactor.js
index af58885..8f7d163 100644
--- a/lib/redactor.js
+++ b/lib/redactor.js
@@ -10,6 +10,7 @@ function redactor ({ secret, serialize, wcLen, strict, isCensorFct, censorFctTak
     if (typeof o !== 'object' || o == null) {
       ${strictImpl(strict, serialize)}
     }
+    console.log(["In generated function", this.secret['']])
     const { censor, secret } = this
     ${redactTmpl(secret, isCensorFct, censorFctTakesPath)}
     this.compileRestore()

Crudely, the above just dumps the contents of the this.secret[''] value, but it's enough to demonstrate that this array is appended to for every call to fr. Running the above reproducible example with the diff applied to the codebase yields the output:

[ 'In generated function', undefined ]
[
  'In generated function',
  [ { path: [Array], value: [Object], target: [Object] } ]
]
[
  'In generated function',
  [
    { path: [Array], value: [Object], target: [Object] },
    { path: [Array], value: [Object], target: [Object] }
  ]
]
[
  'In generated function',
  [
    { path: [Array], value: [Object], target: [Object] },
    { path: [Array], value: [Object], target: [Object] },
    { path: [Array], value: [Object], target: [Object] }
  ]
]

It's not clear what the fix is here, is this the fault of the wildcard parser for having a beforeStr value of the empty-string? Is it the fault of the compiled function that uses the wildcards? Or is it an issue with the specialSet function that actually appends to the array?

mcollina commented 11 months ago

Wow, this is amazing. I don't have the time to develop a fix, but I would definitely be happy to review one.

roggervalf commented 6 months ago

hi @crossleydominic, I created a fix pr, hope it could be merged and released soon