django-ftl / fluent-compiler

High performance Python implementation of Fluent, Mozilla's l10n language

Billion laughs attack protection #4

Open spookylukey opened 3 years ago

spookylukey commented 3 years ago

As noted in our security docs, we currently have no protection against a billion laughs attack by FTL authors, either at compile time or at run time.

Note that the attack vector here is a malicious FTL author. That may be unlikely, but it should be considered for some usage scenarios. We are not talking about run-time issues where the attacker controls only the substituted values and not the FTL message itself.

Compile-time

Example


-term1 = lol
-term2 = {-term1}{-term1}{-term1}{-term1}{-term1}{-term1}{-term1}{-term1}{-term1}{-term1}
-term3 = {-term2}{-term2}{-term2}{-term2}{-term2}{-term2}{-term2}{-term2}{-term2}{-term2}
# etc
message = {-term9}

Due to our current strategy of inlining all terms and simplifying, this will attempt to generate a function like:

def message(args, errors):
    return "lollollollollollollollollollollollollollollollollollol..."

and you'll use up a lot of memory at compile time.
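To get a feel for the scale, here is a rough back-of-the-envelope calculation. It assumes the elided part of the example continues the same pattern, with each term referencing the previous one ten times up to -term9. The function name is made up for illustration.

```python
def expanded_length(base: str, fanout: int, levels: int) -> int:
    """Length of the fully inlined text for the last term in the chain.

    `levels` counts the terms, so levels=1 is just the base text.
    """
    return len(base) * fanout ** (levels - 1)


# -term1 is "lol"; -term2 .. -term9 each reference the previous term 10 times.
print(expanded_length("lol", 10, 9))  # 300000000
```

That is about 300 million characters for the final string literal alone, before counting any intermediate copies made while simplifying.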

We could protect against this with some combination of a depth counter and a reference counter in the compiler, bailing out when we hit the limits. In real-world FTL there is very rarely a need for lots of references to other items, or for deeply nested references.
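A minimal sketch of what such a check might look like, working on a toy data model rather than the real compiler's AST. Everything here is hypothetical: the `Ref` class, the `check_expansion` function, the exception name and the default limits are invented for illustration. The exception is only an internal signal in this sketch; how the compiler should report the bail-out is a separate question.

```python
from dataclasses import dataclass


class ExpansionLimitExceeded(Exception):
    """Internal signal that a term expands past the configured limits."""


@dataclass(frozen=True)
class Ref:
    name: str  # name of the referenced term


def check_expansion(terms, name, max_depth=10, max_refs=1000):
    """Walk the references reachable from `name` without building any strings.

    `terms` maps a term name to a list of parts; each part is either a
    literal str or a Ref. Returns the number of references followed.
    """
    refs_followed = 0

    def visit(current, depth):
        nonlocal refs_followed
        if depth > max_depth:
            raise ExpansionLimitExceeded(f"nesting deeper than {max_depth}")
        for part in terms[current]:
            if isinstance(part, Ref):
                refs_followed += 1
                if refs_followed > max_refs:
                    raise ExpansionLimitExceeded(f"more than {max_refs} references")
                visit(part.name, depth + 1)

    visit(name, 1)
    return refs_followed


# Toy version of the example: term2..term9 each reference the previous term 10 times.
terms = {"term1": ["lol"]}
for i in range(2, 10):
    terms[f"term{i}"] = [Ref(f"term{i - 1}")] * 10

print(check_expansion(terms, "term3"))  # 110
try:
    check_expansion(terms, "term9")
except ExpansionLimitExceeded as exc:
    print("bail out:", exc)
```

Because the walk stops as soon as either counter passes its limit, the work done before bailing out is bounded by `max_refs`, regardless of how large the fully inlined string would have been.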

Run-time

We don't inline messages at the call site, so the equivalent with messages would produce a run-time issue:

msg1 = lol
msg2 = {msg1}{msg1}{msg1}{msg1}{msg1}{msg1}{msg1}{msg1}{msg1}{msg1}
msg3 = {msg2}{msg2}{msg2}{msg2}{msg2}{msg2}{msg2}{msg2}{msg2}{msg2}
# etc.

Which compiles to something like:

def msg1(args, errors):
    return "lol"

def msg2(args, errors):
    return f'{msg1(args, errors)}{msg1(args, errors)}{msg1(args, errors)}{msg1(args, errors)}{msg1(args, errors)}{msg1(args, errors)}{msg1(args, errors)}{msg1(args, errors)}{msg1(args, errors)}{msg1(args, errors)}'

# etc

Attempting to use the last function in the chain would produce a very large string at runtime.

We could address this in two ways:

  1. At run time: the compiled code for each message could check call depth in some way (e.g. via a passed-in current_depth parameter). This would be a performance hit on every message and, relatively speaking, a very large one for the common case.

  2. At compile time, by:

We may need to make some of these limits configurable.
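For illustration, the first option above (a run-time depth check) might look roughly like the sketch below. This is not what the compiler generates today. The `current_depth` parameter comes from the description above, but `MAX_DEPTH`, `FALLBACK`, the `make_repeater` helper and the choice to append a plain string to `errors` are all assumptions made for the example.

```python
MAX_DEPTH = 3  # hypothetical limit, chosen low to keep the demo small
FALLBACK = "???"  # assumed placeholder text for a refused reference


def msg1(args, errors, current_depth=0):
    return "lol"


def make_repeater(name, inner, times=10):
    """Stand-in for the compiled code of a message that references `inner` `times` times."""

    def message(args, errors, current_depth=0):
        # This is the extra check every message call would have to pay for.
        if current_depth >= MAX_DEPTH:
            errors.append(f"{name}: reference depth limit reached")
            return FALLBACK
        return "".join(inner(args, errors, current_depth + 1) for _ in range(times))

    return message


msg2 = make_repeater("msg2", msg1)
msg3 = make_repeater("msg3", msg2)
msg4 = make_repeater("msg4", msg3)
msg5 = make_repeater("msg5", msg4)

errors = []
print(len(msg3({}, errors)), len(errors))  # 300 0
errors = []
print(len(msg5({}, errors)), len(errors))  # 3000 1000
```

Note that a depth limit on its own only caps the output at roughly `times ** MAX_DEPTH` copies of the fallback text, so it would need to be fairly low, or combined with some other counter, to be effective.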

As per normal fluent rules, we should not bail out with exceptions in these cases, but produce message functions that: