fb55 / readabilitySAX

a fast and platform independent readability port (JS)
BSD 2-Clause "Simplified" License
245 stars, 36 forks

ReDoS vulnerability in readabilitySAX.js #183

Open Moyuchu opened 1 year ago

Moyuchu commented 1 year ago

Description

ReDoS (regular expression denial of service) is an algorithmic complexity vulnerability that typically affects backtracking regex engines, such as Python's default regex engine or the built-in RegExp implementations in JavaScript engines. An attacker can craft input that triggers the engine's worst-case running time and so cause a denial of service.

This project uses the ReDoS-vulnerable regex (?:<br\/>(?:\s|&nbsp;?)*)+(?=<\/?p), which can be triggered by the PoC below:

const arg = require('arg');
const args = arg(
    {
        '--foo': String
    },
    {
        argv: ['<br/>' + '<br/>'.repeat(24495) + '\x00']
    }
);
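The PoC above passes the payload through the arg package. To observe the regex's behaviour on its own, the following minimal sketch applies it directly to inputs of the same shape and times each call. It assumes the regex is used with String.prototype.replace and the g flag; timeReplace is a hypothetical helper name, and the input sizes are kept small so the script finishes quickly. If the engine backtracks as described, the time should grow roughly fourfold each time the input doubles.

```javascript
// Regex from the issue, with the global flag assumed for replacement use.
const re = /(?:<br\/>(?:\s|&nbsp;?)*)+(?=<\/?p)/g;

// Hypothetical helper: build a payload of n "<br/>" tags followed by a NUL
// byte (so the lookahead never succeeds) and time a single replace call.
function timeReplace(n) {
    const input = '<br/>'.repeat(n) + '\x00';
    const start = process.hrtime.bigint();
    const out = input.replace(re, '');
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    // Nothing can match, so the output should equal the input.
    return { n, ms, unchanged: out === input };
}

for (const n of [1000, 2000, 4000]) {
    console.log(timeReplace(n));
}
```

Raising n toward the 24495 used in the PoC makes the slowdown easier to see, at the cost of a longer run.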

How to repair

The vulnerability comes from using a backtracking regex engine. I recommend switching to the RE2 regex engine developed by Google. RE2 does not support lookaround or backreferences, so the original regex has to be changed and the lookahead replaced with additional code. Here is my proposed fix:

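const RE2 = require('re2');
// Original: /(?:<br\/>(?:\s|&nbsp;?)*)+(?=<\/?p)/g
// RE2 has no lookahead, so the following "<p" or "</p" is captured and put back.
// Passing a regex literal keeps \s and \/ intact (a plain string would need \\s).
const r1 = new RE2(/(?:<br\/>(?:\s|&nbsp;?)*)+(<\/?p)/g);
function safe_match(node) {
    return r1.replace(node, (match, tail) => tail);
}

When used for replacement, the new regex plus the code above should be equivalent to the original regex, assuming RE2's \s matches the same whitespace characters as the JavaScript engine's.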
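If adding a native dependency is not desirable, the same effect can probably be achieved with the built-in engine. The following is a minimal sketch, not code from readabilitySAX: it matches each maximal run of <br/> tags with a lookahead-free regex and then checks in plain code whether the run is followed by <p or </p. The names BR_RUN and stripBrBeforeP are hypothetical, and the sketch assumes the original regex is only used to delete matches.

```javascript
// Same pattern as the original, minus the lookahead. Once a "<br/>" has been
// consumed the match can no longer fail, so the engine has nothing to backtrack.
const BR_RUN = /(?:<br\/>(?:\s|&nbsp;?)*)+/g;

// Hypothetical helper: remove a run of <br/> tags only when the text right
// after the run starts with "<p" or "</p"; otherwise keep the run unchanged.
function stripBrBeforeP(html) {
    return html.replace(BR_RUN, (run, offset, whole) => {
        const end = offset + run.length;
        const followedByP =
            whole.startsWith('<p', end) || whole.startsWith('</p', end);
        return followedByP ? '' : run;
    });
}

console.log(stripBrBeforeP('a<br/> &nbsp;<p>b'));
console.log(stripBrBeforeP('a<br/><br/>b'));
```

Each run is scanned once and the follow-up check inspects at most three characters, so the work should stay proportional to the input length even for the PoC payload.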

I hope the author will adopt this fix; I would be very grateful. Thanks!