clientIO / joint

A proven SVG-based JavaScript diagramming library powering exceptional UIs
https://jointjs.com
Mozilla Public License 2.0
4.48k stars 839 forks source link

fix(Vectorizer, Geometry): fix RegEx to avoid potential ReDoS attacks #1930

Closed zbynekstara closed 1 year ago

zbynekstara commented 1 year ago

Description

Fix RegEx in code to avoid potential ReDoS attacks. See the CodeQL reference for more information.

Motivation and Context

JavaScript's RegEx engine uses backtracking which means that its worst-case time complexity can be polynomial or even exponential. This can happen when the regex fails to find any matches in the queried string, when the regex is written in such a way that forces the engine to investigate each character in the queried string multiple times.

Polynomial Worst-Case Compexity

The polynomial worst-case complexity happens in cases such as /\w+\(/. Here, the intention is to find a string of letters terminated with a (. If the queried string does end with a (, all is good. However, if the queried string doesn't end with a (, the engine needs to do a lot of backtracking, which leads to a potential ReDoS vector.

The engine needs to do a pattern that looks like this: start with first letter, collect all following letters, reach end of string without finding (, backtrack, start again with second letter, collect all following letters again, reach end of string without finding ( again, backtrack, start again with third letter... If the string is millions of characters long, the engine can freeze and bring down the whole application.

There are several ways to prevent this problem:

  1. Add an anchor (e.g. /\b\w+\(/) before each unbounded repetition, if it makes sense. The generic advice is to add a negative lookbehind before the repetition (e.g. /(?<!\w)\w*\(/ in our example) which acts as a generic anchor, but this is not available in JavaScript.
  2. Reorganize patterns around unbounded repetitions, so that they are anchored at the beginning, not the end. Usually, there is a logically equivalent way to specify such a pattern. For example, you can match parentheses with a comma-separated list of groups of text (e.g. (a,b,c,d)) in two logically equivalent ways:
    • either: "within parentheses, find an unbounded number of groups of (group of text followed by a comma), followed by a final group of text" - e.g. /\((?:\w+,)*\w+\)/
    • or: "within parentheses, find a group of text, followed by an unbounded number of groups of (comma followed by group of text)" - e.g. /\(\w+(?:,\w+)*\)/
    • the second option is not vulnerable to ReDoS because the anchor (comma) is at the beginning of the unboundedly-repeated group instead of at the end
  3. Replace unbounded repetitions - either by restructuring the code and/or the regex to avoid the need for them, or by replacing the repetition with an alternation listing all possible options (e.g. /(?:scale|translate|rotate|skewx|skewy|matrix)\(/ in our example).
  4. Limit backtracking by specifying the end condition (e.g. /[^)]+\(/ is better than /.+?\(/ is better than /.+\().
  5. Ensure that repetitions inside the regex cannot match the constant parts of the regex that precede them (e.g. \btranslate\([^()]+\) where the added ( in the square bracket group prevents infinite recursion inside the square bracket group).

Note: Repetitions at the very end of a regex (e.g. /\w+/ or /\w+|abc/) are okay because they succeed/fail immediately, without triggering the backtracking which is the problem.

Another case with polynomial worst-case complexity is /^\w+,?\w+\(/ in absence of the , and (. In this situation, the second /w+ subregex becomes unanchored, forcing the regex engine to traverse all letters for each letter, as above. This problem can be prevented by ensuring that there are no overlaps between the two subregexes - that in every possible situation, each subregex is anchored (e.g./\w+(,\w+)?\(/).

Exponential Worst-Case Complexity

Exponential worst-case complexity happens when the polynomial worst-case complexity items happen inside a repetition and without an anchor. Building on from the previous example, if (\w+,?\w+)*, in absence of the ,. The problem is that not only does the regex engine check every possibility for each letter, it also checks every combination of possible arrangements of the two subregexes, which leads to the exponential factor.

The surest way to prevent these is to avoid repetitions inside repetitions as much as possible - and, if such a construct is necessary, to make sure that the possibilities (e.g. in an alternation) can never overlap.

lgtm-com[bot] commented 1 year ago

This pull request introduces 1 alert and fixes 1 when merging eb1dcef3857408335cae903359cdb8b3da1a61c4 into 8e024bc5e63e3d56ab36e1330b1457af369293c1 - view on LGTM.com

new alerts:

fixed alerts:

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. It looks like GitHub code scanning with CodeQL is already set up for this repo, so no further action is needed :rocket:. For more information, please check out our post on the GitHub blog.

lgtm-com[bot] commented 1 year ago

This pull request fixes 1 alert when merging 17428e008cf642c2a2f99078218c2e6e8575cc11 into 8e024bc5e63e3d56ab36e1330b1457af369293c1 - view on LGTM.com

fixed alerts:

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. It looks like GitHub code scanning with CodeQL is already set up for this repo, so no further action is needed :rocket:. For more information, please check out our post on the GitHub blog.

lgtm-com[bot] commented 1 year ago

This pull request fixes 1 alert when merging 589eee5b874ae71eedb572c2fcbb180f74b33411 into 8e024bc5e63e3d56ab36e1330b1457af369293c1 - view on LGTM.com

fixed alerts:

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. It looks like GitHub code scanning with CodeQL is already set up for this repo, so no further action is needed :rocket:. For more information, please check out our post on the GitHub blog.

lgtm-com[bot] commented 1 year ago

This pull request fixes 1 alert when merging e3f4ddf5468cb757ca90d1946d2cdb6d6e677032 into 63e3e86cc6d6c80585cf1e7bb33a02c30efee075 - view on LGTM.com

fixed alerts:

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. It looks like GitHub code scanning with CodeQL is already set up for this repo, so no further action is needed :rocket:. For more information, please check out our post on the GitHub blog.

lgtm-com[bot] commented 1 year ago

This pull request fixes 1 alert when merging db85f223ae25100ca01c8d497527402f86631707 into 63e3e86cc6d6c80585cf1e7bb33a02c30efee075 - view on LGTM.com

fixed alerts:

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. It looks like GitHub code scanning with CodeQL is already set up for this repo, so no further action is needed :rocket:. For more information, please check out our post on the GitHub blog.

lgtm-com[bot] commented 1 year ago

This pull request fixes 4 alerts when merging 24c106d4be3701acfb93c99dabb7e9e2cb786805 into 63e3e86cc6d6c80585cf1e7bb33a02c30efee075 - view on LGTM.com

fixed alerts:

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. It looks like GitHub code scanning with CodeQL is already set up for this repo, so no further action is needed :rocket:. For more information, please check out our post on the GitHub blog.

lgtm-com[bot] commented 1 year ago

This pull request fixes 4 alerts when merging 7c63ddb9dd44da1eeb6c3c21b05cdfe4980314e9 into 63e3e86cc6d6c80585cf1e7bb33a02c30efee075 - view on LGTM.com

fixed alerts:

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. It looks like GitHub code scanning with CodeQL is already set up for this repo, so no further action is needed :rocket:. For more information, please check out our post on the GitHub blog.

lgtm-com[bot] commented 1 year ago

This pull request introduces 1 alert and fixes 4 when merging 8323bd9e5dac5562385921f26c833d91ec579d75 into 63e3e86cc6d6c80585cf1e7bb33a02c30efee075 - view on LGTM.com

new alerts:

fixed alerts:

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. It looks like GitHub code scanning with CodeQL is already set up for this repo, so no further action is needed :rocket:. For more information, please check out our post on the GitHub blog.

lgtm-com[bot] commented 1 year ago

This pull request fixes 5 alerts when merging e0869334535a5942dc136472904a0996a9b82ff6 into 63e3e86cc6d6c80585cf1e7bb33a02c30efee075 - view on LGTM.com

fixed alerts:

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. It looks like GitHub code scanning with CodeQL is already set up for this repo, so no further action is needed :rocket:. For more information, please check out our post on the GitHub blog.

lgtm-com[bot] commented 1 year ago

This pull request fixes 5 alerts when merging 6123d7d898e9e481f47e7c5a0d9e7b1d5d513a84 into 63e3e86cc6d6c80585cf1e7bb33a02c30efee075 - view on LGTM.com

fixed alerts:

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. It looks like GitHub code scanning with CodeQL is already set up for this repo, so no further action is needed :rocket:. For more information, please check out our post on the GitHub blog.