PrismJS / prism

Lightweight, robust, elegant syntax highlighting.
https://prismjs.com
MIT License
12.22k stars 1.29k forks source link

Use ES2018 RegExp lookbehind assertions when available #1708

Open ExE-Boss opened 5 years ago

ExE-Boss commented 5 years ago

ECMAScript 2018 adds built‑in support for lookbehind assertions to regular expressions, which can be a lot faster than the current pseudo‑lookbehind implemented in Prism.

Also, it’d be nice to use the official syntax and only use the pre‑ES2018 implementation when necessary.

RunDevelopment commented 5 years ago

Sorry, but this feature is so new that not even FireFox (64.0.2) supports it at this point in time. It will probably take a few years until the majority of browsers support this feature. So Prism is likely to be stuck with the pseudo-lookbehinds for at least 5 years or so.

Also, I don't think that a polyfill is possible for this as the underlying Regex-engine itself has to be changed for this. Sure, you can implement one yourself in JS but then we don't have to talk about performance.

ExE-Boss commented 5 years ago

I don’t mean implement a polyfill, I mean keep using the current code path for browsers which don’t support lookbehind, but add a new code path for browsers which do support this, which would convert the first ( into (?<= for lookbehind: true during runtime if it’s supported.

RunDevelopment commented 5 years ago

I was actually also thinking that this might be a good idea at some point. "The browser (and Node) can optimize way better than we ever could and everything would be so much nicer", I was thinking.

But unfortunately, we can't do this because Prism's lookbehinds are not real lookbehinds; they behave differently.

To illustrate this, suppose the following pattern: /(a*)./ which is used to highlight this string: aa.

The real lookbehind obviously behaves correct but the point is that some patterns actually use this 'incorrect' behavior of Prism's lookbehinds, so we might get different highlighting results depending on the browser (version).

Another problem is that real lookbehinds can even access parts of the string before lastIndex. Prism cannot. This means that greedy matching will behave differently. So we would have to change that implementation as well.

I do think that we should totally go for real Regex lookbehinds as soon as they become available on a majority of browsers. We can then remove the fake Prism lookbehinds and live happily ever after. But until then the hassle of making Prism work with 2 slightly different lookbehinds is just too much to be worth it.


To be clear: One can use ECMAScript 2018 lookbehinds in patterns right now of course.

mAAdhaTTah commented 5 years ago

We have #1578 to track when we can drop IE11 support. We can't use new RegEx features until we've decided that we're going to abandon IE11.

RunDevelopment commented 5 years ago

@mAAdhaTTah We can't even use it IF we were to abandon IE11. This Regex feature is so new that not even the latest release of FF supports it. Only V8 implements this at the moment.

joshgoebel commented 3 years ago

AFAIK Safari is the major hold out now on support for look-behind. We don't plan to support it in Highlight.js until Safari has added proper support so that we have the same behavior across all modern browsers.

RunDevelopment commented 3 years ago

Interesting. Right now, about 75% use a browser that supports lookbehinds. If all Safari browsers added support for them now, we could be at >=90% in maybe a year. Well, that's a big if, so Prism will likely not use lookbehinds for another two years or so.

Also, Prism has to be conservative about this because we heavily use RegExp literals. Browsers that don't support lookbehinds in RegExp literals will throw a SyntaxError at parse time meaning that our whole library won't work instead of just one or two regexes.