PrismJS / prism

Lightweight, robust, elegant syntax highlighting.
https://prismjs.com
MIT License

Support incremental tokenization #3638

Open fabiospampinato opened 1 year ago

fabiospampinato commented 1 year ago

Motivation

Without incremental tokenization, the syntax highlighter is essentially too inefficient to be used in, for example, a code editor.

Description

Incremental tokenization allows tokenization to be paused and resumed, so that a string can be tokenized line by line. Ideally, it also allows tokenization to restart from a point other than the very start, once the preceding lines have already been tokenized.

Without this feature, syntax highlighting would probably be too expensive to perform in one go, and/or it would visibly block the user.
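A minimal sketch of the pause/resume idea, assuming a toy grammar (identifiers, numbers, punctuation and `/* ... */` block comments) rather than a real Prism grammar. The only state carried from one line to the next is whether a block comment is still open. `tokenizeLine` and `tokenizeLines` are hypothetical names, not Prism APIs; the shape is similar to vscode-textmate's `grammar.tokenizeLine(line, ruleStack)`, which also returns the state to feed into the next line.

```javascript
// Hypothetical line-by-line tokenizer for a toy grammar (NOT Prism's API).
// The state passed between lines records whether a /* ... */ comment is still open.
function tokenizeLine(line, state) {
  const tokens = [];
  let inComment = state.inComment;
  let i = 0;
  while (i < line.length) {
    if (inComment || line.startsWith("/*", i)) {
      // Look for "*/", skipping the opening "/*" if the comment starts here.
      const end = line.indexOf("*/", inComment ? i : i + 2);
      const stop = end === -1 ? line.length : end + 2;
      tokens.push({ type: "comment", text: line.slice(i, stop) });
      inComment = end === -1; // unterminated: the comment continues on the next line
      i = stop;
    } else {
      const text = /^(?:\d+|[A-Za-z_]\w*|\s+|.)/.exec(line.slice(i))[0];
      const type = /^\d/.test(text) ? "number"
        : /^[A-Za-z_]/.test(text) ? "word"
        : /^\s/.test(text) ? "space" : "punct";
      tokens.push({ type, text });
      i += text.length;
    }
  }
  return { tokens, state: { inComment } };
}

// Generator: tokenizes one line per step, so the caller can pause (stop pulling)
// and resume later, or restart from any line given that line's saved start state.
function* tokenizeLines(lines, startLine = 0, startState = { inComment: false }) {
  let state = startState;
  for (let n = startLine; n < lines.length; n++) {
    const { tokens, state: endState } = tokenizeLine(lines[n], state);
    state = endState;
    yield { line: n, tokens, endState };
  }
}

// Usage: tokenize the first line, pause, then resume where we left off.
const lines = ["let a = 1; /* start", "still comment */ b", "c"];
const it = tokenizeLines(lines);
const first = it.next().value; // pauses after line 0
const rest = [...it];          // resumes with lines 1 and 2

// Restarting from line 1 only needs the end state saved for line 0.
const restarted = [...tokenizeLines(lines, 1, first.endState)];
```

Because each line's result depends only on its text and its start state, saving the end state per line is what makes both pausing and restarting from an arbitrary line cheap.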

Alternatives

Using vscode-textmate, but that's kinda huge (and slow) since it needs Oniguruma to work.

panoply commented 1 year ago

The way Prism currently handles tokenization is sufficient for the sole purpose of highlighting. Extending support to code editors would be nice to have, and the existing algorithm can likely be refactored for incremental updates.

It would be nice to expose an additional method for this exact purpose (e.g. Prism.update()). The main problem with the current logic (AFAIK) is that Prism performs every highlight operation without any context from the previous one. Providing Prism with the existing structure might be easier to reason about; it could look something like this:

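const input = Prism.highlight('const foo ', Prism.languages.javascript, 'javascript');

/**
 * Fugazi (hypothetical) example: Prism.update() does not exist; it is the proposed API.
 * Note: Prism.highlight() returns an HTML string; the token tree itself comes from
 * Prism.tokenize(text, grammar).
 *
 * Consider this: with reference to the `input`, an update can be carried out.
 * Passing Prism the existing tree, a comparison (morph) can be applied.
 */
Prism.update(input, 'const foo =');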

Providing the existing tree would allow Prism to check whether tokenization needs to be applied at all. This could significantly speed up the process when dealing with a large string containing many lines of code.
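One way the proposed `Prism.update(input, newText)` could decide how much work is needed is to keep, for every line, the state at its start and end, as line-based editor highlighters commonly do. A minimal sketch, assuming a hypothetical `update(previous, text, tokenizeLine)` function (not a Prism API) and a deliberately crude per-line tokenizer: it reuses cached results for the unchanged leading lines, re-tokenizes from the first edited line, and stops as soon as it reaches an unchanged trailing line whose start state matches the cached one, since from there on every line would produce the same tokens again.

```javascript
// Hypothetical update(): NOT a Prism API. Each cached line stores its text, the
// state at its start, its tokens and the state at its end.
const sameState = (a, b) => JSON.stringify(a) === JSON.stringify(b); // fine for small plain objects

function update(previous, text, tokenizeLine, initialState = { inComment: false }) {
  const lines = text.split("\n");
  const old = previous ? previous.lines : [];

  // 1. Unchanged leading lines keep their cached results.
  let start = 0;
  while (start < lines.length && start < old.length && lines[start] === old[start].text) start++;

  // 2. Count unchanged trailing lines, without overlapping the leading ones.
  let suffix = 0;
  while (suffix < lines.length - start && suffix < old.length - start &&
         lines[lines.length - 1 - suffix] === old[old.length - 1 - suffix].text) suffix++;

  const result = old.slice(0, start);
  let state = start === 0 ? initialState : result[start - 1].endState;
  let retokenized = 0;
  for (let n = start; n < lines.length; n++) {
    // 3. In the unchanged tail, same text + same start state => same tokens,
    //    so the remaining cached lines can be reused as-is.
    const fromEnd = lines.length - 1 - n;
    if (fromEnd < suffix) {
      const oldIndex = old.length - 1 - fromEnd;
      if (sameState(old[oldIndex].startState, state)) {
        result.push(...old.slice(oldIndex));
        break;
      }
    }
    const { tokens, state: endState } = tokenizeLine(lines[n], state);
    result.push({ text: lines[n], startState: state, tokens, endState });
    state = endState;
    retokenized++;
  }
  return { lines: result, retokenized };
}

// Deliberately crude per-line tokenizer for the demo: each line is a single token,
// and a line containing "/*" opens a comment that a line containing "*/" closes.
function toyTokenizeLine(line, state) {
  const inside = state.inComment || line.includes("/*");
  return {
    tokens: [{ type: inside ? "comment" : "code", text: line }],
    state: { inComment: inside && !line.includes("*/") },
  };
}

// Usage
const v1 = update(null, "a\nb /*\nc\nd */\ne", toyTokenizeLine);
const v2 = update(v1, "x\nb /*\nc\nd */\ne", toyTokenizeLine);    // edit without a state change
const v3 = update(v1, "x /*\nb /*\nc\nd */\ne", toyTokenizeLine); // edit that opens a comment
```

Comparing whole lines keeps the sketch simple; an editor integration would more likely pass the edited range directly instead of diffing the text.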