PrismJS / prism

Lightweight, robust, elegant syntax highlighting.
https://prismjs.com
MIT License
12.23k stars 1.29k forks source link

Is it ok to reuse Prism object when loading multiple languages? #1393

Closed CoenraadS closed 6 years ago

CoenraadS commented 6 years ago

Hey

I noticed that if I do:

loadLanguages(["javascript"]);

and then later loadLanguages(["markup", "javascript"]);

Then the second one won't parse the javascript. However if ["markup", "javascript"] is loaded first it works correctly.

Golmote commented 6 years ago

Hi, can you provide a full runnable snippet to illustrate the issue please? I'm not sure I get it.

CoenraadS commented 6 years ago
var Prism = require("prismjs/components/prism-core.js");
var loadLanguages = require("prismjs/components/index.js");

loadLanguages("javascript"); // This line causes script parsing to break in the following html file
loadLanguages(["markup", "javascript"]);

var someFile = 
`
<!DOCTYPE html>
<html lang="en"> 
  <head>
    <script>
     export default 
     {
       v
     }
    </script>
</html>
`

var tokenized = Prism.tokenize(someFile, Prism.languages["markup"]);
console.log(tokenized);
Golmote commented 6 years ago

Oh, OK I see now. The JS components adds support for JS in Markup only if the Markup component is loaded before the JS component is. Hence, when you first load the JS component, the Markup component is not loaded and the support is not added. When you then load Markup + JS, it too late because of require()'s cache... and the JS component won't be executed again. We should find a way to invalidate the cache here.

mAAdhaTTah commented 6 years ago

Basic idea is this:

delete require.cache[require.resolve(pathToLanguage)]

Would make a Good First Issue™.

Golmote commented 6 years ago

Should we do this on every call? Or should we provide a second parameter to force the cache invalidation? First one sounds quite inefficient. But the second one requires some knowledge of the inner behaviour of the components...

CoenraadS commented 6 years ago

If its done that way, will previously loaded languages still work?

Golmote commented 6 years ago

Yes, they should. I don't think reloading a component can break any previously loaded component.

mAAdhaTTah commented 6 years ago

I don't think it's that inefficient. We're not loading that many files. Users also probably shouldn't be loading the same language multiple times anyway. Maybe document all of this and just invalidate every time?

Adding a second parameter seems like the type of thing people will just throw in when things aren't working.

Golmote commented 6 years ago

Alright, agreed.

Golmote commented 6 years ago

I don't think reloading a component can break any previously loaded component.

Actually, I am wrong about this.

Consider we do implement the cache clearing, and a user wants to highlight JS in HTML:

This will work, obviously:

loadLanguages(['markup', 'javascript']);

This will work too, thanks to the cache being cleared:

loadLanguages('javascript'); loadLanguages(['markup', 'javascript']);

But this will break, because of the cache being cleared...

loadLanguages(['markup', 'javascript']); loadLanguages('markup');

Slightly more subtle, this won't work either, because markdown requires markup and thus refreshes it:

loadLanguages(['markup', 'javascript']); loadLanguages('markdown');

I'm really reconsidering the idea of the second parameter...

mAAdhaTTah commented 6 years ago

Wow, nice catch.

If we wanted to still avoid said second parameter, we'd have to resolve them in the loadLanguages function, e.g. when we do this:

loadLanguages(['markup', 'javascript']); loadLanguages('markup');

We'd have to reload javascript after we've loaded markup the second time.

Is that "optional dependency" made explicit anywhere? We could add optional dependencies to components.js, and reload them conditionally based on whether they've been loaded already or not.

It's not ideal, from an implementation perspective, but I'd much rather pull that complexity into Prism rather than expecting users to understand whether or not they should be invalidating the cache. Understanding whether to do so requires users to understand Prism's internals, which I'd like to avoid even at the expense of making loadLanguages more complex.

Golmote commented 6 years ago

Ok, I dug into this.

Here is the list of the "optional dependencies" I've come up with:

Language Optional dependencies
CSS Markup
JavaScript Markup
ActionScript Markup
Django CSS, JavaScript
Haml CSS, CoffeeScript, ERB, JavaScript, Less, Markdown, Ruby, SCSS, Textile
HTTP JavaScript, Markup
OpenCL C, C++ (***)
Pug CoffeeScript, EJS, Handlebars, (Hogan), Less, LiveScript, Markdown, (Mustache), (Plates), SCSS, Stylus, (Swig)
Pure C, C++, Fortran, (ATS), (DSP)
Textile CSS

(*** This is a weird one: OpenCL extends C++ which extends C, but it also add keywords to both languages. Apparently this was done so that C/C++ code blocks could use the OpenCL host API and it would be properly highlighted. The thing that worries me the most here is the name optionalDependencies... since they are not optional at all in that specific case.)

As you can see, this is not such a long list. Yet, it may have huge impacts on the loading order.

One of the worst case scenario I've found is:

loadLanguages(['actionscript', 'http', 'textile', 'django', 'haml', 'css', 'javascript', 'markup', 'xeora']);

which will lead to the following files being actually required:

Click to expand the "graph". - `c-like` - `javascript` - `actionscript` - `http` - `markup` - -> reload dependents - `javascript` - `actionscript` - `http` - `textile` - `markup` - -> reload dependents - `javascript` - `actionscript` - `http` - `django` - `c-like` - `ruby` - `haml` - `css` - -> reload dependents - `django` - `haml` - `textile` - -> reload dependents - `haml` - `c-like` - `javascript` - -> reload dependents - `django` - `haml` - `http` - `markup` - -> reload dependents - `css` - -> reload dependents - `django` - `haml` - `textile` - -> reload dependents - `haml` - `javascript` - -> reload dependents - `django` - `haml` - `actionscript` - `http` - `markup` - -> reload dependents - `css` - -> reload dependents - `django` - `haml` - `textile` - -> reload dependents - `haml` - `javascript` - -> reload dependents - `django` - `haml` - `actionscript` - `http` - `xeora`

(That's 50 files require'd, for 9 languages)

Obviously, calling loadLanguages() without paremeters also leads to a huge amounts of dependencies. (413 files required, for our 148 languages)

Am I overthinking an edge case? Do you have alternative names in mind, for the OpenCL case?

mAAdhaTTah commented 6 years ago

I don't know if that's a big deal for node applications. A standard node_modules is pretty huge and has a ton of files, you're often installing a couple hundred dependencies and require'ing a boatload of files on startup.

Are people going to be re-require'ing language definitions often enough for this to be a problem? I'm inclined to think the surprise of it not working as expected and needing to dig into why would be worse than tracking down why so many files are loaded. That's also an easier problem to solve: load up all the required languages up front, then you don't get slowdown while your app is running.

Edit: That said, I'm not writing a lot of node applications, so I don't have a lot of experience here to lean on.

CoenraadS commented 6 years ago

Loading all languages is ~300ms for me (on demand loading was ~4ms). I can live with this since its a one time cost if everything else works correctly. If I'm the only one encountering this issue then you guys don't need to follow this up any further.

Previously I was calling loadLanguages('language') for every document the user opened, so it would be worse for my current situation if that got slower. One time slow or multiple times fast, but not multiple times slow 👍

Maybe when the node global issue is fixed, you can just keep a dictionary of languages and prism objects, and not worry about them conflicting since each object is independent.