Closed CoenraadS closed 6 years ago
Hi, can you provide a full runnable snippet to illustrate the issue please? I'm not sure I get it.
var Prism = require("prismjs/components/prism-core.js");
var loadLanguages = require("prismjs/components/index.js");
loadLanguages("javascript"); // This line causes script parsing to break in the following html file
loadLanguages(["markup", "javascript"]);
var someFile =
`
<!DOCTYPE html>
<html lang="en">
<head>
<script>
export default
{
v
}
</script>
</html>
`
var tokenized = Prism.tokenize(someFile, Prism.languages["markup"]);
console.log(tokenized);
Oh, OK I see now. The JS components adds support for JS in Markup only if the Markup component is loaded before the JS component is. Hence, when you first load the JS component, the Markup component is not loaded and the support is not added. When you then load Markup + JS, it too late because of require()
's cache... and the JS component won't be executed again.
We should find a way to invalidate the cache here.
Basic idea is this:
delete require.cache[require.resolve(pathToLanguage)]
Would make a Good First Issue™.
Should we do this on every call? Or should we provide a second parameter to force the cache invalidation? First one sounds quite inefficient. But the second one requires some knowledge of the inner behaviour of the components...
If its done that way, will previously loaded languages still work?
Yes, they should. I don't think reloading a component can break any previously loaded component.
I don't think it's that inefficient. We're not loading that many files. Users also probably shouldn't be loading the same language multiple times anyway. Maybe document all of this and just invalidate every time?
Adding a second parameter seems like the type of thing people will just throw in when things aren't working.
Alright, agreed.
I don't think reloading a component can break any previously loaded component.
Actually, I am wrong about this.
Consider we do implement the cache clearing, and a user wants to highlight JS in HTML:
This will work, obviously:
loadLanguages(['markup', 'javascript']);
This will work too, thanks to the cache being cleared:
loadLanguages('javascript'); loadLanguages(['markup', 'javascript']);
But this will break, because of the cache being cleared...
loadLanguages(['markup', 'javascript']); loadLanguages('markup');
Slightly more subtle, this won't work either, because markdown
requires markup
and thus refreshes it:
loadLanguages(['markup', 'javascript']); loadLanguages('markdown');
I'm really reconsidering the idea of the second parameter...
Wow, nice catch.
If we wanted to still avoid said second parameter, we'd have to resolve them in the loadLanguages
function, e.g. when we do this:
loadLanguages(['markup', 'javascript']); loadLanguages('markup');
We'd have to reload javascript
after we've loaded markup
the second time.
Is that "optional dependency" made explicit anywhere? We could add optional dependencies to components.js
, and reload them conditionally based on whether they've been loaded already or not.
It's not ideal, from an implementation perspective, but I'd much rather pull that complexity into Prism rather than expecting users to understand whether or not they should be invalidating the cache. Understanding whether to do so requires users to understand Prism's internals, which I'd like to avoid even at the expense of making loadLanguages
more complex.
Ok, I dug into this.
Here is the list of the "optional dependencies" I've come up with:
Language | Optional dependencies |
---|---|
CSS | Markup |
JavaScript | Markup |
ActionScript | Markup |
Django | CSS, JavaScript |
Haml | CSS, CoffeeScript, ERB, JavaScript, Less, Markdown, Ruby, SCSS, Textile |
HTTP | JavaScript, Markup |
OpenCL | C, C++ (***) |
Pug | CoffeeScript, EJS, Handlebars, (Hogan), Less, LiveScript, Markdown, (Mustache), (Plates), SCSS, Stylus, (Swig) |
Pure | C, C++, Fortran, (ATS), (DSP) |
Textile | CSS |
(*** This is a weird one: OpenCL extends C++ which extends C, but it also add keywords to both languages. Apparently this was done so that C/C++ code blocks could use the OpenCL host API and it would be properly highlighted. The thing that worries me the most here is the name optionalDependencies
... since they are not optional at all in that specific case.)
As you can see, this is not such a long list. Yet, it may have huge impacts on the loading order.
One of the worst case scenario I've found is:
loadLanguages(['actionscript', 'http', 'textile', 'django', 'haml', 'css', 'javascript', 'markup', 'xeora']);
which will lead to the following files being actually required:
(That's 50 files require
'd, for 9 languages)
Obviously, calling loadLanguages()
without paremeters also leads to a huge amounts of dependencies. (413 files require
d, for our 148 languages)
Am I overthinking an edge case? Do you have alternative names in mind, for the OpenCL case?
I don't know if that's a big deal for node applications. A standard node_modules
is pretty huge and has a ton of files, you're often installing a couple hundred dependencies and require'ing a boatload of files on startup.
Are people going to be re-require'ing language definitions often enough for this to be a problem? I'm inclined to think the surprise of it not working as expected and needing to dig into why would be worse than tracking down why so many files are loaded. That's also an easier problem to solve: load up all the required languages up front, then you don't get slowdown while your app is running.
Edit: That said, I'm not writing a lot of node applications, so I don't have a lot of experience here to lean on.
Loading all languages is ~300ms
for me (on demand loading was ~4ms
). I can live with this since its a one time cost if everything else works correctly. If I'm the only one encountering this issue then you guys don't need to follow this up any further.
Previously I was calling loadLanguages('language') for every document the user opened, so it would be worse for my current situation if that got slower. One time slow or multiple times fast, but not multiple times slow 👍
Maybe when the node global issue is fixed, you can just keep a dictionary of languages and prism objects, and not worry about them conflicting since each object is independent.
Hey
I noticed that if I do:
loadLanguages(["javascript"]);
and then later
loadLanguages(["markup", "javascript"]);
Then the second one won't parse the javascript. However if
["markup", "javascript"]
is loaded first it works correctly.