Closed epatpol closed 6 years ago
We might be able to do it with prismjs instead of using textmate.
@epatpol Would be nice to see it in action. I'm wondering if it can be a viable alternative in terms of performance, as it will have to parse the whole document with every keystroke. Another question is how it can deal with syntactical errors.
Another option might be using tree-sitter. Atom is currently migrating to it: https://github.com/tree-sitter/tree-sitter
I can also recommend watching this video where the author explains the advantages and the future of tree-sitter. It's also very good to learn about syntax highlighting in general: https://www.youtube.com/watch?v=a1rC79DHpmY
@mofux Thanks for the link, I'll definitely watch it! Otherwise it seems like that would have to be a backend service (seeing as it's written in C/C++ I believe), kinda like a language server (which might or might not be appropriate for that use case).
tree-sitter seems to be cool tech, but I don't see how it makes sense to write full grammars for languages when we already have a language server.
Some discussion was in this thread
But it seems like vscode-textmate directly in the browser isn't going to happen anytime soon, even though it seems like there might be some work going on right now to make it work. Oniguruma (major issue) and node fs dependencies (minor issue) would have to be resolved beforehand. We could always make it work like the RH POC and communicate with the back-end for this, but then we lose the speed of coloring directly in the frontend (much faster I believe).
Seeing as prism.js gave better coloring than the default from monaco quite quickly, I think we could use it on the frontend for now. It would probably also be a lot simpler. The other option would be to fork vscode-textmate and try to make it work in the frontend ourselves.
I agree prism seems better than the current state and could be something we use short term. But couldn't you just fork vscode-textmate, fix the dependencies and publish it under a different name?
@svenefftinge Hold on! https://github.com/Microsoft/vscode-textmate/issues/65#issuecomment-386884586
But couldn't you just fork vscode-textmate, fix the dependencies and publish it under a different name?
I actually had an almost finished forked version on my computer that did something similar (loading the grammar from a string instead of a filepath). But it seems this has already been done. I'll try it tomorrow.
Here's the current state:
The way vscode uses textmate grammars (which are usually written as plist, an XML-based format) is to use a package named vscode-textmate, which has a dependency on oniguruma, a C regex library which doesn't work in the browser as of right now. vscode-textmate can load grammars as plist or json, the latter of which (I believe) is more readable and takes less space. Initially Red Hat did a POC (basically copying almost all of vscode's code and shimming it to somehow connect to theia), see here: https://github.com/theia-ide/theia/issues/1757 (first comment, with a link to Red Hat's branch). There are several problems with this approach:
There's too much stuff that I think is not needed for a prototype or a first iteration of syntax highlighting. Copying the code is not only (maybe?) not acceptable for an official patch, but it also comes with a lot of extra features that take time to understand (we probably don't need those in theia).
It needs to send the whole content of the file to the backend for parsing with vscode-textmate every time the file changes. This can be slow (delay in highlighting) and can be unusable for big files.
It creates another connection side by side with the LS for that specific language, which is probably overkill and doesn't bring much more to the table once LSP supports semantic coloring.
The ideal would be to have a way to quickly highlight in the frontend, both until LSP supports highlighting and, once it does, while waiting for the round trip to the (future) LS that supports semantic coloring. This way we would have a frontend service which would quickly do a first pass on the code and provide a bit more coloring than the default tokenization from monaco (which is very basic), then receive additional, more precise tokens from the backend afterwards (which could take some time).
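The two-pass idea above (fast frontend tokens first, more precise backend tokens later) could look roughly like the following. This is only a sketch: the `HighlightToken` shape, `mergeTokens`, and the scope strings are hypothetical names made up for illustration, not anything from Theia, monaco, or LSP.

```typescript
// Hypothetical token shape; offsets are character positions within one line, end exclusive.
interface HighlightToken {
    start: number;
    end: number;
    scope: string;
}

// True when the two tokens cover at least one common character.
function overlaps(a: HighlightToken, b: HighlightToken): boolean {
    return a.start < b.end && b.start < a.end;
}

// Keep the quick frontend tokens, except where a more precise backend token
// covers the same range; the result is sorted by start offset.
function mergeTokens(coarse: HighlightToken[], precise: HighlightToken[]): HighlightToken[] {
    const kept = coarse.filter(c => !precise.some(p => overlaps(c, p)));
    return [...kept, ...precise].sort((a, b) => a.start - b.start);
}

// First pass computed in the frontend, refinement arriving later from the backend.
const firstPass: HighlightToken[] = [
    { start: 0, end: 8, scope: "keyword" },
    { start: 9, end: 12, scope: "identifier" },
];
const fromBackend: HighlightToken[] = [
    { start: 9, end: 12, scope: "entity.name.function" },
];
const merged = mergeTokens(firstPass, fromBackend);
```

The point of the design is that the editor never waits on the backend: the first pass is shown immediately and only the overlapping ranges are replaced once the slower answer arrives.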
The initial solution was to use prismjs (http://prismjs.com/ ), which seemed to support a lot of languages out of the box. However, even if the colorization was better than default monaco, it still wasn't on par with vscode (prototype here https://github.com/theia-ide/theia/commit/51ae2c9c9cbfaf0a9c884fcb0ce498ff6441fd19 ). And after discussions with colleagues, it seemed like it wasn't enough. The consensus was that we might as well support textmate grammars, which are very widely adopted. Also the default prismjs css clashed a bit with theia's theming, so if we were to go in that direction, we would have to improve the css/token classes for prismjs. The other solution was to try vscode-textmate like Red Hat, but in the frontend.
The problem with that approach was that vscode-textmate doesn't work in the browser for the reasons mentioned above. First of all, the dependency on the C library (oniguruma) didn't have any alternatives. A lot of desktop text editors support textmate grammars, and for a port to be done in the browser, one needs to find a way to make this regex parsing work in the frontend (even atom had a blocker for this if they ever planned to make a browser version, as discussed here https://discuss.atom.io/t/running-atom-in-the-browser/8925 ). There was also a problem where vscode-textmate uses the node.js fs api to load the grammars, so that would also need to be stripped out if a browser fork were to be made.
Luckily some guy from GitHub fixed most of the above issues by:
Implementing the oniguruma C library in the browser using WebAssembly (wasm).
Forking vscode-textmate to use that oniguruma wasm package (onigasm https://github.com/NeekSandhu/onigasm ) and changing the grammar loading process to use json loading (loading from a string instead of a filepath). The package can be found here https://github.com/NeekSandhu/monaco-textmate . In the process of testing it I also noticed some problems with it, but so far the maintainer has been very quick to fix those (within a few hours at most).
Providing a way to effortlessly hook monaco editors to the above package with a new package called monaco-editor-textmate, which can be found here https://github.com/NeekSandhu/monaco-editor-textmate
What I first did was find a way for extensions to use a certain contribution provider to provide a grammar for a certain language id and scope name (the language ID is what monaco uses to specify which language a model currently has, e.g. typescript, javascript, java etc., and a scope name is what textmate uses to map a grammar to a language, e.g. source.ts for typescript, source.js for javascript etc.).
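To make the language id / scope name mapping a bit more concrete, here is a minimal sketch of what such a contribution registry might look like. `GrammarContribution` and `GrammarRegistry` are hypothetical names invented for this example; they are not the actual Theia contribution API nor part of monaco-textmate.

```typescript
// Hypothetical names throughout: none of these types come from Theia or monaco-textmate.
interface GrammarContribution {
    languageId: string; // monaco language id, e.g. "typescript"
    scopeName: string;  // textmate scope name, e.g. "source.ts"
    // Returns the grammar as a JSON string, so no file path is needed in the browser.
    loadGrammar(): Promise<string>;
}

class GrammarRegistry {
    private readonly byLanguage = new Map<string, GrammarContribution>();
    private readonly byScope = new Map<string, GrammarContribution>();

    register(contribution: GrammarContribution): void {
        this.byLanguage.set(contribution.languageId, contribution);
        this.byScope.set(contribution.scopeName, contribution);
    }

    // Scope name to hand to the textmate tokenizer for a given monaco language id.
    scopeFor(languageId: string): string | undefined {
        const contribution = this.byLanguage.get(languageId);
        return contribution ? contribution.scopeName : undefined;
    }

    // Grammar content for a scope name; resolves to undefined if nothing was contributed.
    async grammarFor(scopeName: string): Promise<string | undefined> {
        const contribution = this.byScope.get(scopeName);
        return contribution ? contribution.loadGrammar() : undefined;
    }
}

// An extension contributing a (here empty) grammar for typescript.
const registry = new GrammarRegistry();
registry.register({
    languageId: "typescript",
    scopeName: "source.ts",
    loadGrammar: async () => JSON.stringify({ scopeName: "source.ts", patterns: [] }),
});
```

Keeping both lookups in one place means the editor side only needs to know the language id, while the tokenizer side only ever sees scope names.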
The first problem I encountered was that to load the grammar file in the frontend and use it in a language contribution, I had to use a json-loader for webpack. Luckily webpack now provides this by default for json files, so no special loaders are required for this.
The second problem I encountered (which seems to be webpack specific) is that in order to use monaco-textmate (the fork of vscode-textmate), which uses onigasm (instead of oniguruma), you have to load the onigasm WebAssembly file as a peer dependency before having access to any of the library functions (adding onigasm as a dependency doesn't make it available to the frontend yet, you have to specifically load it at the start of your application). Normally this would be no problem in the frontend. However, we use webpack4, which comes with a built-in wasm loader. Every time we include a wasm file, the webpack wasm loader kicks in and somehow ruins the content, because webpack doesn't support access to low-level functions like malloc and direct memory access. Also, when the wasm loader kicks in, it tries to load the wasm file directly, but what we really want webpack to do is simply provide a url for the import in the app init so that we can load it manually; we don't want webpack to load the wasm by itself because it will fail (because of things like no access to memory). This isn't a problem for most wasm files, but onigasm really needs access to those functions in order to be very fast. So you want to load the wasm file using the webpack file-loader (not the wasm loader) so that the content is unmodified, then you can simply load that into the browser context. Usually you have a way to disable (exclude) certain webpack loaders explicitly in the config, but for some reason disabling the wasm loader doesn't work in webpack4. The issue is here (it's too complicated for my current web knowledge but they seem to agree there's an issue and are planning to fix it): https://github.com/webpack/webpack/issues/7264
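The "load it once at the start of your application, from a url" requirement could be wrapped like this. It is only a sketch: `createWasmInitializer` is a hypothetical helper, the url is a placeholder, and the loader is injected as a parameter so the example stays self-contained (in a real app it would be whatever call initialises the wasm regex engine).

```typescript
// Hypothetical helper: `loadWasm` stands in for the call that fetches and initialises
// the wasm module from a url; it is injected so this sketch has no external imports.
type WasmLoader = (url: string) => Promise<void>;

function createWasmInitializer(loadWasm: WasmLoader, url: string): () => Promise<void> {
    let pending: Promise<void> | undefined;
    return () => {
        // Reuse the same promise so the module is fetched and initialised only once,
        // no matter how many callers ask for it.
        if (!pending) {
            pending = loadWasm(url);
        }
        return pending;
    };
}

// Usage with a fake loader that only counts calls; the url is a placeholder.
let loadCount = 0;
const ensureWasm = createWasmInitializer(async () => { loadCount++; }, "/assets/onigasm.wasm");
const firstCall = ensureWasm();
const secondCall = ensureWasm();
```

Anything that needs the tokenizer would await `ensureWasm()` first, so the load order no longer depends on which part of the frontend starts up first.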
One workaround was to use webpack3 so that the asm loader doesn't kick in, but sticking to webpack4 provides us with better stuff overall than webpack3, so it's not really an option at the moment.
I have a WIP branch in https://github.com/epatpol/theia-1/tree/epatpol/monaco-textmate where I try to use those packages and whatnot. It's not yet working but I think the skeleton is pretty good for a future prototype once all the problems with webpack settle down.
Update: It's possible to load onigasm using the file-loader so we can stick to webpack4 with this config:
{
    test: /\.wasm$/,
    loader: "file-loader",
    // "javascript/auto" bypasses webpack 4's built-in wasm handling for these files
    type: "javascript/auto",
}
So far I've managed to get syntax coloring, but somehow the state isn't correctly passed between the lines. I think it's a small problem in monaco-editor-package, which is a package made by Neek that wires up monaco with a token provider.
Also it seems that there are limitations with this package and that we should probably use another webpack plugin to use it correctly.
Some of the discussion I had with @NeekSandhu: https://github.com/NeekSandhu/monaco-textmate/issues/1
@marechal-p So if you have a look at my branch, I was testing the monaco webpack plugin that allows you to bundle monaco with fewer features (as we are providing token providers ourselves, the goal was to remove the default ones like typescript etc.). Otherwise if you try it right now, you get some coloring, but it's all broken. I think the issue is in here where somehow the state is equal to other even though I think they should not. I suggest trying to debug that library and maybe making changes yourself if you feel like doing so. Otherwise you can follow the discussion in the issue mentioned above, everything should be there :)
I think the issue is in here where somehow the state is equal to other even though I think they should not
Should be fixed now
82f0f38 - add ruleStack comparison in TokenizerState.equals method
Available in monaco-editor-textmate@1.0.3
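For readers wondering why the equality check matters: the editor compares the tokenizer state at the end of a line with the previous one to decide whether the following lines need to be re-tokenized, so a state that wrongly reports "equal" leaves stale colors behind. The sketch below illustrates the idea of comparing rule stacks; the interfaces are local stand-ins written for this example and `ListRuleStack` is a toy, so this is not the actual code of the commit above.

```typescript
// Minimal stand-ins declared locally so the sketch compiles on its own
// (assumptions for illustration, not the real monaco or monaco-textmate typings).
interface RuleStack {
    equals(other: RuleStack): boolean;
}
interface EditorState {
    clone(): EditorState;
    equals(other: EditorState): boolean;
}

class TokenizerState implements EditorState {
    readonly ruleStack: RuleStack;

    constructor(ruleStack: RuleStack) {
        this.ruleStack = ruleStack;
    }

    clone(): TokenizerState {
        return new TokenizerState(this.ruleStack);
    }

    equals(other: EditorState): boolean {
        // Two line-end states are the same only if their grammar rule stacks match.
        return other instanceof TokenizerState && this.ruleStack.equals(other.ruleStack);
    }
}

// Toy rule stack: a list of scope names compared element by element.
class ListRuleStack implements RuleStack {
    readonly scopes: string[];

    constructor(scopes: string[]) {
        this.scopes = scopes;
    }

    equals(other: RuleStack): boolean {
        return other instanceof ListRuleStack
            && other.scopes.length === this.scopes.length
            && other.scopes.every((scope, i) => scope === this.scopes[i]);
    }
}

const insideComment = new TokenizerState(new ListRuleStack(["source.ts", "comment.block.ts"]));
const topLevel = new TokenizerState(new ListRuleStack(["source.ts"]));
```

With this comparison, a line that ends inside a block comment is no longer considered equal to a line that ends at top level, so the lines after it get re-tokenized.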
https://github.com/theia-ide/theia/compare/epatpol/monaco-textmate
I have fixed most node package dependencies so that Theia builds, but I now get a runtime error, Error: Cannot find module "vs/language/typescript/tsWorker", when I try to open a .ts file.
Problem seems to come from monaco-editor/esm/vs/language/typescript/workerManager#L45, where it seems to be looking for some module. I currently don't really understand how this could work with webpack or even something else.
I will keep looking, but if someone here knows how to fix this I am all ears!
Using vscode dark_plus theme for test purposes:
@marcdumais-work if you are interested
@marechal-p starting to look nice :)
So now that this PoC is done and is working, the next step is to wrap this in a nice little extension.
I plan on moving the textmate grammar definitions and the themes from the frontend's bundle.js to the backend, so that we can just drop new grammars / themes in some folder and reload the frontend to load everything that is new, instead of having to rebuild the frontend with webpack every time.
Maybe doing that on demand instead of loading everything on start, but we'll see, it could be left for future improvements.
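Loading on demand could be as simple as caching one request per scope name, issued the first time a file of that language is opened. A minimal sketch, assuming a hypothetical `GrammarFetcher` that would, in a real setup, ask the backend for the grammar; here it is faked so the example runs on its own.

```typescript
// Hypothetical fetcher type: in a real frontend this might wrap a request to the backend.
type GrammarFetcher = (scopeName: string) => Promise<string>;

class LazyGrammarCache {
    private readonly cache = new Map<string, Promise<string>>();
    private readonly fetchGrammar: GrammarFetcher;

    constructor(fetchGrammar: GrammarFetcher) {
        this.fetchGrammar = fetchGrammar;
    }

    // Called when a file with the given scope is opened; fetches at most once per scope.
    get(scopeName: string): Promise<string> {
        let grammar = this.cache.get(scopeName);
        if (!grammar) {
            grammar = this.fetchGrammar(scopeName);
            this.cache.set(scopeName, grammar);
        }
        return grammar;
    }
}

// Fake fetcher that records which grammars were actually requested.
const requested: string[] = [];
const grammars = new LazyGrammarCache(async scope => {
    requested.push(scope);
    return JSON.stringify({ scopeName: scope, patterns: [] });
});

grammars.get("source.ts");   // first .ts file opened: triggers a fetch
grammars.get("source.ts");   // second .ts file: served from the cache
grammars.get("source.java"); // first .java file: triggers a fetch
```

Caching the promise rather than the result also covers the case where two files of the same language are opened before the first request has finished.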
marechal-p: Maybe doing that on demand instead of loading everything on start
Rink devtools does this on demand, i.e., it fetches grammars when the appropriate file is opened: rink-languages/src/index.ts#L38
Also, you guys might find this repository useful. It has grammars for most languages and has a script that generates a manifest for them.
(Sharing with a hope that we can improve stuff together, so please contribute back if possible)
@NeekSandhu I am currently implementing the infrastructure that should consume the grammars and the themes for Theia. The last working prototype was really messy, but it was a means of understanding what was required for it to work. Once it is up and working again, I will take a deep look at it! ;)
This is done now.
We can probably follow Red Hat's POC https://www.youtube.com/watch?v=_pLDXlndgXA here