Closed kristianmandrup closed 3 years ago
I started an app-ml project here.
I will try to add VSCode extension and Language Server support with well documented instructions. You can then use it as reference.
I wrote one here (but I am using my Chevrotain parser only indirectly there).
In my experience the official example was very helpful. It is not difficult to integrate your Chevrotain parser once you have learned how to set up a language server, so I am not sure if a Chevrotain-specific example is really necessary.
This is a topic that greatly interests me. For example my Toml-Tools project
As always the limiting factor is free time... I doubt I will personally get around to it but will be willing to accept contributions 😄
I wrote one here (but I am using my Chevrotain parser only indirectly there).
Thanks for that link @christianvoigt I may use it as a reference in my Toml-Tools project 👍
Great! Thanks. Well, it could just be a link of reference examples for "in the wild solutions" or perhaps an awesome-chevrotain page or repo... Cheers!
I'm pretty far now, with a well documented project, investigating how to do the "full shebang".
Still a WIP.
FWIW, Stardog Union (where I work) uses chevrotain to build parsers for RDF-related languages, and these parsers back our stardog-language-servers. So, if you follow the trail from our language servers repo to our parsers repo, you'll see some "in the wild" uses of chevrotain with the LSP (we also have VSCode extensions that use the language servers, if that's of interest). Our language servers are all relatively new, and they'll continue to improve as time goes on; right now, the one for SPARQL has the most features, so you might look at it and at the AbstractLanguageServer class that our language servers extend.
Sweet :) Thanks! I'm trying to build a "best practices" how to guide with a walk through and examples.
Also trying to add better support for CST with location info for source maps, and how to best deliver auto-suggest etc
ArgDown also seems to contain a Language Server which uses a Chevrotain Parser.
- https://github.com/christianvoigt/argdown/tree/master/packages
Oops it was already linked in this thread 😢
Sweet. I'll get back to it shortly. Thanks for helping to make this happen!!
I've since converted the official mini SQL language lexer, parser, actions and error recovery tutorial to TypeScript here, and am now working to add it as a VSC language extension with syntax highlighting and LSP.
Would love any help. I'm also planning to create some kind of infrastructure for chevrotain to make this process much easier.
@jmrog I'm now finally looking into your stardog LSPs to see what I can use. I also know that some new "token tracking capabilities" have since been added to chevrotain which should make it easier to add such features and also more powerful/useful.
Looks like your stardog-language-utils is the place to start?
On a related note, I am working on some editor services logic for XML. Specifically content assist and validations.
This is the pure logic of implementing the editor services, not the language server protocol.
@bd82 Good stuff! I've been working on sql-lang the mini SQL chevrotain tutorial ported to TypeScript and now with utilities and documentation for writing a VS Code language extension complete with Language Server. Almost there...
@kristianmandrup is all the source code in that repo supposed to be there in the final version? or are those left overs from copying stardog code?
@bd82 Still needs to be cleaned up a bit, as you point out. Will have it ready by next month, I believe. By "almost there" I meant that at least there is sufficient info and code there now for someone to implement their own VS Code extension and Language Server. The LSP implementation in turn would work for any compatible editor/IDE.
@bd82 I plan to do a full VSC extension and LSP for the sql mini lang tutorial.
I'm documenting it all at a detailed level as can be seen for LSP utils
O.k. great, I may be interested in creating a VSCode Extension with XML capabilities.
So this Language Server example may come in handy 👍
@bd82 It is looking pretty good now. Please check it out and let me know what you think, what could be improved, etc. Pretty extensive and detailed docs and a lot of baseline functionality to work from. Follow the links in the utilities section. Cheers.
There seem to be too many LOC (> 3000) and files (for me) to easily understand the relationships. There also seem to be too many concerns, indirections and abstractions, e.g.:
What is the actual functionality presented?
Potential Missing Functionality:
@bd82 Thanks for having a look at it. My initial goal has mainly been experimentation. The packages folder is not "real", just an indicator that the folders under it could be split out into one or more separate modules. It has yet to be structured in a more meaningful way. My main focus during the past couple of weeks has been to create a way to generate the VS Code syntax (json) file directly from the parser metadata (see the tests and markdown docs on the VS Code extension).
I just put it all in the sql-lang project for now, as I want to explore a full solution using the sql-lang before extracting the utils into one or more separate chevrotain helper modules.
I added error recovery only for the sql-lang parser example, taken directly from your tutorial. Have yet to generalise it.
Could the syntax directory be generated using reflection on Chevrotain's Grammar Structure?
Relative dependencies between different packages seem strange. Yes, I have not yet bothered to make it into a proper mono repo. Could also be done via the tsconfig file, I know, but I forgot how. I've been used to using lerna for this.
Should this JSON Error Recovery Parser be removed? Well, is just a port of tutorial example for now
Should the Actions folder be removed? Well, is just a port of tutorial example for now
A separate .md file for each .ts file seems to double the number of files - this is for my own sanity so I can "keep it all in my head", as I'm not as well versed in all this infrastructure
Is this model/syntax generation generic? yes, it should be extracted into an npm module to be used as a "black box"
Why is an extensible BaseLanguageServer class needed in the scope of an example/tutorial? It is also to be extracted eventually. It is only there during the experimentation/development phase.
Why is grammar inheritance used in the scope of an LSP example/tutorial? Please elaborate
Why is grammar inheritance used in the scope of an LSP example/tutorial? Please elaborate
I meant what is this turtle Parser?
Should this JSON Error Recovery Parser be removed? Well, is just a port of tutorial example for now
The tutorial Error Recovery code uses a JSON grammar, so I am not sure it is relevant for the LSP for SQL scenario.
Maybe I did not understand what you are building? Is it:
I thought it is the first one, but if it is the second one it could explain some of what I've been confused about.
Anyhow, if you are implementing diagnostic language services (via LSP), you may not want to stop on the first error, so you may want to enable error recovery for your SQL grammar. You may even want to manipulate how the error recovery behaves. Examples from the XML Editor Tooling project:
Exactly. The goal was to take the mini SQL language and make a full implementation including VS Code language extension with LSP (1 and 2 in effect). I wanted the code to facilitate it all "close at hand", in order to get a working implementation. Next step figuring out how to best extract the individual parts into reusable modules.
I kept the turtle parser around since it demonstrates using simple namespaces, something I'm interested in. I might want to extend the language to demonstrate that (or another simple language example). I'd love to see this sort of scoping functionality be easy to add to any chevrotain parser, especially nested scopes. Do you have any examples of that? I've seen that for ANTLR and I have the ANTLR books (and sample source code) as reference...
Thanks for the tips! I'll check it out tmrw.
Some things I could see included in a Chevrotain Grammar + LSP + Editor Services scenario.
The problem is that this is a lot of "functionality" for an example/tutorial. It sounds like it could fill multiple blog posts. And some specific functionality would need to be "dumbed down" for a tutorial scope and would also not be generic enough, so it may not be production quality code...
I'd love to see this sort of scoping functionality be easy to add to any chevrotain parser, especially nested scopes. Do you have any examples of that?
Are you talking about fully qualified names, lexical scopes, identifier scope resolving, ... ?
If so that is part of building a compiler and should normally be done by processing the AST, meaning after Chevrotain has finished its role of syntactic analysis. Another reason not to add such logic during the parsing phase is that a name may not necessarily need to be defined before (line number) it is used. Or some names could only be resolved once some imports/linking is done.
For example in JavaScript a function may be defined after it is called (in line numbers).
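As a small illustration of that point, the sketch below (TypeScript, with a made-up `square` function) calls a function on a line that precedes its declaration. A resolver that assumed "declared before used" while parsing would reject this, which is one reason name resolution is usually left to a later pass over the AST.

```typescript
// Function declarations are hoisted, so this call is valid even though
// the declaration appears later in the file.
const squared = square(4);

function square(n: number): number {
  return n * n;
}
```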
Basically to implement full editor tooling (before LSP level), a Parser is not sufficient, you will need to implement parts of the compiler front-end and with more functionality than a basic machine code compiler.
The BaseLanguageServer includes:
const lexDiagnostics = this.getLexDiagnostics(document, tokens);
const parseDiagnostics = this.getParseDiagnostics(document, errors);
return this.connection.sendDiagnostics({
uri,
diagnostics: [...lexDiagnostics, ...parseDiagnostics]
});
(AbstractLanguageServer)
(cli and worker in base-language-server)
class BaseLanguageServer {
// ...
get capabilities() {
return {
// Tell the client that the server works in NONE text document sync mode
textDocumentSync: this.documents.syncKind[0],
hoverProvider: true
};
}
// ...
}
Capabilities:
(handleHover in AbstractLanguageServer to display the rule for the hovered item)
(onContentChange in BaseLanguageServer)
(language-configuration.json)
(language-configuration.json)
(util/syntax-gen and utils/model-gen)
As far as I understand, the language-configuration.json provides basic folding, comment/uncomment and basic outline out of the box.
{
"comments": {
"lineComment": "//",
"blockComment": ["/*", "*/"]
},
"brackets": [["{", "}"], ["[", "]"], ["(", ")"]]
}
I read multiple blogs to get this far. A good one is on the dot language by Fredericco Tomassi, where he uses chevrotain with VS Code, I believe. It has been around 9 months since I last took on this effort, so I need to refresh my memory.
My goals at this point:
(language-configuration.json for now)
(dot language blog posts by Tomassi)
You're right, the scoping and namespacing belong in the compiler. Haven't played with that since my uni days (using yacc).
I've removed the turtle-lang "module" as suggested.
Also added paths to tsconfig.json so that relative imports are no longer needed (more like proper modules).
Any chance you will add compiler tooling to the mix in the future for such common cases to make it easier to write a compiler?
Would be nice with a number of blog posts to go through all this when it's ready! Would love to collaborate on that, perhaps with Tomassi as well.
A good one is on the dot language by Fredericco Tomassi, where he uses chevrotain with VS Code,
Any chance for a link? sounds interesting.
Any chance you will add compiler tooling to the mix in the future for such common cases to make it easier to write a compiler?
That is unlikely, Chevrotain is quite complex and large with ~10,000 productive LOC and probably ~40,000 Total LOC. This is fairly large for a "spare time side project". I'd like to cleanup & simplify things rather than increase the scope of the project.
Would be nice with a number of blog posts to go through all this when it's ready! Would love to collaborate on that, perhaps with Tomassi as well.
Yes could be very interesting, I helped The Tomassetti brothers a little with this article in the past.
Sounds like there is a fair bit of functionality in your LSP example. And some potential for utility npm packages (syntax gen). If it could be simplified/cleaned-up/minimized enough we could also consider adding it as an "official" example.
This will provide higher visibility and integration into Chevrotain's CI/build, so it will always be up to date with the latest Chevrotain APIs.
Here is the link: https://tomassetti.me/language-server-dot-visual-studio/ This blog post is also really good (why a DSL?): https://tomassetti.me/domain-specific-languages/
I'll check out your article. I'll try to wrap up my current effort with sql-lang to serve as a foundation for devs going that route. Then we can build on that progressively.
I see wrapping any parser/compiler with a minimal editor/IDE extension as essential for any language to be useful.
I'll try to add some compiler tooling libs next year when I have time.
I'd be happy to have it as an official example down the road as you suggest.
I've now extracted part of my project as:
Still some work to do...
Now the sql-lang project (repo) can showcase using these utility functions/classes.
How about we implement LSP with content assist for a minimal language:
= and * with scoped identifiers and scope blocks { ... }
a = 1
{
b = 2
}
{
c = 3
a * [content assist suggestion: a, c via scoped identifier lookup table]
}
Thanks for the links.
I see wrapping any parser/compiler with a minimal editor/IDE extension as essential for any language to be useful.
True, but there is always the question of available resources 😢 I will hopefully be expanding the XML Tools capabilities (next quarter?). But I may need to implement similar language services for a couple of additional languages before I can start generalizing possible generic capabilities. So such capabilities seem a little unlikely at this time.
How about we implement LSP with content assist for a minimal language:
I think that using the same language (Mini SQL) for the whole Editor example makes more sense. Could the language be expanded to include scoping? Would column aliases with nested subqueries introduce scoping rules? (I am very rusty with SQL...)
BTW, note that while Chevrotain supplies an API for syntactic content assist, I actually did not use it while implementing the XML Editor Content Assist. Instead I implemented custom logic on the CST structure (using a CST Visitor). I found that to be:
The content assist would normally work on the compiler level, ie. the compiler creating a tree structure with parent child scoping, with a lookup table of local vars on each scope. Then checking all the way up the tree for the first match, serving all the available options at that position. So it would always have to be custom. This could be combined with syntactic content assist as you point out.
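A minimal sketch of that idea, assuming a hypothetical `ScopeNode` class (none of these names come from Chevrotain or from the projects discussed here): each scope keeps its own lookup table plus a link to its parent, a name is resolved by walking up until the first match, and the completion candidates at a position are all names visible along that chain.

```typescript
// Hypothetical scope node for illustration only; not a Chevrotain API.
class ScopeNode {
  readonly parent: ScopeNode | undefined;
  private readonly locals = new Map<string, string>();

  constructor(parent?: ScopeNode) {
    this.parent = parent;
  }

  define(name: string, value: string): void {
    this.locals.set(name, value);
  }

  // Walk up the parent chain and return the first match, if any.
  resolve(name: string): string | undefined {
    for (let scope: ScopeNode | undefined = this; scope; scope = scope.parent) {
      const value = scope.locals.get(name);
      if (value !== undefined) return value;
    }
    return undefined;
  }

  // All names visible from this scope, outermost scope first.
  visibleNames(): string[] {
    const chain: ScopeNode[] = [];
    for (let scope: ScopeNode | undefined = this; scope; scope = scope.parent) {
      chain.unshift(scope);
    }
    const names: string[] = [];
    for (const scope of chain) {
      scope.locals.forEach((_value, name) => {
        if (names.indexOf(name) === -1) names.push(name);
      });
    }
    return names;
  }
}

// Mirrors the shape: a = 1 { b = 2 } { c = 3 ... }
const rootScope = new ScopeNode();
rootScope.define("a", "1");
const firstBlock = new ScopeNode(rootScope);
firstBlock.define("b", "2");
const secondBlock = new ScopeNode(rootScope);
secondBlock.define("c", "3");

// Candidates a content assist request inside the second block could offer.
const suggestions = secondBlock.visibleNames();
```

Keeping the parent link on each scope means sibling blocks never see each other's names, which matches the `a, c` suggestion in the minimal language example earlier in the thread.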
Didn't notice that feature before! I'll see what I can do end of this week or at least before New Year. Yeah, the compiler could use a CST Visitor to build up the tree structure with lookup tables. Ideally reusing existing sub-trees as Anders Hejlsberg pointed out.
I come from a Pascal, Turbo Pascal background myself back in the day and I'm from Denmark as well. We're pretty strong on prog. languages (C++ comes to mind).
Didn't notice that feature before! I'll see what I can do end of this week or at least before New Year. Yeah, the compiler could use a CST Visitor to build up the tree structure with lookup tables
I did not use the CST to build the lookup tables; some content assist may not be about names but rather about values, e.g. ENUM values (without prefix).
The CST Visitor I implemented for content assist instead did two things:
Thanks. I've now started a scope lang.
Visitor
Spec: All tests pass
a=1 { b=2 { c=3 } }
Not sure how to build a nested lookup map from this though. Another pass required I guess, ie. the first step of the compiler?
const ast = toAstVisitor(inputText);
const stm1 = ast[0];
const stm2 = ast[1];
expect(stm2).toEqual({
type: "SCOPE",
statements: [
{
type: "ASSIGNMENT",
variableName: "b",
valueAssigned: "2"
},
{
type: "SCOPE",
statements: [
{
type: "ASSIGNMENT",
variableName: "c",
valueAssigned: "3"
}
]
}
]
});
I managed to figure it out. Using scope stack in an additional traverse step.
Turning this
export const scopeTree = {
type: "SCOPE",
statements: [
{
type: "ASSIGNMENT",
variableName: "b",
valueAssigned: "2"
},
{
type: "SCOPE",
statements: [
{
type: "ASSIGNMENT",
variableName: "c",
valueAssigned: "3"
}
]
},
{
type: "SCOPE",
statements: [
{
type: "ASSIGNMENT",
variableName: "d",
valueAssigned: "4"
}
]
}
]
};
Into a tree with a varsAvailable list for each scope :)
[
{
"type": "ASSIGNMENT",
"variableName": "b",
"valueAssigned": "2",
"varsAvailable": [
"b"
]
},
{
"type": "SCOPE",
"statements": [
{
"type": "ASSIGNMENT",
"variableName": "c",
"valueAssigned": "3",
"varsAvailable": [
"b",
"c"
]
}
]
},
{
"type": "SCOPE",
"statements": [
{
"type": "ASSIGNMENT",
"variableName": "d",
"valueAssigned": "4",
"varsAvailable": [
"b",
"d"
]
}
]
}
]
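One way such a traversal could look is sketched below. It is not the code from the repo: the type names and `annotateScope` are made up for illustration, and recursion stands in for an explicit scope stack (each nested call works on its own copy of the visible names, so the call stack plays the role of the stack).

```typescript
// Hypothetical AST shapes matching the objects shown above.
type AssignmentStmt = {
  type: "ASSIGNMENT";
  variableName: string;
  valueAssigned: string;
  varsAvailable?: string[];
};
type ScopeStmt = { type: "SCOPE"; statements: Stmt[] };
type Stmt = AssignmentStmt | ScopeStmt;

// Returns the scope's statements with a varsAvailable list on every assignment.
function annotateScope(scope: ScopeStmt, inherited: string[] = []): Stmt[] {
  // Copy, so names declared in this scope do not leak into sibling scopes.
  const visible = [...inherited];
  return scope.statements.map((stmt): Stmt => {
    if (stmt.type === "ASSIGNMENT") {
      visible.push(stmt.variableName);
      return { ...stmt, varsAvailable: [...visible] };
    }
    // Nested scope: it sees everything declared so far in the enclosing scopes.
    return { type: "SCOPE", statements: annotateScope(stmt, visible) };
  });
}

const exampleTree: ScopeStmt = {
  type: "SCOPE",
  statements: [
    { type: "ASSIGNMENT", variableName: "b", valueAssigned: "2" },
    {
      type: "SCOPE",
      statements: [{ type: "ASSIGNMENT", variableName: "c", valueAssigned: "3" }]
    },
    {
      type: "SCOPE",
      statements: [{ type: "ASSIGNMENT", variableName: "d", valueAssigned: "4" }]
    }
  ]
};

const annotated = annotateScope(exampleTree);
```

With the example input, `c` ends up with `["b", "c"]` and `d` with `["b", "d"]`, the same lists as in the output above.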
From language-server-dot-visual-studio/
To add the completion provider (aka "content assist") for a VSC extension:
connection.onInitialize((params): InitializeResult => {
return {
capabilities: {
// ...
completionProvider: {
resolveProvider: true,
"triggerCharacters": [ '=' ]
},
hoverProvider: true
}
}
});
Sample onCompletion handler:
connection.onCompletion((textDocumentPosition: TextDocumentPositionParams): CompletionItem[] => {
const document = documents.get(textDocumentPosition.textDocument.uri);
// the document may already have been closed
if (!document) return [];
const text = document.getText();
const position = textDocumentPosition.position;
// use parsed model to lookup via position
// return a list of auto complete suggestions (for = assignment)
const results: CompletionItem[] = [];
return results;
});
In my case it should use the compiled model with scope information to present the varsAvailable array as the return value. How would I best do this, I wonder? I need to somehow retain the position of each assignment, then perhaps have an index that links to each assignment object.
{
"type": "SCOPE",
"statements": [
{
"type": "ASSIGNMENT",
"variableName": "d",
"valueAssigned": "4",
"position": 17,
"varsAvailable": [
"b",
"d"
]
}
]
}
const assignmentIndex = {
3: {varsAvailable: ['a'] },
9: {varsAvailable: ['a', 'b'] },
17: {varsAvailable: ['b', 'c'] },
}
Alternatively have a range for each, then find overlapping range?
Then for a given document char position, say 18, it should look up the closest assignment index less than that (or in range), in this case 17. Perhaps there is a better way to do this with chevrotain?
I guess a primitive approach would be to simply iterate through this list until it finds first one with position greater than current pos (or at end of list) then use the one before that.
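That primitive approach could be sketched roughly as follows. `findVarsAt` and the `IndexEntry` type are hypothetical names, and the index is the same shape as the `assignmentIndex` object above (keyed by the offset of each assignment).

```typescript
type IndexEntry = { varsAvailable: string[] };

// Returns the variables available at `pos` by picking the closest
// assignment whose offset is less than or equal to `pos`.
function findVarsAt(index: Record<number, IndexEntry>, pos: number): string[] {
  const offsets = Object.keys(index)
    .map(Number)
    .sort((a, b) => a - b);

  let closest: number | undefined;
  for (const offset of offsets) {
    if (offset > pos) break; // first entry past the cursor: stop
    closest = offset;
  }
  if (closest === undefined) return [];
  const entry = index[closest];
  return entry ? entry.varsAvailable : [];
}

const sampleIndex: Record<number, IndexEntry> = {
  3: { varsAvailable: ["a"] },
  9: { varsAvailable: ["a", "b"] },
  17: { varsAvailable: ["b", "c"] }
};

const varsAtCursor = findVarsAt(sampleIndex, 18);
```

For large documents a binary search over the sorted offsets (or an interval structure, if ranges are stored) would avoid the linear scan, but for a small example language the loop is probably sufficient.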
To add position info, I need to add this to the parser. Looks a little cumbersome IMO.
assignment(ctx: any) {
console.log("assignment = ", ctx);
const variableName = ctx.Identifier[0].image;
const {
startOffset,
endOffset,
startLine,
endLine,
startColumn,
endColumn
} = ctx;
const position = {
startOffset,
endOffset,
startLine,
endLine,
startColumn,
endColumn
};
// value assigned
const valueAssigned = this.visit(ctx.reference);
return {
type: "ASSIGNMENT",
variableName,
valueAssigned,
position
};
}
Would have been more elegant to have
position: {
offset: {
start: 13,
end: 14
},
line: {
start: 1,
end: 1
},
column: {
start: 7,
end: 14
}
}
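If the flat fields are kept as the primary representation, the nested form could be derived on demand. The sketch below assumes hypothetical `FlatPosition` and `NestedPosition` types and a `toNestedPosition` helper; the field names follow the two shapes shown above.

```typescript
interface FlatPosition {
  startOffset: number;
  endOffset: number;
  startLine: number;
  endLine: number;
  startColumn: number;
  endColumn: number;
}

interface Span {
  start: number;
  end: number;
}

interface NestedPosition {
  offset: Span;
  line: Span;
  column: Span;
}

// Groups the six flat fields into offset/line/column spans.
function toNestedPosition(p: FlatPosition): NestedPosition {
  return {
    offset: { start: p.startOffset, end: p.endOffset },
    line: { start: p.startLine, end: p.endLine },
    column: { start: p.startColumn, end: p.endColumn }
  };
}

const nestedExample = toNestedPosition({
  startOffset: 13,
  endOffset: 14,
  startLine: 1,
  endLine: 1,
  startColumn: 7,
  endColumn: 14
});
```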
Perhaps retain both position state models (nested and list), even if duplicate, so they are easy to work with either way (grouped or individually)?
position: {
offset: {
start: 13,
end: 14
},
line: {
start: 1,
end: 1
},
column: {
start: 7,
end: 14
}
}
This is prettier, but also creates 4 times as many objects which would be slower and consume more memory. (By how much I do not know though...)
I tried extracting the position decoration of nodes into a separate visitor base class:
export class PositionVisitor extends BaseVisitor {
positioned = false;
constructor(opts) {
super();
this.positioned = opts.positioned;
}
get isPositioned() {
return Boolean(this.positioned);
}
decorate(node, ctx) {
return this.decoratePosition(node, ctx);
}
decoratePosition(node, ctx) {
if (this.isPositioned) {
const {
startOffset,
endOffset,
startLine,
endLine,
startColumn,
endColumn
} = ctx;
const position = {
startOffset,
endOffset,
startLine,
endLine,
startColumn,
endColumn
};
node.position = position;
}
return node;
}
}
So that I can simply call decorate in assignment:
assignment(ctx: any) {
// console.log("assignment = ", ctx);
const variableName = ctx.Identifier[0].image;
// value assigned
const valueAssigned = this.visit(ctx.reference);
const node = {
type: "ASSIGNMENT",
variableName,
valueAssigned
};
return this.decorate(node, ctx);
}
However it complains:
Errors Detected in CST Visitor <AstVisitor>:
Redundant visitor method: <decorate> on AstVisitor CST Visitor
There is no Grammar Rule corresponding to this method's name.
For utility methods on visitor classes use methods names that do not match /^[a-zA-Z_]\w*$/.
Redundant visitor method: <decoratePosition> on AstVisitor CST Visitor
There is no Grammar Rule corresponding to this method's name.
For utility methods on visitor classes use methods names that do not match /^[a-zA-Z_]\w*$/.
How do I circumvent this? using _ or $ prefix perhaps to indicate these are not to be treated as visitor handler methods?
Ah yes, names that do not match /^[a-zA-Z_]\w*$/ are ignored, so a $ prefix would do (a _ prefix still matches the pattern) :)
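A quick way to check which prefix works is to test candidate names against the pattern quoted in the error message. In the sketch below (the method names are just examples), a `$` prefix falls outside the pattern while a `_` prefix still matches it, so only the `$` variant would be skipped by the validation.

```typescript
// Pattern copied from the validation error message above.
const visitorMethodPattern = /^[a-zA-Z_]\w*$/;

const candidateNames = ["decorate", "_decorate", "$decorate"];

// Names that match are treated as grammar rule handlers and get validated.
const validatedAsRuleMethods = candidateNames.filter((name) =>
  visitorMethodPattern.test(name)
);

// Names that do not match are left alone as utility methods.
const ignoredUtilityMethods = candidateNames.filter(
  (name) => !visitorMethodPattern.test(name)
);
```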
Then for a given document char position, say 18, it should look up the closest assignment index less than that (or in range), in this case 17. Perhaps there is a better way to do this with chevrotain?
Are you trying to link the compiled Model (AST+Symbol Table) with the syntactic offset position in the text?
How do I circumvent this? using _ or $ prefix perhaps to indicate these are not to be treated as visitor handler methods?
Disable (delete) the Visitor validation... I think the CST Visitor needs some upgrades/refactoring...
Are you trying to link the compiled Model (AST+Symbol Table) with the syntactic offset position in the text?
Yes, exactly. I almost have it working already. Will likely finish it tmrw or at least by this week. See latest commit ;)
I kept the validation for now. Didn't realise the validate call in the visitor constructor did this method validation.
I think #1039 sounds like a good idea. Many other (better and more flexible) ways to go about it.
What I did was provide an AST in the content assist "scenario identification" CST Visitor. And manually traverse the AST while visiting the CST.
Now have a working content assist solution with nested scopes, displaying list of variables in scope as completion items for assignment :)
See chevrotain-nested-scope-lang-content-assist repo and Readme
Renamed my mini sql project chevrotain-mini-sql-lang
Regarding content assist, I found the following quick start
Now have a working content assist solution with nested scopes, displaying list of variables in scope as completion items for assignment :)
Great, I found this repository much easier to understand as it has a small scope/grammar and fewer files.
I provided some feedback here:
There still seem to be code parts that are unrelated, e.g:
But I guess these are leftovers from the source it was copied from or placeholders for future features... In production code this would be a bit of source code history pollution, but in demo/example code I guess it does not matter, as consumers are normally only interested in the latest master branch.
Exactly! The unrelated parts were for reference (source code co-location) until I add the full set of features to the library. WIP ;)
I’ll have a look at your comments shortly. Thanks again :)
While this is not a guide, it may still be of interest: https://github.com/langium/langium
Not sure there would ever be time to make a full language server guide. I will try to preserve this by moving it to a discussion
Would be great if you could include some kind of example or reference guide for how to write a Language Server (standard IDE protocol) for a language written in Chevrotain.
Even for something as simple as the calculator or tinyc. It seems like quite a jungle to get started. Does anyone have experience in this regard or know of some good resources?