Chevrotain / chevrotain

Parser Building Toolkit for JavaScript
https://chevrotain.io
Apache License 2.0

Language Server guide #921

Closed kristianmandrup closed 3 years ago

kristianmandrup commented 5 years ago

Would be great if you could include some kind of example or reference guide for how to write a Language Server (standard IDE protocol) for a language written in Chevrotain.

Even for something as simple as the calculator or tinyc. It seems like quite a jungle to get started. Does anyone have experience in this regard or know of some good resources?

kristianmandrup commented 5 years ago

I started an app-ml project here. I will try to add VSCode extension and Language Server support with well documented instructions. You can then use it as reference.

christianvoigt commented 5 years ago

I wrote one here (but I am using my Chevrotain parser only indirectly there).

In my experience the official example was very helpful. It is not difficult to integrate your Chevrotain parser once you have learned how to set up a language server, so I am not sure if a Chevrotain-specific example is really necessary.

bd82 commented 5 years ago

This is a topic that greatly interests me. For example my Toml-Tools project

As always the limiting factor is free time... I doubt I will personally get around to it but will be willing to accept contributions 😄

bd82 commented 5 years ago

I wrote one here (but I am using my Chevrotain parser only indirectly there).

Thanks for that link @christianvoigt I may use it as a reference in my Toml-Tools project 👍

kristianmandrup commented 5 years ago

Great! Thanks. Well, it could just be a link of reference examples for "in the wild solutions" or perhaps an awesome-chevrotain page or repo... Cheers!

kristianmandrup commented 5 years ago

I'm pretty far now, with a well documented project, investigating how to do the "full shebang".

app-domain-ml

Still a WIP.

jmrog commented 5 years ago

FWIW, Stardog Union (where I work) uses chevrotain to build parsers for RDF-related languages, and these parsers back our stardog-language-servers. So, if you follow the trail from our language servers repo to our parsers repo, you'll see some "in the wild" uses of chevrotain with the LSP (we also have VSCode extensions that use the language servers, if that's of interest). Our language servers are all relatively new, and they'll continue to improve as time goes on; right now, the one for SPARQL has the most features, so you might look at it and at the AbstractLanguageServer class that our language servers extend.

kristianmandrup commented 5 years ago

Sweet :) Thanks! I'm trying to build a "best practices" how to guide with a walk through and examples.

Also trying to add better support for CST with location info for source maps, and how to best deliver auto-suggest etc

bd82 commented 5 years ago

ArgDown also seems to contain a Language Server which uses a Chevrotain Parser.

- https://github.com/christianvoigt/argdown/tree/master/packages

Oops it was already linked in this thread 😢

kristianmandrup commented 5 years ago

Sweet. I'll get back to it shortly. Thanks for helping to make this happen!!

kristianmandrup commented 4 years ago

I've since converted the official mini SQL language lexer, parser, actions and error recovery tutorial to TypeScript here, and am now working to add it as a VSC language extension with syntax highlighting and LSP.

Would love any help. I'm also planning to create some kind of infrastructure for chevrotain to make this process much easier.

kristianmandrup commented 4 years ago

@jmrog I'm now finally looking into your stardog LSPs to see what I can use. I also know that some new "token tracking capabilities" have since been added to chevrotain which should make it easier to add such features and also more powerful/useful.

Looks like your stardog-language-utils is the place to start?

bd82 commented 4 years ago

On a related note, I am working on some editor services logic for XML. Specifically content assist and validations.

This is the pure logic of implementing the editor services, not the language server protocol.

kristianmandrup commented 4 years ago

@bd82 Good stuff! I've been working on sql-lang the mini SQL chevrotain tutorial ported to TypeScript and now with utilities and documentation for writing a VS Code language extension complete with Language Server. Almost there...

bd82 commented 4 years ago

@kristianmandrup is all the source code in that repo supposed to be there in the final version? or are those left overs from copying stardog code?

kristianmandrup commented 4 years ago

@bd82 Still needs to be cleaned up a bit, as you point out. Will have it ready by next month, I believe. By "almost there" I meant that at least there is sufficient info and code there now for someone to implement their own VS Code extension and Language Server. The LSP implementation in turn would work for any compatible editor/IDE.

kristianmandrup commented 4 years ago

@bd82 I plan to do a full VSC extension and LSP for the sql mini lang tutorial.

I'm documenting it all at a detailed level as can be seen for LSP utils

bd82 commented 4 years ago

O.k. great, I may be interested in creating a VSCode Extension with XML capabilities.

So this Language Server example may come in handy 👍

kristianmandrup commented 4 years ago

@bd82 It is looking pretty good now. Please check it out and let me know what you think, what could be improved etc. Pretty extensive and detailed docs and a lot of baseline functionality to work from. Follow the links in the utilities section. Cheers.

bd82 commented 4 years ago

Some Random thoughts:

General Impressions

There seem to be too many LOC (> 3000) and files (for me) to easily understand the relationships. There also seem to be too many concerns, indirections and abstractions, e.g.:

What is the actual functionality presented?

Potential Missing Functionality:

kristianmandrup commented 4 years ago

@bd82 Thanks for having a look at it. My initial goal has mainly been experimentation. The packages folder is not "real", just an indicator that the folders under it could be split out into one or more separate modules. It has yet to be structured in a more meaningful way. My main focus during the past couple of weeks has been to create a way to generate the VS Code syntax (json) file directly from the parser meta data (see tests and markdown docs on VS Code extension).

I just put it all in the sql-lang product for now, as I want to explore a full solution using the sql-lang before extracting the utils into one or more separate chevrotain helper modules.

I added error recovery only for the sql-lang parser example, taken directly from your tutorial. Have yet to generalise it.

bd82 commented 4 years ago

Why is grammar inheritance used in the scope of an LSP example/tutorial? Please elaborate

I meant what is this turtle Parser?

Should this JSON Error Recovery Parser be removed? Well, is just a port of tutorial example for now

The tutorial Error Recovery code uses a JSON grammar, so I am not sure it is relevant for the LSP for SQL scenario.

Maybe I did not understand what you are building? Is it:

  1. a Mini SQL Grammar/Parser with LSP Server example.
  2. A Full Chevrotain Tutorial with LSP example added?.

I thought it is the first one, but if it is the second one it could explain some of what I've been confused about.

Anyhow, if you are implementing diagnostic language services (via LSP), you may not want to stop on the first error, so you may want to enable error recovery for your SQL grammar. You may even want to manipulate how the error recovery behaves. Examples from the XML Editor Tooling project:

kristianmandrup commented 4 years ago

Exactly. The goal was to take the mini SQL language and make a full implementation including VS Code language extension with LSP (1 and 2 in effect). I wanted the code to facilitate it all "close at hand", in order to get a working implementation. Next step figuring out how to best extract the individual parts into reusable modules.

I kept the turtle parser around since it demonstrates using simple namespaces, something I'm interested in. I might want to extend the language to demonstrate that (or another simple language example). I'd love to see this sort of scoping functionality be easy to add to any chevrotain parser, especially nested scopes. Do you have any examples of that? I've seen that for ANTLR and I have the ANTLR books (and sample source code) as reference...

Thanks for the tips! I'll check it out tmrw.

bd82 commented 4 years ago

Some things I could see included in a Chevrotain Grammar + LSP + Editor Services scenario.

The problem is that this is a lot of "functionality" for an example/tutorial. It sounds like it could fill multiple blog posts. Some specific functionality would also need to be "dumbed down" for a tutorial scope and would not be generic enough, so it may not be production quality code...

bd82 commented 4 years ago

I'd love to see this sort of scoping functionality be easy to add to any chevrotain parser, especially nested scopes. Do you have any examples of that?

Are you talking about fully qualified names, lexical scopes, identifier scope resolving, ... ?

If so that is part of building a compiler and should normally be done by processing the AST, meaning after Chevrotain has finished its role of syntactic analysis. Another reason not to add such logic during the parsing phase is that a name may not necessarily need to be defined before (line number) it is used. Or some names could only be resolved once some imports/linking is done.

For example in JavaScript a function may be defined after it is called (in line numbers).
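
A minimal sketch of why name resolution fits better after parsing: a first pass collects all declarations, and a second pass resolves the uses, so declaration order no longer matters. The `Stmt` shape and both function names are hypothetical and are not part of Chevrotain.

```typescript
// Hypothetical AST shapes for illustration; not part of Chevrotain.
type Stmt =
  | { kind: "declare"; id: string }
  | { kind: "use"; id: string };

// Pass 1: collect every declared name, regardless of where it appears.
function collectDeclarations(program: Stmt[]): Set<string> {
  const declared = new Set<string>();
  for (const stmt of program) {
    if (stmt.kind === "declare") declared.add(stmt.id);
  }
  return declared;
}

// Pass 2: report uses that no declaration anywhere in the program satisfies.
function findUnresolved(program: Stmt[]): string[] {
  const declared = collectDeclarations(program);
  return program
    .filter((stmt) => stmt.kind === "use" && !declared.has(stmt.id))
    .map((stmt) => stmt.id);
}

// "greet" is used before it is declared, as JavaScript allows for functions.
const sampleProgram: Stmt[] = [
  { kind: "use", id: "greet" },
  { kind: "declare", id: "greet" },
  { kind: "use", id: "missing" },
];
const unresolvedNames = findUnresolved(sampleProgram);
```

Only `missing` ends up unresolved here. A single-pass check embedded in the parser would also have flagged `greet`.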

Basically to implement full editor tooling (before LSP level), a Parser is not sufficient, you will need to implement parts of the compiler front-end and with more functionality than a basic machine code compiler.

See: Anders Hejlsberg on Modern Compiler Construction

kristianmandrup commented 4 years ago

The BaseLanguageServer includes:

    const lexDiagnostics = this.getLexDiagnostics(document, tokens);
    const parseDiagnostics = this.getParseDiagnostics(document, errors);

    return this.connection.sendDiagnostics({
      uri,
      diagnostics: [...lexDiagnostics, ...parseDiagnostics]
    });

class BaseLanguageServer {
  // ...
  get capabilities() {
    return {
      // Tell the client that the server works in NONE text document sync mode
      textDocumentSync: this.documents.syncKind[0],
      hoverProvider: true
    };
  }
// ...
}
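
As a rough sketch of what a helper like `getParseDiagnostics` might do, the function below maps a parse error onto an LSP-style diagnostic. The interfaces are local stand-ins, not imports from Chevrotain or `vscode-languageserver`. It assumes 1-based token lines and columns with an inclusive end column, and 0-based LSP positions with an exclusive end.

```typescript
// Local stand-ins for the relevant shapes (assumptions, see lead-in).
interface TokenLike {
  startLine: number; // 1-based
  startColumn: number; // 1-based
  endLine: number; // 1-based
  endColumn: number; // 1-based, inclusive
}
interface ParseErrorLike {
  message: string;
  token: TokenLike;
}
interface LspDiagnostic {
  severity: number; // 1 means Error in the LSP specification
  message: string;
  range: {
    start: { line: number; character: number };
    end: { line: number; character: number };
  };
}

// Converts one parse error into a diagnostic covering the offending token.
function toDiagnostic(err: ParseErrorLike): LspDiagnostic {
  const t = err.token;
  return {
    severity: 1,
    message: err.message,
    range: {
      // Shift to 0-based; the inclusive end column becomes an exclusive end.
      start: { line: t.startLine - 1, character: t.startColumn - 1 },
      end: { line: t.endLine - 1, character: t.endColumn },
    },
  };
}

const sampleDiagnostic = toDiagnostic({
  message: "unexpected token",
  token: { startLine: 1, startColumn: 5, endLine: 1, endColumn: 7 },
});
```

A real server would probably also need a fallback range for errors reported at the end of the input, where token positions may be missing.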

Capabilities:

As far as I understand, the language-configuration.json provides basic folding, comment/uncomment and basic outline out of the box.

{
  "comments": {
    "lineComment": "//",
    "blockComment": ["/*", "*/"]
  },
  "brackets": [["{", "}"], ["[", "]"], ["(", ")"]]
}

I read multiple blogs to get this far. A good one is on the dot language by Federico Tomassetti, where he uses chevrotain with VS Code, I believe. It has been around 9 months since I last took on this effort, so I need to refresh my memory.

My goals at this point:

You're right, the scoping and namespacing belongs in the compiler. Haven't played with that since my uni days (using yacc).

I've removed the turtle-lang "module" as suggested. Also added paths to tsconfig.json so that relative imports are no longer needed (more like proper modules).

Any chance you will add compiler tooling to the mix in the future for such common cases to make it easier to write a compiler?

Would be nice with a number of blog posts to go through all this when it's ready! Would love to collaborate on that, perhaps with Tomassetti as well.

bd82 commented 4 years ago

A good one is on the dot language by Federico Tomassetti, where he uses chevrotain with VS Code,

Any chance for a link? sounds interesting.

Any chance you will add compiler tooling to the mix in the future for such common cases to make it easier to write a compiler?

That is unlikely, Chevrotain is quite complex and large with ~10,000 productive LOC and probably ~40,000 Total LOC. This is fairly large for a "spare time side project". I'd like to cleanup & simplify things rather than increase the scope of the project.

Would be nice with a number of blog posts to go through all this when it's ready! Would love to collaborate on that, perhaps with Tomassetti as well.

Yes could be very interesting, I helped The Tomassetti brothers a little with this article in the past.

Summary

Sounds like there is a fair bit of functionality in your LSP example. And some potential for utility npm packages (syntax gen). If it could be simplified/cleaned-up/minimized enough we could also consider adding it as an "official" example.

This will provide higher visibility and integration into Chevrotain's CI/build, so it will always be up-to-date with the latest Chevrotain APIs.

kristianmandrup commented 4 years ago

Here is the link: https://tomassetti.me/language-server-dot-visual-studio/ This blog post is also really good (why a DSL?): https://tomassetti.me/domain-specific-languages/

I'll check out your article. I'll try to wrap up my current effort with sql-lang to serve as a foundation for devs going that route. Then we can build on that progressively.

I see wrapping any parser/compiler with a minimal editor/IDE extension as essential for any language to be useful.

I'll try to add some compiler tooling libs next year when I have time.

I'd be happy to have it as an official example down the road as you suggest.

kristianmandrup commented 4 years ago

I've now extracted part of my project as:

Still some work to do...

Now the sql-lang project (repo) can showcase using these utility functions/classes

kristianmandrup commented 4 years ago

How about we implement LSP with content assist for a minimal language:

= and * with scoped identifiers and scope blocks { ... }

a = 1
{
  b = 2
}
{
  c = 3
  a * [content assist suggestion: a, c via scoped identifier lookup table]
}

bd82 commented 4 years ago

Thanks for the links.

I see wrapping any parser/compiler with a minimal editor/IDE extension as essential for any language to be useful.

True, but there is always the question of available resources 😢 I will hopefully be expanding the XML Tools capabilities (next quarter?). But I may need to implement similar language services for a couple of additional languages to be able to start generalizing possible generic capabilities. So such capabilities seem a little unlikely at this time.

How about we implement LSP with content assist for a minimal language:

I think that using the same language (Mini SQL) for the whole Editor example makes more sense. Could the language be expanded to include scoping? Would column aliases with nested subqueries introduce scoping rules? (I am very rusty with SQL...)

BTW, note that while Chevrotain supplies an API for syntactic content assist, I actually did not use it while implementing the XML Editor Content Assist. Instead I implemented custom logic on the CST structure (using a CST Visitor). I found that to be:

kristianmandrup commented 4 years ago

The content assist would normally work on the compiler level, ie. the compiler creating a tree structure with parent child scoping, with a lookup table of local vars on each scope. Then checking all the way up the tree for the first match, serving all the available options at that position. So it would always have to be custom. This could be combined with syntactic content assist as you point out.
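
A minimal sketch of that parent/child scope structure, using the `a`, `b`, `c` example from earlier in the thread. The `ScopeNode` class and its method names are hypothetical.

```typescript
// Hypothetical scope structure; names are illustrative only.
class ScopeNode {
  private readonly locals: string[] = [];
  private readonly outer: ScopeNode | undefined;

  constructor(outer?: ScopeNode) {
    this.outer = outer;
  }

  // Records a name in this scope's local lookup table.
  declare(id: string): void {
    if (!this.locals.includes(id)) this.locals.push(id);
  }

  // Walks up the parent chain and returns the first scope declaring `id`.
  resolve(id: string): ScopeNode | undefined {
    if (this.locals.includes(id)) return this;
    return this.outer ? this.outer.resolve(id) : undefined;
  }

  // Everything visible from this scope, outermost names first.
  visible(): string[] {
    const inherited: string[] = this.outer ? this.outer.visible() : [];
    const own = this.locals.filter((id) => !inherited.includes(id));
    return [...inherited, ...own];
  }
}

const rootScope = new ScopeNode();
rootScope.declare("a");
const firstBlock = new ScopeNode(rootScope);
firstBlock.declare("b");
const secondBlock = new ScopeNode(rootScope);
secondBlock.declare("c");
const suggestions = secondBlock.visible();
```

From inside the second block, `visible()` offers `a` and `c` but not `b`, which lives in a sibling scope.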

Didn't notice that feature before! I'll see what I can do end of this week or at least before New Year. Yeah, the compiler could use a CST Visitor to build up the tree structure with lookup tables. Ideally reusing existing sub-trees as Anders Hejlsberg pointed out.

I come from a Pascal, Turbo Pascal background myself back in the day and I'm from Denmark as well. We're pretty strong on prog. languages (C++ comes to mind).

bd82 commented 4 years ago

Didn't notice that feature before! I'll see what I can do end of this week or at least before New Year. Yeah, the compiler could use a CST Visitor to build up the tree structure with lookup tables

I did not use the CST to build the lookup tables. Some content assist may not be about names but rather about values, e.g. ENUM values (without prefix).

The CST Visitor I implemented for content assist instead did two things:

kristianmandrup commented 4 years ago

Thanks. I've now started a scope lang.

Visitor

https://github.com/kristianmandrup/sql-lang/blob/master/src/scope-lang/nested-scope-visitor/actions-visitor.ts

Spec: All tests pass

https://github.com/kristianmandrup/sql-lang/blob/master/src/scope-lang/nested-scope-visitor/actions.spec.ts

a=1 { b=2 { c=3 } }

Not sure how to build a nested lookup map from this though. Another pass required I guess, ie. the first step of the compiler?

    const ast = toAstVisitor(inputText);
    const stm1 = ast[0];
    const stm2 = ast[1];

    expect(stm2).toEqual({
      type: "SCOPE",
      statements: [
        {
          type: "ASSIGNMENT",
          variableName: "b",
          valueAssigned: "2"
        },
        {
          type: "SCOPE",
          statements: [
            {
              type: "ASSIGNMENT",
              variableName: "c",
              valueAssigned: "3"
            }
          ]
        }
      ]
    });

kristianmandrup commented 4 years ago

I managed to figure it out. Using scope stack in an additional traverse step.

https://github.com/kristianmandrup/sql-lang/blob/master/src/scope-lang/scope-stack/scope-stack-builder.spec.ts

Turning this

export const scopeTree = {
  type: "SCOPE",
  statements: [
    {
      type: "ASSIGNMENT",
      variableName: "b",
      valueAssigned: "2"
    },
    {
      type: "SCOPE",
      statements: [
        {
          type: "ASSIGNMENT",
          variableName: "c",
          valueAssigned: "3"
        }
      ]
    },
    {
      type: "SCOPE",
      statements: [
        {
          type: "ASSIGNMENT",
          variableName: "d",
          valueAssigned: "4"
        }
      ]
    }
  ]
};

Into a tree with varsAvailable list for each scope :)

   [
      {
        "type": "ASSIGNMENT",
        "variableName": "b",
        "valueAssigned": "2",
        "varsAvailable": [
          "b"
        ]
      },
      {
        "type": "SCOPE",
        "statements": [
          {
            "type": "ASSIGNMENT",
            "variableName": "c",
            "valueAssigned": "3",
            "varsAvailable": [
              "b",
              "c"
            ]
          }
        ]
      },
      {
        "type": "SCOPE",
        "statements": [
          {
            "type": "ASSIGNMENT",
            "variableName": "d",
            "valueAssigned": "4",
            "varsAvailable": [
              "b",
              "d"
            ]
          }
        ]
      }
    ]
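
One way such a traversal could look is sketched below. It is not the code from the linked repository; the type and function names are assumptions. It keeps a stack with one frame of declared names per enclosing scope and snapshots that stack at each assignment.

```typescript
type AssignmentStmt = {
  type: "ASSIGNMENT";
  variableName: string;
  valueAssigned: string;
  varsAvailable?: string[];
};
type ScopeStmt = { type: "SCOPE"; statements: ScopedStmt[] };
type ScopedStmt = AssignmentStmt | ScopeStmt;

// Annotates each assignment with the names visible at that point.
// `stack` holds one frame of declared names per enclosing scope.
function annotateScope(scope: ScopeStmt, stack: string[][] = []): ScopedStmt[] {
  const frame: string[] = [];
  const frames = [...stack, frame];
  return scope.statements.map((stmt): ScopedStmt => {
    if (stmt.type === "ASSIGNMENT") {
      frame.push(stmt.variableName);
      // Snapshot of all frames, outermost first.
      const visible = ([] as string[]).concat(...frames);
      return { ...stmt, varsAvailable: visible };
    }
    return { type: "SCOPE", statements: annotateScope(stmt, frames) };
  });
}

const sampleScopeTree: ScopeStmt = {
  type: "SCOPE",
  statements: [
    { type: "ASSIGNMENT", variableName: "b", valueAssigned: "2" },
    {
      type: "SCOPE",
      statements: [{ type: "ASSIGNMENT", variableName: "c", valueAssigned: "3" }],
    },
    {
      type: "SCOPE",
      statements: [{ type: "ASSIGNMENT", variableName: "d", valueAssigned: "4" }],
    },
  ],
};
const annotatedStatements = annotateScope(sampleScopeTree);
```

Because a frame is discarded when its block ends, `c` never shows up in the sibling scope that declares `d`.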

kristianmandrup commented 4 years ago

From language-server-dot-visual-studio/

To add the completion provider (aka "content assist") for a VSC extension

connection.onInitialize((params): InitializeResult => {  
    return {        
        capabilities: {
           // ...
            completionProvider: {
                resolveProvider: true,
                "triggerCharacters": [ '=' ]
            },
            hoverProvider: true     
        }
    }
});

Sample onCompletion handler:

connection.onCompletion((textDocumentPosition: TextDocumentPositionParams): CompletionItem[] => {
    let text = documents.get(textDocumentPosition.textDocument.uri).getText();
    let position = textDocumentPosition.position;

    // use parsed model to lookup via position
    // return a list of auto complete suggestions (for = assignment)
    const results: CompletionItem[] = []; // placeholder: fill from the parsed model
    return results;
});
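
The handler above receives a line/character position, while a parsed model is usually keyed by character offset. The sketch below shows one dependency-free way to bridge the two and to shape the result. The interfaces are local stand-ins for the LSP types and the function names are hypothetical. The text document object returned by `documents.get(...)` should already offer an `offsetAt` method for the same purpose.

```typescript
// Local stand-ins mirroring the LSP shapes (0-based line/character).
interface CursorPosition {
  line: number;
  character: number;
}
interface CompletionEntry {
  label: string;
  kind: number;
}

// Converts a line/character position into a character offset in `text`.
// Assumes "\n" line endings.
function offsetAt(text: string, pos: CursorPosition): number {
  const lines = text.split("\n");
  let offset = 0;
  for (let i = 0; i < pos.line && i < lines.length; i++) {
    offset += lines[i].length + 1; // +1 for the newline
  }
  return offset + pos.character;
}

// 6 is CompletionItemKind.Variable in the LSP specification.
function toCompletionEntries(varsAvailable: string[]): CompletionEntry[] {
  return varsAvailable.map((label) => ({ label, kind: 6 }));
}

const sampleText = "a=1\n{\n  b=2\n}";
const sampleOffset = offsetAt(sampleText, { line: 2, character: 2 });
const sampleEntries = toCompletionEntries(["a", "b"]);
```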

In my case it should use the compiled model with scope information to present the varsAvailable array as the return value. How would I best do this I wonder? I need to somehow retain the position of each assignment, then perhaps have an index that links to each assignment object.

{
        "type": "SCOPE",
        "statements": [
          {
            "type": "ASSIGNMENT",
            "variableName": "d",
            "valueAssigned": "4",
            "position": 17,
            "varsAvailable": [
              "b",
              "d"
            ]
          }
        ]
      }

const assignmentIndex = {
  3: {varsAvailable: ['a'] },
  9: {varsAvailable: ['a', 'b'] },
  17: {varsAvailable: ['b', 'c'] },
}

Alternatively have a range for each, then find overlapping range?

Then for a given document char position, say 18, it should look up the closest assignment index less than that (or in range), in this case 17. Perhaps there is a better way to do this with chevrotain?

I guess a primitive approach would be to simply iterate through this list until it finds first one with position greater than current pos (or at end of list) then use the one before that.
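
The "closest index at or before the cursor" lookup can also be done with a binary search once the entries are sorted by position. This is only a sketch: `IndexedAssignment` and `findPreceding` are hypothetical names, and the sample data mirrors the `assignmentIndex` above.

```typescript
interface IndexedAssignment {
  position: number;
  varsAvailable: string[];
}

// Returns the last entry whose position is <= offset, or undefined.
// `entries` must be sorted by ascending position.
function findPreceding(
  entries: IndexedAssignment[],
  offset: number
): IndexedAssignment | undefined {
  let lo = 0;
  let hi = entries.length - 1;
  let found: IndexedAssignment | undefined;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid].position <= offset) {
      found = entries[mid]; // candidate; keep looking to the right
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

const sortedAssignments: IndexedAssignment[] = [
  { position: 3, varsAvailable: ["a"] },
  { position: 9, varsAvailable: ["a", "b"] },
  { position: 17, varsAvailable: ["b", "c"] },
];
const hitAt18 = findPreceding(sortedAssignments, 18);
```

Note that a nearest-preceding lookup ignores where a scope block ends, so a cursor placed after a closing `}` would still see the inner scope's variables. Storing a range per scope, as suggested above, would avoid that.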

kristianmandrup commented 4 years ago

To add position info, I need to add this to the parser. Looks a little cumbersome IMO.

  assignment(ctx: any) {
    console.log("assignment = ", ctx);
    const variableName = ctx.Identifier[0].image;
    const {
      startOffset,
      endOffset,
      startLine,
      endLine,
      startColumn,
      endColumn
    } = ctx;
    const position = {
      startOffset,
      endOffset,
      startLine,
      endLine,
      startColumn,
      endColumn
    };
    // value assigned
    const valueAssigned = this.visit(ctx.reference);
    return {
      type: "ASSIGNMENT",
      variableName,
      valueAssigned,
      position
    };
  }

Would have been more elegant to have

position: {
  offset: {
    start: 13,
    end: 14
  },
  line: {
    start: 1,
    end: 1
  },
  column: {
    start: 7,
    end: 14
  }
}

Perhaps retain both position state models (nested and list), even if duplicate, so they are easy to work with either way (grouped or individually)?
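
One option, sketched here with hypothetical type and function names, is to store only the flat fields and derive the grouped view on demand, so both shapes are available without keeping duplicates on every node.

```typescript
interface FlatLocation {
  startOffset: number;
  endOffset: number;
  startLine: number;
  endLine: number;
  startColumn: number;
  endColumn: number;
}
interface Span {
  start: number;
  end: number;
}
interface NestedLocation {
  offset: Span;
  line: Span;
  column: Span;
}

// Builds the grouped view from the flat fields; nothing is stored twice.
function toNested(loc: FlatLocation): NestedLocation {
  return {
    offset: { start: loc.startOffset, end: loc.endOffset },
    line: { start: loc.startLine, end: loc.endLine },
    column: { start: loc.startColumn, end: loc.endColumn },
  };
}

// Values taken from the example above.
const nestedSample = toNested({
  startOffset: 13,
  endOffset: 14,
  startLine: 1,
  endLine: 1,
  startColumn: 7,
  endColumn: 14,
});
```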

bd82 commented 4 years ago

position: {
  offset: {
    start: 13,
    end: 14
  },
  line: {
    start: 1,
    end: 1
  },
  column: {
    start: 7,
    end: 14
  }
}

This is prettier, but also creates 4 times as many objects which would be slower and consume more memory. (By how much I do not know though...)

kristianmandrup commented 4 years ago

I tried extracting the position decoration of nodes into a separate visitor base class

export class PositionVisitor extends BaseVisitor {
  positioned = false;

  constructor(opts) {
    super();
    this.positioned = opts.positioned;
  }

  get isPositioned() {
    return Boolean(this.positioned);
  }

  decorate(node, ctx) {
    return this.decoratePosition(node, ctx);
  }

  decoratePosition(node, ctx) {
    if (this.isPositioned) {
      const {
        startOffset,
        endOffset,
        startLine,
        endLine,
        startColumn,
        endColumn
      } = ctx;
      const position = {
        startOffset,
        endOffset,
        startLine,
        endLine,
        startColumn,
        endColumn
      };
      node.position = position;
    }
    return node;
  }
}

So that I can simply decorate in assignment

  assignment(ctx: any) {
    // console.log("assignment = ", ctx);
    const variableName = ctx.Identifier[0].image;

    // value assigned
    const valueAssigned = this.visit(ctx.reference);
    const node = {
      type: "ASSIGNMENT",
      variableName,
      valueAssigned
    };
    return this.decorate(node, ctx);
  }

However it complains:

Errors Detected in CST Visitor <AstVisitor>:
        Redundant visitor method: <decorate> on AstVisitor CST Visitor
        There is no Grammar Rule corresponding to this method's name.
        For utility methods on visitor classes use methods names that do not match /^[a-zA-Z_]\w*$/.

        Redundant visitor method: <decoratePosition> on AstVisitor CST Visitor
        There is no Grammar Rule corresponding to this method's name.
        For utility methods on visitor classes use methods names that do not match /^[a-zA-Z_]\w*$/.

How do I circumvent this? using _ or $ prefix perhaps to indicate these are not to be treated as visitor handler methods?

Ah yes, names that do not match /^[a-zA-Z_]\w*$/. So a $ prefix would do; a _ prefix still matches the pattern :)
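
A quick way to check candidate prefixes against the pattern quoted in the error message. Whether the validator applies exactly this test is an assumption based on that message, and the helper name is hypothetical.

```typescript
// Pattern quoted in the validation error above.
const visitorMethodPattern = /^[a-zA-Z_]\w*$/;

// True when a method name would be treated as a visitor (grammar rule) method.
function isTreatedAsVisitorMethod(methodName: string): boolean {
  return visitorMethodPattern.test(methodName);
}

const prefixCheck = {
  plain: isTreatedAsVisitorMethod("decorate"),
  underscore: isTreatedAsVisitorMethod("_decorate"),
  dollar: isTreatedAsVisitorMethod("$decorate"),
};
```

Under this pattern only the `$` variant falls outside the check, because `_` is an allowed first character.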

bd82 commented 4 years ago

Then for a given document char position, say 18, it should look up the closest assignment index less than that (or in range), in this case 17. Perhaps there is a better way to do this with chevrotain?

Are you trying to link the compiled Model (AST+Symbol Table) with the syntactic offset position in the text?

bd82 commented 4 years ago

How do I circumvent this? using _ or $ prefix perhaps to indicate these are not to be treated as visitor handler methods?

Disable (delete) the Visitor validation... I think the CST Visitor needs some upgrades/refactoring...

kristianmandrup commented 4 years ago

Are you trying to link the compiled Model (AST+Symbol Table) with the syntactic offset position in the text?

Yes, exactly. I almost have it working already. Will likely finish it tmrw or at least by this week. See latest commit ;)

I kept the validation for now. Didn't realise the validate call in the visitor constructor did this method validation.

I think #1039 sounds like a good idea. Many other (better and more flexible) ways to go about it.

bd82 commented 4 years ago

What I did was provide an AST in the content assist "scenario identification" CST Visitor. And manually traverse the AST while visiting the CST.

kristianmandrup commented 4 years ago

Now have a working content assist solution with nested scopes, displaying list of variables in scope as completion items for assignment :)

See chevrotain-nested-scope-lang-content-assist repo and Readme

Renamed my mini sql project chevrotain-mini-sql-lang

kristianmandrup commented 4 years ago

Regarding content assist, I found the following quick start

bd82 commented 4 years ago

Now have a working content assist solution with nested scopes, displaying list of variables in scope as completion items for assignment :)

Great, I found this repository much easier to understand as it has a small scope/grammar and fewer files.

I provided some feedback here:

There still seem to be code parts that are unrelated, e.g:

But I guess these are leftovers from the source it was copied from or place holders for future features... In more productive code this would be a bit of source code history pollution, but in demo/example code I guess it does not matter as consumers are normally only interested in the latest master branch.

kristianmandrup commented 4 years ago

Exactly! The unrelated parts were for reference (source code co-location) until I add the full set of features to the library. WIP ;)

I’ll have a look at your comments shortly. Thanks again :)

bd82 commented 3 years ago

While this is not a guide, it may still be of interest: https://github.com/langium/langium

bd82 commented 3 years ago

Not sure there would ever be time to make a full language server guide. I will try to preserve this by moving it to a discussion