eclipse-langium / langium

Next-gen language engineering / DSL framework
https://langium.org/
MIT License

Metadata on grammar rules #818

Open svallory opened 1 year ago

svallory commented 1 year ago

I would like to be able to define metadata on rules that I can then programmatically access using the generated Grammar.

For now, the use case I intend to use it for is automating the .tmLanguage.json generation. If I could write the code below...

/**
 * @scope: constant.character.escape.mylang
 */
terminal fragment ESCAPED_CHAR: '\\' ('n'|'t'|'r'|'\\');

/**
 * @scope: string.quoted.double.mylang
 */
terminal STRING: '"' ( ESCAPED_CHAR | !('\\'|'"') )* '"';

I could iterate over the rules in the Grammar object, determine the tmLanguage JSON structure, and even reuse the regexes.

In fact, I'm going to create a Map<RuleName, scope> and implement the tmLanguage generation algorithm. I won't have time to create a PR any time soon, but I'm happy to share my code here.
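A minimal sketch of that idea, assuming the scope of each terminal has already been extracted from its comment. The `AnnotatedTerminal` type, the `generateTmLanguage` function, and the hand-written regex strings are hypothetical stand-ins for illustration, not Langium's actual Grammar types or API; only the `name`/`scopeName`/`patterns`/`match` keys follow the TextMate grammar format.

```typescript
// Hypothetical, simplified stand-in for a terminal rule (NOT Langium's Grammar AST).
interface AnnotatedTerminal {
    name: string;
    regex: string;   // regular expression source of the terminal
    scope?: string;  // value of the @scope tag, if the rule has one
}

interface TmPattern {
    name: string;   // TextMate scope name
    match: string;  // regular expression source
}

interface TmLanguage {
    name: string;
    scopeName: string;
    patterns: TmPattern[];
}

// Emit one TextMate pattern per terminal that carries a @scope annotation.
function generateTmLanguage(languageId: string, terminals: AnnotatedTerminal[]): TmLanguage {
    const patterns = terminals
        .filter((t): t is AnnotatedTerminal & { scope: string } => t.scope !== undefined)
        .map(t => ({ name: t.scope, match: t.regex }));
    return { name: languageId, scopeName: `source.${languageId}`, patterns };
}

// Assumed input: the two terminals from the example above plus an unannotated one.
const tmLanguage = generateTmLanguage('mylang', [
    { name: 'ESCAPED_CHAR', regex: '\\\\[ntr\\\\]', scope: 'constant.character.escape.mylang' },
    { name: 'STRING', regex: '"(\\\\[ntr\\\\]|[^\\\\"])*"', scope: 'string.quoted.double.mylang' },
    { name: 'WS', regex: '\\s+' }
]);

console.log(JSON.stringify(tmLanguage, null, 2));
```

Rules without a `@scope` are simply skipped here; a real generator would probably also need to decide on pattern order and on how to inline fragments such as ESCAPED_CHAR.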

msujew commented 1 year ago

I believe this is related to https://github.com/langium/langium/discussions/699. We spoke about using TypeScript-style in another discussion somewhere.

I like the idea, it keeps the grammar slim and makes services that use these kinds of annotations very flexible 👍

Lotes commented 1 year ago

I still like the ideas (#699 + #818) of having annotations on my Langium grammar rules. That would be so sexy... so I was thinking about it and found some problems. Would be good to find an answer.

Problem 1: Hidden rules

One thing that bothers me here is binary operators, which are one symptom of a larger problem. Imagine you implemented a calculator:

/** a */
Expression: Additive;
/** b */
Additive extends Expression: Multiplicative ({BinaryOp.left = current} op='+' right=Multiplicative)*;
/** c */
Multiplicative extends Expression: Primary ({BinaryOp.left = current} op='*' right=Primary)*;
/** d */
Primary: number=NUMBER;

Where will the metadata end up? It cannot be the type definitions, because Additive and Multiplicative both produce BinaryOp, which has no rule of its own, only an interface.

The other location could be the grammar JSON, where each rule could get a metadata field. A comment such as

/**
 * @scope table
 * @scope statement
 * @param out
 */

would become

const metadata: Record<string, string[]> = {
  scope: ["table", "statement"],
  param: ["out"]
}
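The mapping from comment to record could be sketched roughly as follows. The `parseTags` helper is hypothetical and deliberately naive (one tag per line, everything after the tag name is the value, an optional colon after the tag name is ignored); it is not how Langium itself parses comments.

```typescript
// Hypothetical helper: collect `@tag value` lines of a comment into a record.
function parseTags(comment: string): Record<string, string[]> {
    const metadata: Record<string, string[]> = {};
    for (const rawLine of comment.split(/\r?\n/)) {
        // Strip comment delimiters: a leading `/**`, `/*`, `*/`, `*` or `//`, and a trailing `*/`.
        const line = rawLine
            .replace(/^\s*(\/\*\*?|\*\/|\*|\/\/)?\s*/, '')
            .replace(/\s*\*\/\s*$/, '');
        // A tag line looks like `@name value` or `@name: value`.
        const match = /^@(\w+):?\s*(.*)$/.exec(line);
        if (!match) {
            continue;
        }
        const tag = match[1] ?? '';
        const value = (match[2] ?? '').trim();
        // Repeated tags accumulate in order of appearance.
        metadata[tag] = [...(metadata[tag] ?? []), value];
    }
    return metadata;
}

const parsed = parseTags([
    '/**',
    ' * @scope table',
    ' * @scope statement',
    ' * @param out',
    ' */'
].join('\n'));

console.log(parsed);
```

Keeping every value as a string array avoids having to decide per tag whether it may repeat.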

Problem 2: Handling fragments

And what about fragments? We have these parser rule fragments...

FunctionSignature: 'fun' NameAndGenerics '(' ... ')';
VariableSignature: 'var' NameAndGenerics;
//@scope xyz
fragment NameAndGenerics: name=ID ('<' ... '>')?;

I think when resolving the metadata, the data of NameAndGenerics should be copied into the two rules where it is used.
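One possible way to do that copying, as a sketch only: `RuleInfo` is a hypothetical, flattened rule model (name, fragment flag, own metadata, names of referenced rules), not a Langium type, and the merge policy (the rule's own values first, then those of its fragments) is just one assumption.

```typescript
type Metadata = Record<string, string[]>;

// Hypothetical, simplified rule model for illustration (NOT a Langium type).
interface RuleInfo {
    name: string;
    fragment: boolean;
    metadata: Metadata;
    uses: string[]; // names of the rules or fragments referenced in the rule body
}

// Compute, for every non-fragment rule, its own metadata plus that of all
// fragments it uses, directly or through other fragments.
function resolveMetadata(rules: RuleInfo[]): Map<string, Metadata> {
    const byName = new Map<string, RuleInfo>();
    for (const rule of rules) {
        byName.set(rule.name, rule);
    }

    function collect(rule: RuleInfo, visiting: Set<string>): Metadata {
        const result: Metadata = {};
        const merge = (source: Metadata): void => {
            for (const [tag, values] of Object.entries(source)) {
                result[tag] = [...(result[tag] ?? []), ...values];
            }
        };
        merge(rule.metadata);
        visiting.add(rule.name); // guards against fragments that reference each other
        for (const usedName of rule.uses) {
            const used = byName.get(usedName);
            if (used !== undefined && used.fragment && !visiting.has(used.name)) {
                merge(collect(used, visiting));
            }
        }
        visiting.delete(rule.name);
        return result;
    }

    const resolved = new Map<string, Metadata>();
    for (const rule of rules) {
        if (!rule.fragment) {
            resolved.set(rule.name, collect(rule, new Set<string>()));
        }
    }
    return resolved;
}

const resolved = resolveMetadata([
    { name: 'FunctionSignature', fragment: false, metadata: {}, uses: ['NameAndGenerics'] },
    { name: 'VariableSignature', fragment: false, metadata: {}, uses: ['NameAndGenerics'] },
    { name: 'NameAndGenerics', fragment: true, metadata: { scope: ['xyz'] }, uses: ['ID'] }
]);

console.log(resolved);
```

Only fragments are followed; metadata of ordinary rules that happen to be called is not inherited in this sketch.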

Problem 3: UX & metadata computation

How handy would it be to look up a rule's metadata from an AstNode? Let's say we have a calculator grammar and want to write a scope provider that needs to know in which scope a cross-reference can be looked up (e.g. class or function).

//@scope class
MemberCall: 'this' '.' member=[Member];

You will have a MemberCall object in your virtual hand. You get the grammar from the services, but how do you determine the metadata? You have the node, not the rule... Also, think of the binary operation: do you get a, b, c, d, or all of them?

svallory commented 1 year ago

Hey @Lotes maybe I wasn't clear in the original post, but the metadata I proposed should be applied to the Langium grammar rule (and Langium grammar AstNode) themselves, and not to the language nodes the rules will parse. So I think those concerns don't apply here.

I don't see how metadata would be useful in this way since you could simply inject an attribute in the node with the value you want.

The benefit I see in adding annotations to grammar rules is that it would allow one to provide information for custom generators that operate on the grammar to create, for example, syntax highlighting grammars for Sublime, TextMate/VSCode, prism, highlights, etc.

Lotes commented 1 year ago

Sorry, I have the tendency to overcomplicate things ^^*…

We have that internal function findCommentNode. We could add it as a new CommentService. I can have a closer look on Monday :)

Lotes commented 1 year ago

@svallory I made a PR draft here

The comment provider extracts the comment that is located before the AstNode. The PR also includes a test case that should cover your use case.

If you want to have parsed comments, we also have a documentation provider under services.documentation.DocumentationProvider.

svallory commented 1 year ago

Hey, @Lotes! I'm sorry for the late response, I was sure I had answered this. I was finishing a project this past month and with all the rush to launch, this slipped my mind.

I just checked your PR and saw the test case. This is precisely what I needed! Thank you so much! :)

Well, I'll still need to parse the comment to get structured data, since I want to be able to easily add more information as needed.

What I want to do with this is automatically generate a .tmLanguage TextMate grammar for syntax highlighting, so I'm thinking I'll need at least a @scope and a @ruleName. Does Langium have any kind of JSDoc parser?

I'm not promising any delivery date, but would you guys be interested in adding that to Langium? If so, I would appreciate some guidance on how to plug it in (btw, this could be the start of a generator plugin architecture). If not, I'll just build it as a separate CLI that takes in a .langium grammar file and spits out a .tmLanguage file.

msujew commented 1 year ago

@svallory We do have a JSDoc parser integrated in the framework, see here.

I'm not promising any delivery date, but would you guys be interested in adding that to Langium? If so, I would appreciate some guidance on how to plug it (btw, this could be the start of a generator plugin architecture).

Having a plugin architecture would be pretty interesting. We're currently moving to ESM (see https://github.com/langium/langium/pull/1125), so dynamic imports in the CLI shouldn't be an issue.

svallory commented 1 year ago

@msujew Awesome! The integrated JSDoc parser will come in pretty handy. And ESM will make plugin loading really flexible, allowing for build-time or runtime selection of which plugins to apply.

I'll look at the easiest way to run my generator for now, since I want to focus on the feature. I was thinking of an external CLI that imports Langium to use the grammar and JSDoc parsers, but I noticed there's no public API or programmatic-use documentation in the docs, so it would be hard to know what is exported and what isn't.

Is there a Langium API or internal architecture documentation I can read to understand Langium better?

spoenemann commented 1 year ago

Is there a Langium API or internal architecture documentation I can read to understand Langium better?

It depends on what aspects you'd like to understand. Basically the code is a collection of services that are plugged together using DI: https://langium.org/docs/configuration-services/

The processing of text documents is explained here: https://langium.org/docs/document-lifecycle/

To understand the LSP integration, the best thing is to look through this code: https://github.com/eclipse-langium/langium/blob/main/packages/langium/src/lsp/language-server.ts

Otherwise, feel free to ask specific questions.