PowerShell / EditorSyntax

PowerShell syntax highlighting for editors (VS Code, Atom, SublimeText, TextMate, etc.) and GitHub!
MIT License

Implement a reliable way to test the syntax definition #2

Closed daviwil closed 6 years ago

daviwil commented 8 years ago

We need to find a way to test the syntax definition to ensure that any new changes don't break the grammar. It would be ideal if this could be done with the fewest number of dependencies possible so that we could run our tests in AppVeyor when PRs are sent.

I believe @vors might have some initial idea for how we could do that.

vors commented 8 years ago

A while ago @Jaykul proposed using https://github-lightshow.herokuapp.com/ for testing. I think it's the right way to go about it. It's fairly editor-agnostic.

daviwil commented 8 years ago

I just found this GitHub repo: https://github.com/microsoft/vscode-textmate

Looks like we can use VS Code's TextMate parser as an API, which might be helpful for testing the results of running our TextMate grammar against test files!

daviwil commented 8 years ago

The VS Code team uses some custom code to test the syntax definitions they ship (including the one for PowerShell). They use a well-known example code file for each language and then use their syntax tokenizer to output a JSON file which is then compared against a JSON file from a "known good" tokenization pass. Simple approach but it seems to work for them. We might be able to use this as a starting point for our own CI tests.
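A minimal sketch of that kind of "known good" comparison, to make the idea concrete. It is not the VS Code team's actual test code: `toyTokenize` is a hypothetical stand-in for a real tokenizer, and the baseline lives in a temp directory purely for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical stand-in for a real tokenizer: splits on whitespace and
// tags every token with a single made-up scope.
function toyTokenize(source) {
    return source.split(/\s+/).filter(Boolean).map((text) => ({
        text,
        scopes: ['source.example'],
    }));
}

// Compare the tokenizer output for `source` against a stored baseline.
// If no baseline exists yet, write one and report that it was created.
function checkAgainstBaseline(source, baselinePath, tokenize) {
    const actual = JSON.stringify(tokenize(source), null, 2);
    if (!fs.existsSync(baselinePath)) {
        fs.writeFileSync(baselinePath, actual);
        return 'created';
    }
    const expected = fs.readFileSync(baselinePath, 'utf8');
    return actual === expected ? 'match' : 'mismatch';
}

const baselineDir = fs.mkdtempSync(path.join(os.tmpdir(), 'grammar-'));
const baselineFile = path.join(baselineDir, 'sample.json');
console.log(checkAgainstBaseline('function foo {}', baselineFile, toyTokenize));
console.log(checkAgainstBaseline('function foo {}', baselineFile, toyTokenize));
console.log(checkAgainstBaseline('function bar {}', baselineFile, toyTokenize));
```

The first call writes the baseline, the second matches it, and the third reports a mismatch because the source changed.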

Here are some links to the relevant files:

daviwil commented 8 years ago

Looks like the shared syntax definition repo for TypeScript also has a similar solution and it's also using the vscode-textmate library:

https://github.com/Microsoft/TypeScript-TmLanguage/tree/master/tests

vors commented 8 years ago

As a heads-up, I'm planning to spend some time this week and next working on this test automation. Please feel free to use https://gitter.im/PowerShell/EditorSyntax if you have ideas to share about it.

vors commented 8 years ago

TypeScript is a good example!

Plan

How to evaluate tests

I don't particularly like the idea of storing serialized regions. I would prefer to run the current grammar against the grammar from the previous release (a tag or just a commit SHA-1); if both give the same results on the tests included in the previous release, CI passes. That way:

  1. We don't need to store serialized intermediate tests in the repo.
  2. We are confident that there are no regressions on the previous release's test subset.
  3. There could be regressions (or fixes) on the new tests (compared to the baseline), but maybe we just leave them as warnings.
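The evaluation rule above could be sketched roughly like this. Everything here is an assumption for illustration: `compareReleases`, the `inBaseline` flag, and the two toy tokenizers are hypothetical names standing in for the real old and new grammars.

```javascript
// Compare two tokenizers on a list of test cases. Cases marked
// `inBaseline` existed in the previous release, so a difference there is a
// failure; differences on newer cases are only reported as warnings.
function compareReleases(cases, tokenizeOld, tokenizeNew) {
    const failures = [];
    const warnings = [];
    for (const testCase of cases) {
        const before = JSON.stringify(tokenizeOld(testCase.code));
        const after = JSON.stringify(tokenizeNew(testCase.code));
        if (before === after) continue;
        (testCase.inBaseline ? failures : warnings).push(testCase.name);
    }
    return { passed: failures.length === 0, failures, warnings };
}

// Hypothetical tokenizers standing in for the old and the new grammar.
const oldTokenizer = (code) => code.split(' ');
const newTokenizer = (code) => code.split(/(\s+|[{}])/).filter((t) => t.trim());

const sampleCases = [
    { name: 'function keyword', code: 'function foo', inBaseline: true },
    { name: 'class body', code: 'class A {}', inBaseline: false },
];
console.log(compareReleases(sampleCases, oldTokenizer, newTokenizer));
```

Here the baseline case tokenizes identically under both tokenizers, so the run passes, while the newer `class body` case differs and only shows up in `warnings`.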

How to store tests

I like how https://github.com/jgm/CommonMark/blob/master/spec.txt works. It's the single source for both http://spec.commonmark.org/0.25/ and the tests in https://github.com/jgm/CommonMark/blob/master/test/spec_tests.py. All in one single file. Very convenient imo!

In SublimeText/PowerShell we have a single file, but it's a .ps1 file. https://github.com/SublimeText/PowerShell/blob/dev/tests/samples/test-file.ps1

It forces everybody to write comments as PowerShell comments, which interferes with the highlighting. For example, if something breaks in the middle of the document and the rest of the document is treated as a string, then the test just shows this fact and loses all other information. It's like old compilers that report only one error at a time. Not convenient.

If we take an approach similar to CommonMark's, we can use a spec document with some loose structure that allows us to identify test-case regions. For simplicity and popularity, I would say a Markdown document.

Test document

It could look like this:

````markdown
## Basics

declarations should be consistent for functions

```powershell
function foo.bar() {}
Function foo() {}
```

And classes

```powershell
class A {}
Class Foo-Bar {}
```

And workflows

```powershell
workflow w1 {}
Workflow work {}
```

And configurations

```powershell
configuration c {}
Configuration c {}
```

## Highlight types

Some explanation about test.

```powershell
[int[]][char[]]"hello world"
[string]$someVariable = [char[]](104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100)
[Collections.Generic.List``2[char]]$x = [char[]](104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100)
[Collections.Generic.List[char]]$x = [char[]](104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100)
```
````

Every

    ```powershell
    foobar
    ```

block would be treated as a separate test case.
This way, it's easy to read the test cases sequentially (because they are listed in order) and to get their context and intent from the comments above and below them.
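A dependency-free sketch of how such a document could be split into test cases. `extractTestCases` is a hypothetical helper, and the line scanner is a simplification: it assumes plain three-backtick fences and does not handle nested or tilde fences, which a real Markdown parser would.

```javascript
// Collect the body of every fenced block tagged with `language` in a
// Markdown string. Each block becomes one test case.
function extractTestCases(markdown, language = 'powershell') {
    const fence = '`'.repeat(3);
    const cases = [];
    let current = null; // lines of the block being read, or null outside one
    for (const line of markdown.split('\n')) {
        if (current === null) {
            if (line.trim() === fence + language) current = [];
        } else if (line.trim() === fence) {
            cases.push(current.join('\n'));
            current = null;
        } else {
            current.push(line);
        }
    }
    return cases;
}

const fenceMarker = '`'.repeat(3);
const sampleDoc = [
    '## Basics',
    '',
    'declarations should be consistent for functions',
    '',
    fenceMarker + 'powershell',
    'function foo.bar() {}',
    'Function foo() {}',
    fenceMarker,
    '',
    'And classes',
    '',
    fenceMarker + 'powershell',
    'class A {}',
    fenceMarker,
].join('\n');

console.log(extractTestCases(sampleDoc));
```

The prose between the blocks is ignored by the scanner, so it can be as free-form as the author likes.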

Feedback is welcome! Please let me know if you see any problems with this approach.

daviwil commented 8 years ago

So for the CommonMark approach, do they just have a "last known good" HTML file that they're comparing the Markdown formatting against? If so, that sounds fine to me. I like having a document serve a dual purpose to be both an example and a test artifact.

vors commented 8 years ago

No, they explicitly say what the rendered view should look like, e.g. https://github.com/jgm/CommonMark/blob/master/spec.txt#L531

```````````````````````````````` example
***
---
___
.
<hr />
<hr />
<hr />
````````````````````````````````

daviwil commented 8 years ago

Ahhh! Clever, I missed that.

vors commented 8 years ago

I started a Node.js-based prototype for a test harness that uses https://github.com/Microsoft/vscode-textmate

Opened a couple of issues: https://github.com/Microsoft/vscode-textmate/issues/20 https://github.com/Microsoft/vscode-textmate/issues/21

vors commented 8 years ago

I'm comparing the old grammar from commit 0cabc46e3a40ce8d300403107b08a70708321ca6 to the current master.

The example being parsed is:

function foo() {}
function bar() {}
class XXX {}

Here is the current output:

Old

    Token "function" from 0 to 8 with scopes source.powershell,meta.function,storage.type
    Token " " from 8 to 9 with scopes source.powershell,meta.function
    Token "foo" from 9 to 12 with scopes source.powershell,meta.function,entity.name.function.powershell
    Token "(" from 12 to 13 with scopes source.powershell
    Token ") {}
" from 13 to 18 with scopes source.powershell
    Token "function" from 18 to 26 with scopes source.powershell,meta.function,storage.type
    Token " " from 26 to 27 with scopes source.powershell,meta.function
    Token "bar" from 27 to 30 with scopes source.powershell,meta.function,entity.name.function.powershell
    Token "(" from 30 to 31 with scopes source.powershell
    Token ") {}
" from 31 to 36 with scopes source.powershell
    Token "class" from 36 to 41 with scopes source.powershell,storage.type.powershell
    Token " " from 41 to 42 with scopes source.powershell
    Token "XXX" from 42 to 45 with scopes source.powershell,entity.name.function
    Token " {}
" from 45 to 50 with scopes source.powershell

Current

    Token "function" from 0 to 8 with scopes source.powershell,meta.function,storage.type
    Token " " from 8 to 9 with scopes source.powershell,meta.function
    Token "foo" from 9 to 12 with scopes source.powershell,meta.function,entity.name.function.powershell
    Token "(" from 12 to 13 with scopes source.powershell
    Token ") " from 13 to 15 with scopes source.powershell
    Token "{" from 15 to 16 with scopes source.powershell,meta.scriptblock.powershell
    Token "}" from 16 to 17 with scopes source.powershell,meta.scriptblock.powershell
    Token "
" from 17 to 18 with scopes source.powershell
    Token "function" from 18 to 26 with scopes source.powershell,meta.function,storage.type
    Token " " from 26 to 27 with scopes source.powershell,meta.function
    Token "bar" from 27 to 30 with scopes source.powershell,meta.function,entity.name.function.powershell
    Token "(" from 30 to 31 with scopes source.powershell
    Token ") " from 31 to 33 with scopes source.powershell
    Token "{" from 33 to 34 with scopes source.powershell,meta.scriptblock.powershell
    Token "}" from 34 to 35 with scopes source.powershell,meta.scriptblock.powershell
    Token "
" from 35 to 36 with scopes source.powershell
    Token "class" from 36 to 41 with scopes source.powershell,meta.class.powershell,storage.type.powershell
    Token " " from 41 to 42 with scopes source.powershell,meta.class.powershell
    Token "XXX" from 42 to 45 with scopes source.powershell,meta.class.powershell,entity.name.function.powershell
    Token " {" from 45 to 47 with scopes source.powershell,meta.class.powershell
    Token "}" from 47 to 48 with scopes source.powershell,meta.class.powershell
    Token "
" from 48 to 50 with scopes source.powershell

As you can see they are quite different already, even on this small example.

vors commented 8 years ago

Here is the code that was used to produce it (I'm a total Node.js newbie):

// Prototype: tokenize the code blocks of a Markdown document with two
// versions of the grammar (an old commit and the working copy) and print both.
// Needs the `commonmark` and `vscode-textmate` packages; loadGrammarFromPathSync
// is the Registry API of the vscode-textmate release used here.
const { exec } = require('child_process');
const { Parser } = require('commonmark');
const { Registry } = require('vscode-textmate');

const gitCommitId = "0cabc46e3a40ce8d300403107b08a70708321ca6";
const grammarPath = "../PowerShellSyntax.tmLanguage";

// Print every token the grammar produces for the snippet.
// Note: the whole multi-line snippet is passed to tokenizeLine in one call,
// which is why some tokens in the output span a line break.
function tokenize(codeSnippet, grammar) {
    const lineTokens = grammar.tokenizeLine(codeSnippet);
    console.log("Tokenizing:\n" + codeSnippet + "\n\n");
    for (const token of lineTokens.tokens) {
        const text = codeSnippet.substring(token.startIndex, token.endIndex);
        console.log('    Token "' + text + '" from ' + token.startIndex + ' to ' + token.endIndex + ' with scopes ' + token.scopes);
    }
    console.log("End tokenizing\n");
}

// Tokenize the same snippet with the old and the new grammar.
function tokenizeCodeSnippet(codeSnippet, oldGrammarPath, newGrammarPath) {
    const oldRegistry = new Registry();
    const newRegistry = new Registry();

    console.log("oldGrammarPath: " + oldGrammarPath);
    console.log("newGrammarPath: " + newGrammarPath);

    const oldGrammar = oldRegistry.loadGrammarFromPathSync(oldGrammarPath);
    const newGrammar = newRegistry.loadGrammarFromPathSync(newGrammarPath);

    tokenize(codeSnippet, oldGrammar);
    tokenize(codeSnippet, newGrammar);
}

// Walk a (for now hard-coded) Markdown document and tokenize each code block.
function compareGrammars(oldGrammarPath, newGrammarPath) {
    const mdReader = new Parser();
    const mdDoc = mdReader.parse("Bar\n```powershell\nfunction foo() {}\nfunction bar() {}\nclass XXX {}\n```\n\n\nxxx");
    const mdWalker = mdDoc.walker();
    let mdNode;

    while ((mdNode = mdWalker.next())) {
        if (mdNode.node.type === "code_block") {
            // Use the parameter rather than the global grammarPath.
            tokenizeCodeSnippet(mdNode.node.literal, oldGrammarPath, newGrammarPath);
        }
    }
}

function main() {
    const oldGrammarPath = "./" + gitCommitId + ".tmLanguage";
    // Extract the grammar file as it was at gitCommitId into a local copy.
    const command = 'git show ' + gitCommitId + ":" + grammarPath + " > " + oldGrammarPath;
    exec(command, (err, stdout, stderr) => {
        if (err) {
            console.error("git show failed: " + stderr);
            process.exitCode = 1;
            return;
        }
        compareGrammars(oldGrammarPath, grammarPath);
    });
}

main();

daviwil commented 8 years ago

Looks good so far!

gravejester commented 7 years ago

I forgot you were this far along with a working solution @vors :) Did you get any further on this? I have been working on a very similar solution myself, but I'm using a JSON file to describe the tests. The Markdown approach is of course much easier to author. The only problem I see is that we don't have a way of stating what the correct scopes should be in the Markdown file.

This is an example of how I would use a JSON file to describe the tests:

[
    {
        "line": "Write-Host 'This is a single quoted string'",
        "tokens": [
            {
                "token": "Write-Host",
                "scopes": [
                    "source.powershell",
                    "meta.command.powershell",
                    "support.function.powershell"
                ]
            },
            {
                "token": "'",
                "scopes": [
                    "source.powershell",
                    "meta.command.powershell",
                    "string.quoted.single.powershell"
                ]
            },
            {
                "token": "This is a single quoted string",
                "scopes": [
                    "source.powershell",
                    "meta.command.powershell",
                    "string.quoted.single.powershell"
                ]
            },
            {
                "token": "'",
                "scopes": [
                    "source.powershell",
                    "meta.command.powershell",
                    "string.quoted.single.powershell"
                ]
            }
        ]
    }
]

vors commented 7 years ago

@gravejester not really, I dropped it without finishing.

I tried to document my reasoning about the desirable test harness here as well as the source code to produce these results.

If you are already working on another approach, don't feel obligated to try to incorporate mine. But if you find anything suitable, feel free to reuse it.

gravejester commented 7 years ago

@vors OK, I will probably steal some of your code, but I'm going with a JSON file for defining the tests. This way we are not comparing "this version" with the "last version", but always testing against what we have decided should be "the truth(tm)" :)
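A rough sketch of what checking against such a "truth" file could look like, using the `token` and `scopes` fields from the JSON example earlier in the thread. `findMismatches` is a hypothetical helper, and the `actual` array is made-up tokenizer output, not a real result from the grammar.

```javascript
// Compare actual tokens with the expected ones from a test description.
// Returns one entry per position where the token text or scopes differ,
// or where one of the two lists has no token at all.
function findMismatches(expectedTokens, actualTokens) {
    const mismatches = [];
    const count = Math.max(expectedTokens.length, actualTokens.length);
    for (let i = 0; i < count; i++) {
        const expected = expectedTokens[i];
        const actual = actualTokens[i];
        const same =
            expected !== undefined &&
            actual !== undefined &&
            expected.token === actual.token &&
            expected.scopes.join(' ') === actual.scopes.join(' ');
        if (!same) mismatches.push({ index: i, expected, actual });
    }
    return mismatches;
}

const expectedTokens = [
    { token: 'Write-Host', scopes: ['source.powershell', 'meta.command.powershell', 'support.function.powershell'] },
    { token: "'", scopes: ['source.powershell', 'meta.command.powershell', 'string.quoted.single.powershell'] },
];
// Hypothetical tokenizer output in which the second token lost a scope.
const actualTokens = [
    expectedTokens[0],
    { token: "'", scopes: ['source.powershell', 'string.quoted.single.powershell'] },
];

console.log(findMismatches(expectedTokens, actualTokens).map((m) => m.index));
```

Reporting the index together with both the expected and the actual token keeps every failing position visible at once, instead of stopping at the first difference.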

Unless someone has other ideas. It will be a hassle to create all the tests, but once they are created they should rarely change.

On a different note, I found a good description of the scope names here: https://www.sublimetext.com/docs/3/scope_naming.html and I suggest we base our naming on this document.

gravejester commented 7 years ago

For anyone interested, I have opted to use YAML instead of JSON for the reference file, since it's a lot easier to read (and edit) :)

I have a working version running locally now - with a really small subset of a reference file. So now starts the major job of fleshing this out.

omniomi commented 6 years ago

Closing as we've implemented Jasmine tests and https://github.com/kevinastone/atom-grammar-test.

We can create new issues to address changes in the way the tests are written or coverage issues. Cleaning up old issues.