We need to find a way to test the syntax definition to ensure that any new changes don't break the grammar. It would be ideal if this could be done with the fewest dependencies possible so that we could run our tests in AppVeyor when PRs are sent.
I believe @vors might have some initial ideas for how we could do that.
A while ago @Jaykul proposed using https://github-lightshow.herokuapp.com/ for testing. I think it's the right way to go about it. It's fairly editor-agnostic.
I just found this GitHub repo: https://github.com/microsoft/vscode-textmate
Looks like we can use VS Code's TextMate parser as an API, might be helpful for testing the results of using our TextMate grammar against test files!
The VS Code team uses some custom code to test the syntax definitions they ship (including the one for PowerShell). They use a well-known example code file for each language and then use their syntax tokenizer to output a JSON file which is then compared against a JSON file from a "known good" tokenization pass. Simple approach but it seems to work for them. We might be able to use this as a starting point for our own CI tests.
Here are some links to the relevant files:
Looks like the shared syntax definition repo for TypeScript also has a similar solution and it's also using the vscode-textmate library:
https://github.com/Microsoft/TypeScript-TmLanguage/tree/master/tests
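To make that snapshot idea concrete, here's a rough sketch of what such a check could look like using vscode-textmate; the file paths and the `tokenizeFile` helper are invented for illustration and this is not the VS Code team's actual harness:

```javascript
// Sketch only: tokenize a sample file and diff the result against a
// previously committed "known good" JSON snapshot. Paths and helper
// names below are hypothetical.
var fs = require('fs');
var Registry = require('vscode-textmate').Registry;

function tokenizeFile(samplePath, grammarPath) {
    var grammar = new Registry().loadGrammarFromPathSync(grammarPath);
    var ruleStack = null;
    var result = [];
    fs.readFileSync(samplePath, 'utf8').split('\n').forEach(function (line) {
        var r = grammar.tokenizeLine(line, ruleStack);
        ruleStack = r.ruleStack;
        r.tokens.forEach(function (t) {
            result.push({
                text: line.substring(t.startIndex, t.endIndex),
                scopes: t.scopes
            });
        });
    });
    return result;
}

var actual = tokenizeFile('./samples/test-file.ps1', '../PowerShellSyntax.tmLanguage');
var expected = JSON.parse(fs.readFileSync('./snapshots/test-file.json', 'utf8'));

if (JSON.stringify(actual) !== JSON.stringify(expected)) {
    console.error('Tokenization differs from the known-good snapshot');
    process.exit(1);
}
```

The snapshot file would be regenerated and reviewed whenever a grammar change intentionally alters the tokenization.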
As a heads-up, I'm planning to spend some time this week and next working on this test automation. Please feel free to use https://gitter.im/PowerShell/EditorSyntax if you have ideas to share about it.
TypeScript is a good example!
I don't particularly like the idea of storing serialized regions. I would probably prefer to run the current grammar against the previous release's grammar (a tag or just a commit SHA-1), and if it gives the same results on the tests included in the previous release, then CI passes. That way we wouldn't need to keep serialized output in the repo at all.
I like the way https://github.com/jgm/CommonMark/blob/master/spec.txt works. It's the single source for http://spec.commonmark.org/0.25/ and for the tests: https://github.com/jgm/CommonMark/blob/master/test/spec_tests.py. All in one single file. Very convenient, imo!
In SublimeText/PowerShell we have a single file, but it's a .ps1 file. https://github.com/SublimeText/PowerShell/blob/dev/tests/samples/test-file.ps1
It forces everybody to write comments as PowerShell comments, which interfere with the highlighting. For example, if something is broken in the middle of the document and the rest of the doc is treated as a string, then the test just reports that fact and loses all the other information. It's like old compilers that report only one error at a time. Not convenient.
If we take an approach similar to CommonMark, we can use a spec document with some loose structure that lets us identify test-case regions. For simplicity and popularity, I would say a markdown document.
It could look like this:
## Basics

declarations should be consistent for functions

```powershell
function foo.bar() {}
Function foo() {}
```

And classes

```powershell
class A {}
Class Foo-Bar {}
```

And workflows

```powershell
workflow w1 {}
Workflow work {}
```

And configurations

```powershell
configuration c {}
Configuration c {}
```

Some explanation about test.

```powershell
[int[]][char[]]"hello world"
[string]$someVariable = [char[]](104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100)
[Collections.Generic.List``2[char]]$x = [char[]](104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100)
[Collections.Generic.List[char]]$x = [char[]](104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100)
```

Every

```
foobar
```

block would be treated as a separate test case.
This way, it's easy to read the test cases in order (because they are listed sequentially) and to get their context and intentions from the comments above and below.
Feedback is welcome! Please let me know if you see any problems with this approach.
So for the CommonMark approach, do they just have a "last known good" HTML file that they're comparing the Markdown formatting against? If so, that sounds fine to me. I like having a document serve a dual purpose to be both an example and a test artifact.
No, they explicitly say what the rendered view should look like, e.g. https://github.com/jgm/CommonMark/blob/master/spec.txt#L531
```````````````````````````````` example
***
---
___
.
<hr />
<hr />
<hr />
````````````````````````````````
Ahhh! Clever, I missed that.
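If we adopted that format for the grammar tests, splitting each example block into "source" and "expected" halves would be simple. A rough sketch, assuming blocks fenced with the `example` info string and a line containing only `.` as the separator (as in the CommonMark spec):

```javascript
// Sketch: split a CommonMark-style "example" block into the source
// snippet and the expected output. Assumes the block body has already
// been extracted from the spec file.
function splitExample(blockText) {
    var lines = blockText.split('\n');
    var dot = lines.indexOf('.');          // separator line containing only "."
    return {
        input: lines.slice(0, dot).join('\n'),
        expected: lines.slice(dot + 1).join('\n')
    };
}

var example = splitExample('***\n---\n___\n.\n<hr />\n<hr />\n<hr />');
console.log(example.input);     // the source snippet
console.log(example.expected);  // the expected rendering / tokenization
```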
I started a Node.js-based prototype for a test harness that uses https://github.com/Microsoft/vscode-textmate
I opened a couple of issues: https://github.com/Microsoft/vscode-textmate/issues/20 and https://github.com/Microsoft/vscode-textmate/issues/21
I'm comparing the old grammar from commit 0cabc46e3a40ce8d300403107b08a70708321ca6 to the current master.
The example input for parsing is:

```powershell
function foo() {}
function bar() {}
class XXX {}
```
Here is the current output. First, from the old grammar (0cabc46):

```
Token "function" from 0 to 8 with scopes source.powershell,meta.function,storage.type
Token " " from 8 to 9 with scopes source.powershell,meta.function
Token "foo" from 9 to 12 with scopes source.powershell,meta.function,entity.name.function.powershell
Token "(" from 12 to 13 with scopes source.powershell
Token ") {}
" from 13 to 18 with scopes source.powershell
Token "function" from 18 to 26 with scopes source.powershell,meta.function,storage.type
Token " " from 26 to 27 with scopes source.powershell,meta.function
Token "bar" from 27 to 30 with scopes source.powershell,meta.function,entity.name.function.powershell
Token "(" from 30 to 31 with scopes source.powershell
Token ") {}
" from 31 to 36 with scopes source.powershell
Token "class" from 36 to 41 with scopes source.powershell,storage.type.powershell
Token " " from 41 to 42 with scopes source.powershell
Token "XXX" from 42 to 45 with scopes source.powershell,entity.name.function
Token " {}
" from 45 to 50 with scopes source.powershell
```

And the same input with the current master:

```
Token "function" from 0 to 8 with scopes source.powershell,meta.function,storage.type
Token " " from 8 to 9 with scopes source.powershell,meta.function
Token "foo" from 9 to 12 with scopes source.powershell,meta.function,entity.name.function.powershell
Token "(" from 12 to 13 with scopes source.powershell
Token ") " from 13 to 15 with scopes source.powershell
Token "{" from 15 to 16 with scopes source.powershell,meta.scriptblock.powershell
Token "}" from 16 to 17 with scopes source.powershell,meta.scriptblock.powershell
Token "
" from 17 to 18 with scopes source.powershell
Token "function" from 18 to 26 with scopes source.powershell,meta.function,storage.type
Token " " from 26 to 27 with scopes source.powershell,meta.function
Token "bar" from 27 to 30 with scopes source.powershell,meta.function,entity.name.function.powershell
Token "(" from 30 to 31 with scopes source.powershell
Token ") " from 31 to 33 with scopes source.powershell
Token "{" from 33 to 34 with scopes source.powershell,meta.scriptblock.powershell
Token "}" from 34 to 35 with scopes source.powershell,meta.scriptblock.powershell
Token "
" from 35 to 36 with scopes source.powershell
Token "class" from 36 to 41 with scopes source.powershell,meta.class.powershell,storage.type.powershell
Token " " from 41 to 42 with scopes source.powershell,meta.class.powershell
Token "XXX" from 42 to 45 with scopes source.powershell,meta.class.powershell,entity.name.function.powershell
Token " {" from 45 to 47 with scopes source.powershell,meta.class.powershell
Token "}" from 47 to 48 with scopes source.powershell,meta.class.powershell
Token "
" from 48 to 50 with scopes source.powershell
```
As you can see they are quite different already, even on this small example.
Here is the code used to produce it (I'm a total Node.js newbie):

```javascript
var exec = require('child_process').exec;
var Parser = require('commonmark').Parser;
var Registry = require('vscode-textmate').Registry;

const gitCommitId = "0cabc46e3a40ce8d300403107b08a70708321ca6";
const grammarPath = "../PowerShellSyntax.tmLanguage";

// Tokenize a code snippet with the given grammar and print every token
// with its text, offsets, and scope list.
function tokenize(codeSnippet, grammar)
{
    var lineTokens = grammar.tokenizeLine(codeSnippet);
    console.log("Tokenizing:\n" + codeSnippet + "\n\n");
    for (var i = 0; i < lineTokens.tokens.length; i++) {
        var token = lineTokens.tokens[i];
        var text = codeSnippet.substr(token.startIndex, token.endIndex - token.startIndex);
        console.log(' Token "' + text + '" from ' + token.startIndex + ' to ' + token.endIndex + ' with scopes ' + token.scopes);
    }
    console.log("End tokenizing\n");
}

// Tokenize the same snippet with both grammars so the outputs can be compared.
function tokenizeCodeSnippet(codeSnippet, oldGrammarPath, newGrammarPath)
{
    var oldRegistry = new Registry();
    var newRegistry = new Registry();
    console.log("oldGrammarPath: " + oldGrammarPath);
    console.log("newGrammarPath: " + newGrammarPath);
    var oldGrammar = oldRegistry.loadGrammarFromPathSync(oldGrammarPath);
    var newGrammar = newRegistry.loadGrammarFromPathSync(newGrammarPath);
    tokenize(codeSnippet, oldGrammar);
    tokenize(codeSnippet, newGrammar);
}

// Walk a markdown document and run every fenced code block through both grammars.
function compareGrammars(oldGrammarPath, newGrammarPath)
{
    var mdReader = new Parser();
    var mdDoc = mdReader.parse("Bar\n```powershell\nfunction foo() {}\nfunction bar() {}\nclass XXX {}\n```\n\n\nxxx");
    var mdWalker = mdDoc.walker();
    var mdNode;
    while (mdNode = mdWalker.next())
    {
        if (mdNode.node.type == "code_block")
        {
            tokenizeCodeSnippet(mdNode.node.literal, oldGrammarPath, newGrammarPath);
        }
    }
}

// Extract the old grammar from git history, then compare it to the grammar
// in the working tree.
function main()
{
    var path = "./" + gitCommitId + ".tmLanguage";
    var child = exec('git show ' + gitCommitId + ":" + grammarPath + " > " + path, function(err, stdout, stderr) {});
    child.on('close', (code) => {
        compareGrammars(path, grammarPath);
    });
}

main();
```
Looks good so far!
I forgot you were this far along with a working solution, @vors :) Did you get any further on this? I have been working on a very similar solution myself, but I'm using a JSON file to describe the tests... though the markdown approach is of course much easier to author. The only problem I see is that we don't have a way of stating what the correct scopes should be in the markdown file.
This is an example of how I would use a JSON file to describe the tests:
```json
[
    {
        "line": "Write-Host 'This is a single quoted string'",
        "tokens": [
            {
                "token": "Write-Host",
                "scopes": [
                    "source.powershell",
                    "meta.command.powershell",
                    "support.function.powershell"
                ]
            },
            {
                "token": "'",
                "scopes": [
                    "source.powershell",
                    "meta.command.powershell",
                    "string.quoted.single.powershell"
                ]
            },
            {
                "token": "This is a single quoted string",
                "scopes": [
                    "source.powershell",
                    "meta.command.powershell",
                    "string.quoted.single.powershell"
                ]
            },
            {
                "token": "'",
                "scopes": [
                    "source.powershell",
                    "meta.command.powershell",
                    "string.quoted.single.powershell"
                ]
            }
        ]
    }
]
```
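A sketch of how such a JSON file could drive assertions against the grammar; the `tests.json` path and the flattening of tokens are my assumptions, not a finished harness:

```javascript
// Sketch: run every case from a JSON test description through the
// grammar and compare the produced tokens/scopes with the expected ones.
var fs = require('fs');
var Registry = require('vscode-textmate').Registry;

var grammar = new Registry().loadGrammarFromPathSync('../PowerShellSyntax.tmLanguage');
var cases = JSON.parse(fs.readFileSync('./tests.json', 'utf8'));   // hypothetical path

var failures = 0;
cases.forEach(function (testCase) {
    var actual = grammar.tokenizeLine(testCase.line).tokens.map(function (t) {
        return {
            token: testCase.line.substring(t.startIndex, t.endIndex),
            scopes: t.scopes
        };
    });
    if (JSON.stringify(actual) !== JSON.stringify(testCase.tokens)) {
        failures++;
        console.error('Mismatch for line: ' + testCase.line);
    }
});
process.exit(failures === 0 ? 0 : 1);
```

One design wrinkle: the tokenizer also emits whitespace tokens, so the reference file would either need to include them or the comparison would need to skip them.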
@gravejester not really, I dropped it without finishing.
I tried to document my reasoning about the desired test harness here, as well as the source code used to produce these results.
If you're already working on another approach, don't feel obligated to incorporate mine. But if you find anything suitable, feel free to reuse it.
@vors Ok, I will probably steal some of your code, but I'm going for a JSON file to define the tests. This way we are not comparing "this version" with the "last version", but always testing against what we have decided should be "the truth(tm)" :)
Unless someone has other ideas. It will be a hassle to create all the tests, but once they are created they should rarely change.
On a different note, I found a good description of the scope names here: https://www.sublimetext.com/docs/3/scope_naming.html and I suggest we base our naming on this document.
For anyone interested, I have opted to use YAML instead of JSON for the reference file - it makes it a lot easier to read (and edit) :)
I have a working version running locally now - with a really small subset of a reference file. So now starts the major job of fleshing this out.
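For reference, reading the YAML variant could be as simple as swapping the parser. A tiny sketch assuming the js-yaml package and a hypothetical `tests.yaml` with the same shape as the JSON example above:

```javascript
// Sketch: same test structure as the JSON example above, just parsed
// from YAML with js-yaml instead of JSON.parse.
var fs = require('fs');
var yaml = require('js-yaml');

var cases = yaml.load(fs.readFileSync('./tests.yaml', 'utf8'));   // hypothetical path
cases.forEach(function (testCase) {
    console.log(testCase.line + ' -> ' + testCase.tokens.length + ' expected tokens');
});
```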
Closing as we've implemented Jasmine tests and https://github.com/kevinastone/atom-grammar-test.
Can create new issues to address changes in the way the tests are written or coverage issues. Cleaning up old issues.
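For anyone finding this later, here is a minimal sketch of what a Jasmine-style grammar assertion could look like, written against vscode-textmate as used earlier in this thread; it is an illustration only, not the actual spec files added to the repo, whose layout may differ:

```javascript
// Sketch: a Jasmine spec that loads the grammar once and asserts the
// text and scopes of a single token. Not the repo's real spec layout.
var Registry = require('vscode-textmate').Registry;

describe('PowerShell grammar', function () {
    var grammar = new Registry().loadGrammarFromPathSync('../PowerShellSyntax.tmLanguage');

    it('tokenizes the function keyword', function () {
        var line = 'function foo() {}';
        var tokens = grammar.tokenizeLine(line).tokens;
        var first = line.substring(tokens[0].startIndex, tokens[0].endIndex);

        expect(first).toBe('function');
        // Per the output shown earlier in this thread, the keyword also
        // carries a storage.type scope; asserting the root scope here.
        expect(tokens[0].scopes).toContain('source.powershell');
    });
});
```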