Switch to exact versions for tree-sitter parsers

wenkokke commented 2 years ago

The web versions of tree-sitter parsers are unfortunately quite brittle:

Tree-sitter ships no infrastructure for testing the generated WebAssembly builds, and as a consequence most parsers do not test them. https://github.com/tree-sitter/tree-sitter/issues/1565
The native and web versions of the tree-sitter libraries use different runtime systems, which do not always behave the same. https://github.com/tree-sitter/tree-sitter-haskell/issues/69
Due to limitations in Emscripten, the tree-sitter library must determine which functions from the C standard library are exposed to each individual parser, and the set of functions it exposes is rather limited. As a result, a benign change that works perfectly with the native versions of the libraries can completely break the web versions. https://github.com/tree-sitter/tree-sitter-haskell/pull/56#issuecomment-1003756899

As a consequence of these factors, we cannot rely on semantic versioning for web-tree-sitter parsers, and we have to be quite conservative about which versions we allow.

For the short term, I propose that we limit each of the parsers to the exact version currently recorded in the yarn.lock file, whether that is an exact version number or a commit hash.
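A minimal sketch of what "exact version" could mean when checked mechanically, assuming a dependency counts as pinned only if its spec is a plain semver version or ends in a full 40-character commit hash. The helper names `isExactPin` and `findLoosePins`, and the hash in the usage note, are hypothetical and not part of this repo:

```typescript
// Hypothetical helpers (not from this repo): flag dependency specs that are not pinned exactly.

// A plain semver version such as "0.20.4", with no range operator like ^ or ~.
const EXACT_SEMVER = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;

// A git-style spec that ends in a full 40-character commit hash.
const FULL_COMMIT_HASH = /#[0-9a-f]{40}$/;

function isExactPin(spec: string): boolean {
  return EXACT_SEMVER.test(spec) || FULL_COMMIT_HASH.test(spec);
}

// Returns the names of all dependencies whose spec is a range, a tag, or a branch.
function findLoosePins(dependencies: Record<string, string>): string[] {
  return Object.keys(dependencies)
    .filter((name) => !isExactPin(dependencies[name]))
    .sort();
}
```

For instance, `isExactPin("^0.20.4")` and `isExactPin("github:tree-sitter/tree-sitter-haskell#main")` are both rejected, while the same specs with the caret removed, or with a placeholder hash such as `#0123456789abcdef0123456789abcdef01234567` in place of `#main`, are accepted.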

For the longer term, I propose that we build a test suite which loads files from several major projects written in the supported programming languages and checks the generated parse trees to see whether (1) they are free from errors, and (2) they match our gold-standard files (once we have those). We can then use this test suite to guide version bumps.
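The two checks could be sketched roughly as follows. This assumes a simplified `ParseNode` shape rather than the real web-tree-sitter node API, and assumes golden files store the tree as an S-expression; `ParseNode`, `hasParseError`, `toSExpression`, and `checkAgainstGolden` are all hypothetical names. The one detail taken from tree-sitter itself is that unparseable regions appear as nodes of type `ERROR`:

```typescript
// Simplified stand-in for a tree-sitter node; the real API exposes more than this.
interface ParseNode {
  type: string;
  isMissing?: boolean; // assumed flag for nodes inserted by error recovery
  children: ParseNode[];
}

// Check (1): the tree contains no ERROR node and no missing node.
function hasParseError(node: ParseNode): boolean {
  if (node.type === "ERROR" || node.isMissing === true) return true;
  return node.children.some((child) => hasParseError(child));
}

// Serialise the tree so it can be compared with a stored golden file.
function toSExpression(node: ParseNode): string {
  if (node.children.length === 0) return `(${node.type})`;
  const inner = node.children.map((child) => toSExpression(child)).join(" ");
  return `(${node.type} ${inner})`;
}

// Check (2): the serialised tree matches the golden text. Returns a list of problems.
function checkAgainstGolden(root: ParseNode, golden: string): string[] {
  const problems: string[] = [];
  if (hasParseError(root)) {
    problems.push("parse tree contains an ERROR or missing node");
  }
  if (toSExpression(root) !== golden.trim()) {
    problems.push("parse tree differs from golden file");
  }
  return problems;
}
```

A version bump would then be accepted only if `checkAgainstGolden` returns an empty list for every sample file, with both the native and the web builds of a parser run through the same check.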

pokey commented 2 years ago

All sounds good! Well captured. One more thought: we may want to add a step in CI that runs the Cursorless test suite.

Alternatively, we could fold parse tree into Cursorless for the purposes of CI testing, but then publish both extensions from the CI deploy.

pokey commented 1 year ago

@wenkokke do we need to pin to SHAs within package.json? It would be simpler to use main / master in package.json, and then rely on yarn.lock / package-lock.json to capture the exact SHAs. That would make bumping much easier; e.g. we could even let Renovate bot's lockfile maintenance handle it once we migrate this repo into the Cursorless monorepo so that we get CI.

wenkokke commented 1 year ago

If there are no tags, then it’s probably for the best to have the option for consistency when updating lockfiles?

pokey commented 1 year ago

@wenkokke sorry I'm not sure I understand. Could you possibly elaborate?

cursorless-dev / vscode-parse-tree

Switch to exact versions for tree-sitter parsers #22