antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License

[PL/SQL] basic example as described by the documentation reports a TypeScript type error #3647

Closed: doberkofler closed this issue 1 year ago.

doberkofler commented 1 year ago

When trying to use the PlSql grammar in a basic example as described by the documentation, TypeScript reports type errors for the grammar-specific PlSqlLexer and PlSqlParser.

This is how the grammar is generated:

# clean
rm -rf grammar
mkdir grammar

# download grammar
curl --output grammar/PlSqlLexer.g4 https://raw.githubusercontent.com/antlr/grammars-v4/master/sql/plsql/PlSqlLexer.g4
curl --output grammar/PlSqlParser.g4 https://raw.githubusercontent.com/antlr/grammars-v4/master/sql/plsql/PlSqlParser.g4

### convert grammar for TypeScript
antlr4 -Dlanguage=TypeScript grammar/PlSqlLexer.g4
antlr4 -Dlanguage=TypeScript grammar/PlSqlParser.g4

This is the example code:

import antlr4 from 'antlr4';
import PlSqlLexer from './grammar/PlSqlLexer';
import PlSqlParser from './grammar/PlSqlParser';

const input = 'select * from dual';
const chars = new antlr4.CharStream(input);
const lexer = new PlSqlLexer(chars);
const tokens = new antlr4.CommonTokenStream(lexer); /* type error 1 */
const parser = new PlSqlParser(tokens);
const tree = parser.sql_script();
console.log(tree.toStringTree(null, parser)); /* type error 2 */

type error 1:

Argument of type 'PlSqlLexer' is not assignable to parameter of type 'Lexer'. Type 'PlSqlLexer' is missing the following properties from type 'Lexer': _input, _interp, text, line, and 18 more.

type error 2:

Argument of type 'PlSqlParser' is not assignable to parameter of type 'Parser'. Type 'PlSqlParser' is missing the following properties from type 'Parser': _input, _ctx, _interp, _errHandler, and 25 more.

bkiers commented 1 year ago

When you look into the grammars, you will see that both the lexer- and parser grammars have defined a superClass in their options block. That means you should also be using the TS base lexer- and parser classes: https://github.com/antlr/grammars-v4/tree/master/sql/plsql/TypeScript
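
A self-contained sketch of what the superClass option implies. It uses a simplified stand-in `Lexer` class rather than the real antlr4 one; the stand-in, its `input` field, and `acceptsLexer` are illustrative assumptions. The generated lexer extends the base class named in the options block, so the rest of the code only sees it as a `Lexer` if that base class in turn extends the runtime's `Lexer`.

```typescript
// Simplified stand-in for the runtime's Lexer (not the real antlr4 class).
class Lexer {
  readonly input: string;
  constructor(input: string) {
    this.input = input;
  }
}

// Hand-written, target-specific base class named by `superClass` in the grammar's options block.
// Helper methods that the grammar's actions call would be defined here.
class PlSqlLexerBase extends Lexer {}

// Shape of the generated class: it extends the superClass instead of Lexer directly.
class PlSqlLexer extends PlSqlLexerBase {}

// Stand-in for code that expects the runtime Lexer, such as a token stream constructor.
function acceptsLexer(lexer: Lexer): string {
  return lexer.input;
}

// Accepted because the chain PlSqlLexer -> PlSqlLexerBase -> Lexer ends at the expected class.
const lexer = new PlSqlLexer('select * from dual');
console.log(acceptsLexer(lexer)); // prints: select * from dual
```

If the base class extended some other `Lexer`, the chain would end at the wrong class and the call would no longer type-check.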

doberkofler commented 1 year ago

@bkiers Thank you for your feedback. I agree that it should work, but unfortunately it does not seem to work when following the documentation. Looking at the source, I discovered that the imports are from the antlr4ts package and not from antlr4, as the antlr4 TypeScript documentation suggests. There is an antlr4ts package on npm, but it does not seem to be maintained and is not what the official documentation describes.
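
The reported type errors are consistent with TypeScript's structural typing. If the base class is written against one runtime package while the token stream expects the other package's `Lexer`, the generated class lacks the members the expected type declares. A minimal sketch with two unrelated stand-in classes (hypothetical; the member names are borrowed from the error message, not taken from either library's real `Lexer`):

```typescript
// Two unrelated stand-ins for "the Lexer class", one per runtime package.
// Member names are borrowed from the reported error message for illustration only.
class Antlr4Lexer {
  _input: string | null = null;
  _interp: object | null = null;
  text = '';
  line = 1;
}

class Antlr4tsLexer {
  inputStream: string | null = null;
}

// A base class written against the other runtime...
class PlSqlLexerBase extends Antlr4tsLexer {}
class PlSqlLexer extends PlSqlLexerBase {}

// ...handed to code typed against the antlr4 Lexer.
function makeTokenStream(lexer: Antlr4Lexer): Antlr4Lexer {
  return lexer;
}

// @ts-expect-error PlSqlLexer lacks _input, _interp, text and line, so it is not an Antlr4Lexer.
makeTokenStream(new PlSqlLexer());
```

Without the `@ts-expect-error` comment, the compiler reports "Type 'PlSqlLexer' is missing the following properties from type 'Antlr4Lexer'", the same kind of error as in this issue; at runtime nothing checks the type.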

doberkofler commented 1 year ago

When using the antlr4ts package I get different type errors:

import antlr4ts from 'antlr4ts';
import PlSqlLexer from './grammar/PlSqlLexer';
import PlSqlParser from './grammar/PlSqlParser';

const input = 'select * from dual';
const inputStream = antlr4ts.CharStreams.fromString(input);
const lexer = new PlSqlLexer(inputStream);
const tokenStream = new antlr4ts.CommonTokenStream(lexer);
const parser = new PlSqlParser(tokenStream);
const tree = parser.sql_script();
console.log(tree.toStringTree(null, parser));

Reports:

[screenshot of the type errors, not reproduced here]

kaby76 commented 1 year ago

All grammars in grammars-v4 have a "desc.xml". This file describes what ports work for a grammar, the entry point (if it cannot be found by analyzing the grammar), how to test the grammar, etc. This is analogous to the pom.xml, but for all targets, not just the Java target.

The desc.xml only lists CSharp and Java. https://github.com/antlr/grammars-v4/blob/eba5c4487067292a82cf6b753f33a6b546c7e1df/sql/plsql/desc.xml#L4
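
A minimal sketch of reading that list programmatically. It assumes, without confirmation from this thread, that desc.xml stores the list as semicolon-separated names in a `<targets>` element. The helper names are hypothetical, and a real tool should use an XML parser rather than a regular expression.

```typescript
// Hypothetical helper: extract the target list from desc.xml text.
// Assumes a single element of the form <targets>CSharp;Java</targets>.
function listedTargets(descXml: string): string[] {
  const match = /<targets>([^<]*)<\/targets>/.exec(descXml);
  if (match === null) {
    return [];
  }
  return match[1]
    .split(';')
    .map((t) => t.trim())
    .filter((t) => t.length > 0);
}

// True if `target` (e.g. "TypeScript") appears in the list.
function supportsTarget(descXml: string, target: string): boolean {
  return listedTargets(descXml).indexOf(target) >= 0;
}

const sample = '<desc><targets>CSharp;Java</targets></desc>';
console.log(supportsTarget(sample, 'TypeScript')); // prints: false
```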

The plsql/ grammar doesn't work for TypeScript. There may be code for a port, but that does not mean that the port actually "works". A script sets up the initial values for the desc.xml files; I ran it earlier in 2023.

A few weeks ago I ran "grep grammars-v4 -R -i -e antlr4ts" and noted that plsql/ still uses the old TunnelVision ANTLR fork (antlr4ts) for TS: https://github.com/antlr/grammars-v4/issues/3621.
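
The same check can be applied to a single source file, for example a downloaded base class, to see which runtime package it imports. A minimal sketch; the function name `runtimeImports` and its regular expression are assumptions for illustration. Only `from '...'` import forms are recognized, not `require()`.

```typescript
// Hypothetical helper: which ANTLR runtime packages does a TypeScript source import from?
// Only `... from 'pkg'` / `... from "pkg/sub/path"` forms are recognized.
function runtimeImports(source: string): Set<string> {
  const found = new Set<string>();
  // The specifier must be exactly antlr4 or antlr4ts, optionally followed by a subpath.
  const re = /from\s+['"](antlr4ts|antlr4)(?:\/[^'"]*)?['"]/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(source)) !== null) {
    found.add(m[1]);
  }
  return found;
}

const baseSource = "import { Lexer } from 'antlr4ts';\nimport { CharStream } from 'antlr4ts/CharStream';";
console.log(runtimeImports(baseSource).has('antlr4ts')); // prints: true
```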

I will do the port now.

kaby76 commented 1 year ago

The JS port is atrociously slow: it times out at 300 seconds on the "long-running/" examples. BTW, I added these timeouts so that the builds don't hang the system, chew up money and resources, and force the GitHub Action to be killed. That's part of the reason why a port isn't listed as "working" in the desc.xml.

The TS port is a wrapper around the JS port. Here's how the JS port worked on a few examples in "long-running/".

$ bash run.sh long-running/
aggregate01.sql        cast_multiset07.sql    datetime02.sql         order_by07.sql         query_factoring07.sql
08/07-07:10:58 ~/issues/g4-3647/sql/plsql/Generated-JavaScript
$ bash run.sh long-running/*.sql
JavaScript 0 long-running/aggregate01.sql success 107.64
JavaScript 1 long-running/cast_multiset07.sql success 123.18
JavaScript 2 long-running/datetime02.sql success 102.439
JavaScript 3 long-running/order_by07.sql success 133.974
JavaScript 4 long-running/query_factoring07.sql success 178.287
Total Time: 645.549
08/07-07:22:04 ~/issues/g4-3647/sql/plsql/Generated-JavaScript

I'll redo the tests so that these ports parse only small, "Hello World!"-sized examples. Later, I'll have to go through and fix the grammar so that it can parse examples bigger than the "Hello World!" equivalent for plsql.

kaby76 commented 1 year ago

Here are the planned "Hello World" examples for the JS port.

08/07-07:33:02 ~/issues/g4-3647/sql/plsql/Generated-JavaScript-hw
$ bash run.sh hw-examples/*.sql
JavaScript 0 hw-examples/alter_operator.sql success 0.302
JavaScript 1 hw-examples/alter_outline.sql success 0.049
JavaScript 2 hw-examples/drop_operator.sql success 0.018
JavaScript 3 hw-examples/keywordasidentifier02.sql success 14.553
JavaScript 4 hw-examples/lexer01.sql success 2.49
JavaScript 5 hw-examples/lexer02.sql success 0.751
JavaScript 6 hw-examples/lexer03.sql success 7.46
JavaScript 7 hw-examples/lexer04.sql success 13.322
JavaScript 8 hw-examples/lexer05.sql success 1.335
JavaScript 9 hw-examples/simple11.sql success 0.397
JavaScript 10 hw-examples/truncate_table.sql success 0.011
Total Time: 40.704
08/07-07:34:03 ~/issues/g4-3647/sql/plsql/Generated-JavaScript-hw

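
Each result line printed by the driver has the form `<target> <index> <file> <result> <seconds>`. Below is a small sketch that parses such lines to total the parse times and list the slowest inputs. The helper names are hypothetical and not part of the repository's scripts; the sketch assumes no `-prefix` option was used and that file names contain no spaces.

```typescript
// One result line printed by the test driver.
interface RunResult {
  target: string;
  index: number;
  file: string;
  result: string;
  seconds: number;
}

// Parse lines like "JavaScript 3 hw-examples/keywordasidentifier02.sql success 14.553".
// Prompts, "Total Time: ..." and any other lines that do not match are skipped.
function parseResults(text: string): RunResult[] {
  const results: RunResult[] = [];
  for (const line of text.split(/\r?\n/)) {
    const m = /^(\S+) (\d+) (\S+) (success|fail) (\d+(?:\.\d+)?)$/.exec(line.trim());
    if (m === null) {
      continue;
    }
    results.push({
      target: m[1],
      index: Number(m[2]),
      file: m[3],
      result: m[4],
      seconds: Number(m[5]),
    });
  }
  return results;
}

// Sum of the per-file times.
function totalSeconds(results: RunResult[]): number {
  return results.reduce((sum, r) => sum + r.seconds, 0);
}

// Files that took longer than `limitSeconds`, slowest first.
function slowest(results: RunResult[], limitSeconds: number): string[] {
  return results
    .filter((r) => r.seconds > limitSeconds)
    .sort((a, b) => b.seconds - a.seconds)
    .map((r) => r.file);
}
```

For the full "Hello World" run above, the per-file times sum to 40.688 s, slightly less than the reported Total Time of 40.704 s. The total presumably also covers work outside the individually timed parses.
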
doberkofler commented 1 year ago

Do I understand you correctly:

1) Officially, the PlSql grammar currently supports only C# and Java?
2) antlr4 itself natively supports TypeScript, but the PlSql grammar has not yet been ported?
3) The PlSql grammar should work with the antlr4ts library?

I have also already tried using the antlr4ts library but still get type errors. Would you have a working example of how to use the PlSql grammar with the antlr4ts library?

Is there any documentation explaining all of this that I missed?

kaby76 commented 1 year ago

All grammars that are split and have target-specific code, such as this grammar (sql/plsql/), have to be in "target agnostic format". That's described here. I'm hoping that the proposed "string template actions" PR is adopted, though.
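
As I understand the convention, the grammar's actions and semantic predicates are restricted to calls on `this`, such as a hypothetical `{this.isVersion12()}?` predicate. Each target then provides a base class that implements those methods in its own language. A rough sketch of the TypeScript side, not the actual PlSqlParserBase; the stand-in `Parser` and the names `MyParserBase`, `MyParser`, `isVersion12`, and `setVersion12` are all hypothetical.

```typescript
// Simplified stand-in for the runtime Parser (not the real antlr4 class).
class Parser {}

// Hypothetical target-specific base class. The grammar's actions and predicates
// only call methods like these, so each target can implement them in its own language.
class MyParserBase extends Parser {
  private version12 = true;

  isVersion12(): boolean {
    return this.version12;
  }

  setVersion12(value: boolean): void {
    this.version12 = value;
  }
}

// Stand-in for the generated parser: a predicate written in the grammar as
// {this.isVersion12()}? ends up, roughly, as a call to the inherited method.
class MyParser extends MyParserBase {
  predicateAllowsVersion12Syntax(): boolean {
    return this.isVersion12();
  }
}

const parser = new MyParser();
parser.setVersion12(false);
console.log(parser.predicateAllowsVersion12Syntax()); // prints: false
```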

doberkofler commented 1 year ago

Unfortunately, I just started looking at this project and do not understand the implementation details. I basically just wanted to take a first look at the PL/SQL parser to evaluate whether it could be used as the starting point for some major automated code refactoring in a large PL/SQL code base. TypeScript is one of the languages I currently use the most, and that's what I wanted to start with.

Could you please just confirm that I understood you correctly:

1) Officially, the PlSql grammar currently supports only C# and Java?
2) antlr4 itself natively supports TypeScript, but the PlSql grammar has not yet been ported?
3) The PlSql grammar should work with the antlr4ts library, but it no longer does either?

kaby76 commented 1 year ago
  1. Officially, the PlSql grammar currently supports only C# and Java?

It currently only supports CSharp (the "official" name of the C# port) and Java. But, I have a PR that fixes the JavaScript and TypeScript ports. https://github.com/antlr/grammars-v4/pull/3650

  2. antlr4 itself natively supports TypeScript, but the PlSql grammar has not yet been ported?

ANTLR version 4.12.0 supports TypeScript, but as things stand right now, the grammar does not work for TypeScript or JavaScript. The PR I'm finishing up fixes this. I have to bump up the "" in the desc.xml to tell the testing scripts, and people, that at least 4.12.0 is required. We test grammars-v4 with 4.13.0.
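
Checking an installed runtime against a minimum like 4.12.0 needs a numeric comparison of the dotted components, because a plain string comparison would rank "4.9.3" above "4.12.0". A minimal sketch, assuming plain `major.minor.patch` strings; the helper name `meetsMinimum` is hypothetical and not part of the grammars-v4 scripts.

```typescript
// Hypothetical helper: true if `version` is at least `minimum`.
// Compares dot-separated numeric components; missing components count as 0.
// Pre-release suffixes such as "-rc.1" are not handled.
function meetsMinimum(version: string, minimum: string): boolean {
  const a = version.split('.').map(Number);
  const b = minimum.split('.').map(Number);
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    const x = i < a.length ? a[i] : 0;
    const y = i < b.length ? b[i] : 0;
    if (x !== y) {
      return x > y;
    }
  }
  return true; // every component is equal
}

console.log(meetsMinimum('4.13.0', '4.12.0')); // prints: true
```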

  3. The PlSql grammar should work with the antlr4ts library, but it no longer does either?

It probably does still "work", but "work" is a relative term. The generated parser is really slow, and I doubt that it would be any faster than the "official" ANTLR TypeScript port. Also, we just don't support the TunnelVision port for TypeScript. We don't have time to support all the different targets out there. I can only work so much, and have time only for the "official" targets.

The plan is to start optimizing the grammar in a day or two, beginning with a few of the more egregious problems.

doberkofler commented 1 year ago

I understand. Thank you for all the good work.

doberkofler commented 1 year ago

Thank you for the commit, but unfortunately I'm still not able to get my basic TypeScript example to work. I have downloaded the new grammars and the TypeScript support files with the following script:

#!/bin/sh

# clean
rm -rf grammar
mkdir grammar

# download grammar
curl --output grammar/PlSqlLexer.g4 https://raw.githubusercontent.com/antlr/grammars-v4/master/sql/plsql/PlSqlLexer.g4
curl --output grammar/PlSqlParser.g4 https://raw.githubusercontent.com/antlr/grammars-v4/master/sql/plsql/PlSqlParser.g4

### convert grammar for TypeScript
antlr4 -Dlanguage=TypeScript grammar/PlSqlLexer.g4
antlr4 -Dlanguage=TypeScript grammar/PlSqlParser.g4

# download TypeScript support files
# (raw.githubusercontent.com serves the file itself; a github.com/.../blob/... URL returns the HTML page that displays it)
curl --output grammar/PlSqlLexerBase.ts https://raw.githubusercontent.com/antlr/grammars-v4/master/sql/plsql/TypeScript/PlSqlLexerBase.ts
curl --output grammar/PlSqlParserBase.ts https://raw.githubusercontent.com/antlr/grammars-v4/master/sql/plsql/TypeScript/PlSqlParserBase.ts

The TypeScript test I'm using looks like this:

import antlr4 from 'antlr4';
import PlSqlLexer from './grammar/PlSqlLexer';
import PlSqlParser from './grammar/PlSqlParser';

const input = 'select * from dual';
const chars = new antlr4.CharStream(input);
const lexer = new PlSqlLexer(chars);
const tokens = new antlr4.CommonTokenStream(lexer); // ERROR: Argument of type 'PlSqlLexer' is not assignable to parameter of type 'Lexer'.
const parser = new PlSqlParser(tokens);
const tree = parser.sql_script();
console.log(tree.toStringTree(null, parser)); // ERROR: Argument of type 'PlSqlParser' is not assignable to parameter of type 'Parser'.

Unfortunately TypeScript still complains about type errors when using the lexer and the parser object:

Argument of type 'PlSqlLexer' is not assignable to parameter of type 'Lexer'.
  Type 'PlSqlLexer' is missing the following properties from type 'Lexer': _input, _interp, text, line, and 18 more.
Argument of type 'PlSqlParser' is not assignable to parameter of type 'Parser'.
  Type 'PlSqlParser' is missing the following properties from type 'Parser': _input, _ctx, _interp, _errHandler, and 25 more.

kaby76 commented 1 year ago

Sorry. The instructions for the TypeScript target are insufficient because you have to have both a package.json file and a tsconfig.json file. Otherwise, nothing will compile.

Here is a generated driver for Linux for the grammar. Please have a look at what I do; it is what this repo generates for GitHub Actions testing: driver.tar.gz. You might also look at the GitHub Actions build environment to see which versions of Node and TypeScript are used: https://github.com/antlr/grammars-v4/blob/master/.github/workflows/main.yml#L115-L212

Here is what the build files look like:

ken@DESKTOP-DL44R7B:~/grammars-v4/sql/plsql/Generated-TypeScript-hw$ cat tsconfig.json
{
  "compilerOptions": {
    "module": "ES2020",
    "moduleResolution": "node",
    "target": "ES6",
    "noImplicitAny": true,
  },
  "ts-node": {
    "esm": true,
    "experimentalSpecifierResolution": "node"
  }
}
ken@DESKTOP-DL44R7B:~/grammars-v4/sql/plsql/Generated-TypeScript-hw$ cat package.json
{
  "name": "i",
  "version": "1.0.0",
  "description": "",
  "main": "Test.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "",
  "license": "ISC",
  "dependencies": {
    "antlr4": "4.13.0",
    "buffer": "^6.0.3",
    "fs-extra": "^11.1.1",
    "timer-node": "^5.0.6",
    "typescript-string-operations": "^1.5.0"
  },
  "type": "module",
  "devDependencies": {
    "@types/node": "^20.2.5"
  },
  "scripts": {
    "build": "tsc -p tsconfig.json"
  }
}
ken@DESKTOP-DL44R7B:~/grammars-v4/sql/plsql/Generated-TypeScript-hw$

For comparison, here is my driver program, Test.ts:

ken@DESKTOP-DL44R7B:~/grammars-v4/sql/plsql/Generated-TypeScript-hw$ cat Test.ts
// Generated from trgen 0.21.0

import { CharStream } from 'antlr4';
import { CharStreams } from 'antlr4';
import { CommonTokenStream } from 'antlr4';
import { ErrorListener } from 'antlr4';
import { InputStream } from 'antlr4';
import { Recognizer } from 'antlr4';
import { RecognitionException } from 'antlr4';
import { Token } from 'antlr4';
import { readFileSync } from 'fs';
import { writeFileSync } from 'fs';
import { openSync } from 'fs';
import { readSync } from 'fs';
import { writeSync } from 'fs';
import { closeSync } from 'fs';
import { readFile } from 'fs/promises'

import PlSqlLexer from './PlSqlLexer';
import PlSqlParser from './PlSqlParser';

import { StringBuilder, emptyString, joinString, formatString, isNullOrWhiteSpace } from 'typescript-string-operations';
import { Timer, Time, TimerOptions } from 'timer-node';

function getChar() {
    let buffer = Buffer.alloc(1);
    var xx = 0;
    try {
        xx = readSync(0, buffer, 0, 1, null);
    } catch (err) {
    }
    if (xx === 0) {
        return '';
    }
    return buffer.toString('utf8');
}

class MyErrorListener<T> extends ErrorListener<T> {
    _quiet: boolean;
    _tee: boolean;
    _output: any;
    had_error: boolean;

    constructor(quiet: boolean, tee: boolean, output: any) {
        super();
        this._quiet = quiet;
        this._tee = tee;
        this._output = output;
        this.had_error = false;
    }

    syntaxError(recognizer: Recognizer<T>, offendingSymbol: T, line: number, column: number, msg: string, e: RecognitionException | undefined): void {
        this.had_error = true;
        if (this._tee) {
            writeSync(this._output, `line ${line}:${column} ${msg}\n`);
        }
        if (!this._quiet) {
            console.error(`line ${line}:${column} ${msg}`);
        }
    }
}

var tee = false;
var show_profile = false;
var show_tree = false;
var show_tokens = false;
var show_trace = false;
var error_code = 0;
var quiet = false;
var encoding = 'utf8';
var string_instance = 0;
var prefix = '';
var inputs: string[] = [];
var is_fns: boolean[] = [];

function splitLines(t: string) { return t.split(/\r\n|\r|\n/); }

function main() {
    for (let i = 2; i <process.argv.length; ++i)
    {
        switch (process.argv[i]) {
            case '-tokens':
                show_tokens = true;
                break;
            case '-tree':
                show_tree = true;
                break;
            case '-prefix':
                prefix = process.argv[++i] + ' ';
                break;
            case '-input':
                inputs.push(process.argv[++i]);
                is_fns.push(false);
                break;
            case '-tee':
                tee = true;
                break;
            case '-encoding':
                encoding = process.argv[++i];
                break;
            case '-x':
                var sb = new StringBuilder();
                var ch;
                while ((ch = getChar()) != '') {
                    sb.Append(ch);
                }
                var input = sb.ToString();
                var sp = splitLines(input);
                for (var ii of sp) {
                    if (ii == '') continue;
                    inputs.push(ii);
                    is_fns.push(true);
                }
                break;
            case '-q':
                quiet = true;
                break;
            case '-trace':
                show_trace = true;
                break;
            default:
                inputs.push(process.argv[i]);
                is_fns.push(true);
                break;
        }
    }
    if (inputs.length == 0) {
        ParseStdin();
    }
    else {
        const timer = new Timer({ label: 'test-timer' });
        timer.start();
        for (var f = 0; f <inputs.length; ++f)
        {
            if (is_fns[f])
                ParseFilename(inputs[f], f);
            else
                ParseString(inputs[f], f);
        }
        timer.stop();
        var t = timer.time().m * 60 + timer.time().s + timer.time().ms / 1000;
        if (!quiet) console.error('Total Time: ' + t);
    }
    process.exitCode = error_code;
}

function ParseStdin() {
    var sb = new StringBuilder();
    var ch;
    while ((ch = getChar()) != '') {
        sb.Append(ch);
    }
    var input = sb.ToString();
    var str = CharStreams.fromString(input);
    DoParse(str, "stdin", 0);
}

function ParseString(input: string, row_number: number) {
    var str = CharStreams.fromString(input);
    DoParse(str, "string" + string_instance++, row_number);
}

function ParseFilename(input: string, row_number: number) {
    var str = CharStreams.fromPathSync(input, encoding);
    DoParse(str, input, row_number);
}

function DoParse(str: CharStream, input_name: string, row_number: number) {
    const lexer = new PlSqlLexer(str);
    const tokens = new CommonTokenStream(lexer);
    const parser = new PlSqlParser(tokens);
    lexer.removeErrorListeners();
    parser.removeErrorListeners();
    var output = tee ? openSync(input_name + ".errors", 'w') : 1;
    var listener_parser = new MyErrorListener(quiet, tee, output);
    var listener_lexer = new MyErrorListener(quiet, tee, output);
    parser.addErrorListener(listener_parser);
    lexer.addErrorListener(listener_lexer);
    if (show_tokens) {
        for (var i = 0; ; ++i) {
            var ro_token = lexer.nextToken();
            var token = ro_token;
            token.tokenIndex = i;
            console.error(token.toString());
            if (token.type === Token.EOF)
                break;
        }
//        lexer.reset();
    }
    if (show_trace) {
//       parser._interp.trace_atn_sim = true;
    }
    const timer = new Timer({ label: 'test-timer2' });
    timer.start();
    const tree = parser.sql_script();
    timer.stop();
    var result = "";
    if (listener_parser.had_error || listener_lexer.had_error) {
        result = 'fail';
        error_code = 1;
    }
    else {
        result = 'success';
    }
    var t = timer.time().m * 60 + timer.time().s + timer.time().ms / 1000;
    if (show_tree) {
        if (tee) {
            writeFileSync(input_name + ".tree", tree.toStringTree(parser.ruleNames, parser));
        } else {
            console.error(tree.toStringTree(parser.ruleNames, parser));
        }
    }
    if (!quiet) {
        console.error(prefix + 'TypeScript ' + row_number + ' ' + input_name + ' ' + result + ' ' + t);
    }
    if (tee) {
        closeSync(output);
    }
}

main()

doberkofler commented 1 year ago

I did use a tsconfig.json and package.json file in my test as it would not be possible to compile TypeScript without them. Unfortunately I made an error when updating PlSqlLexerBase.ts and PlSqlParserBase.ts and this was the problem. Thank you.