antlr / antlr4

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
http://antlr.org
BSD 3-Clause "New" or "Revised" License
17.27k stars 3.29k forks source link

[TypeScript] Generated files have inconsistent "import" statements, grammars-v4 testing of TypeScript now fails #4491

Closed kaby76 closed 11 months ago

kaby76 commented 11 months ago

This is a possible problem with the TypeScript target. It is best demonstrated by the plsql grammar. This grammar is different than most because it contains a "superClass" option for lexer and parser.

If you generate the target TypeScript code for the grammar (antlr4 -v 4.13.1 -encoding utf-8 -Dlanguage=TypeScript PlSqlLexer.g4; antlr4 -v 4.13.1 -encoding utf-8 -Dlanguage=TypeScript PlSqlParser.g4), you get code with two types of "import" statements, many with curly braces around an imported class:

PlSqlParserListener.ts:import { Sql_scriptContext } from "./PlSqlParser";
PlSqlParserListener.ts:import { Unit_statementContext } from "./PlSqlParser";
PlSqlParserListener.ts:import { Alter_diskgroupContext } from "./PlSqlParser";
PlSqlParserListener.ts:import { Add_disk_clauseContext } from "./PlSqlParser";
PlSqlParserListener.ts:import { Drop_disk_clauseContext } from "./PlSqlParser";
PlSqlParserListener.ts:import { Resize_disk_clauseContext } from "./PlSqlParser";
PlSqlParserListener.ts:import { Replace_disk_clauseContext } from "./PlSqlParser";
PlSqlParserListener.ts:import { Wait_nowaitContext } from "./PlSqlParser";
PlSqlParserListener.ts:import { Rename_disk_clauseContext } from "./PlSqlParser";
PlSqlParserListener.ts:import { Disk_online_clauseContext } from "./PlSqlParser";

and others without curly braces:

PlSqlLexer.ts:import PlSqlLexerBase from './PlSqlLexerBase';
PlSqlParser.ts:import PlSqlParserListener from "./PlSqlParserListener.js";
PlSqlParser.ts:import PlSqlParserBase from './PlSqlParserBase';

My tsconfig.json file is this:

{
  "compilerOptions": {
    "module": "ES2020",
    "moduleResolution": "node",
    "target": "ES6",
    "noImplicitAny": true,
  },
  "ts-node": {
    "esm": true,
    "experimentalSpecifierResolution": "node"
  }
}

This code compiles fine with tsc. However, I cannot execute using node or the ts-node wrapper package because I get import errors:

$ bash run.sh ../hw-examples/alter_operator.sql
Error [ERR_MODULE_NOT_FOUND]: Cannot find module 'C:\msys64\home\Kenne\issues\g4-3890\sql\plsql\Generated-TypeScript-hw\PlSqlLexerBase' imported from C:\msys64\home\Kenne\issues\g4-3890\sql\plsql\Generated-TypeScript-hw\PlSqlLexer.js
Did you mean to import ../PlSqlLexerBase.js?
    at finalizeResolution (node:internal/modules/esm/resolve:255:11)
    at moduleResolve (node:internal/modules/esm/resolve:908:10)
    at defaultResolve (node:internal/modules/esm/resolve:1121:11)
    at nextResolve (node:internal/modules/esm/hooks:865:28)
    at resolve (C:\Users\kenne\AppData\Roaming\npm\node_modules\ts-node\dist\child\child-loader.js:15:125)
    at nextResolve (node:internal/modules/esm/hooks:865:28)
    at Hooks.resolve (node:internal/modules/esm/hooks:303:30)
    at handleMessage (node:internal/modules/esm/worker:196:24)
    at Immediate.checkForMessages [as _onImmediate] (node:internal/modules/esm/worker:138:28)
    at process.processImmediate (node:internal/timers:478:21) {
  code: 'ERR_MODULE_NOT_FOUND',
  url: 'file:///C:/msys64/home/Kenne/issues/g4-3890/sql/plsql/Generated-TypeScript-hw/PlSqlLexerBase'
}

What's disturbing is that I do know 6 days ago Microsoft made a release of the typescript package, and since then, nothing works. I've been trying to rollback the environment but haven't found a fix.

I noticed that the Antlr Tool generates an inconsistent import syntax for the superClass vs the Listener, import PlSqlParserListener from "./PlSqlParserListener.js";, which contains a ".js".

As a hunch, I changed the PlSqlParser.ts and PlSqlLexer.ts files to use an explicit ".js" for the import for the base class, and now it works.

Analysis

According to the most recent node.js documentation, https://nodejs.org/api/esm.html#import-specifiers, a relative specifier must have a suffix. The file extension is always necessary for these. The Antlr tool does not comply with this as the base class specifier is relative but does not have a suffix.

Questions

  1. Why does the Antlr tool generate PlSqlParser.ts with import PlSqlParserListener from "./PlSqlParserListener.js";, which contains ".js", and import PlSqlParserBase from './PlSqlParserBase';, which does not contain the ".js"?

  2. Is there a workaround, besides running a "sed" script to fix the generated files?

ericvergnaud commented 11 months ago

In a A imports B scenario, you have 3 theoretical options:

  1. import { b } from "./B.ts"
  2. import { b } from "./B.js"
  3. import { b } from "./B"

Option 1 would make sense but doesn't work because unfortunately tsc does not convert .ts extensions to .js. This is well-known and painful problem with Typescript.

Option 2 works, but requires B to be compiled beforehand, otherwise some IDEs will complain that B cannot be found, and will not import the definitions (required for edit-time type checking)

Option 3 works in IDEs, and in node if packaged (using webpack or similar), but will fail in edge cases.

My preferred approach is 3 (as you can see in the runtime) and works perfectly with .d.ts files (they never make it to node anyway). All the .js files follow the node spec you mentioned: https://nodejs.org/api/esm.html#import-specifiers. However, the generated lexers and parsers are plain Typescript files, and there is no corresponding spec for Typescript. Since they are not packaged (yet!), they have to follow option 2. This was necessary for the antlr test suite to pass (we don't webpack each unit test).

In another plain Typescript project, I also use option 3 with ts-node, successfully running mocha unit tests as follows:

mocha --loader=ts-node/esm require ts-node/register my_test_file.ts

From the above though, it seems we do have an inconsistency in generated imports i.e. they should follow option 2. That could be worth fixing.

ericvergnaud commented 11 months ago

The reason for having 2 types of imports is because there is only 1 object to export from the generated Listener and Visitor, so that's the default export. It is assumed that the abstract parser is also a default import. But the generated Lexer and Parser must export more than 1 object, hence the need for { .. } style imports.

kaby76 commented 11 months ago

Thanks. That makes sense. Option 3 seems right. Just need to try packaging it up. But of course no idea how to yet. :)

ericvergnaud commented 11 months ago

See for example:

ericvergnaud commented 11 months ago

@kaby76 Fixed, but it will require a tool release (or a local build from dev branch)