Chevrotain / chevrotain

Parser Building Toolkit for JavaScript
https://chevrotain.io
Apache License 2.0

Remove support for Tokens defined as classes. #653

Closed: bd82 closed this issue 6 years ago

bd82 commented 6 years ago

See #660

bd82 commented 6 years ago

This can probably be accomplished as a soft deprecation: just remove any related examples, as the code base can still handle TokenTypes defined as classes. Perhaps also add a runtime check (that can be disabled) so end users will be aware the functionality will eventually break, as it is no longer an official API?
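
A minimal sketch of what such an opt-out runtime check could look like (the warnOnClassTokens flag and warnIfClassToken helper are hypothetical, not actual Chevrotain APIs):

// Hypothetical flag end users could set to false to silence the warning.
const warnOnClassTokens = true

function warnIfClassToken(tokenType) {
   // A TokenType defined as a class is a plain function with static
   // properties, while createToken() produces a plain object.
   if (warnOnClassTokens && typeof tokenType === "function") {
      console.warn(
         "Defining TokenTypes as classes is deprecated and will eventually " +
         "stop working; please use the createToken() API instead."
      )
   }
}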

ghost commented 6 years ago

It should be noted that this change comes with some added noise/redundancy.

Instead of inferring the name property from the class, you must now specify a separate name property to achieve the same, which is confusing and much longer:

const Identifier = createToken({
      ^^^^^^^^^^
        name: "Identifier",
               ^^^^^^^^^^              
        pattern: /[a-zA-Z_]\w+/
});

as opposed to:

class Identifier {
        static pattern = /[a-zA-Z_]\w+/;
}

which is much cleaner.

Can't say I'm very fond of this change. 😞

bd82 commented 6 years ago

It should be noted that this change comes with some added noise/redundancy.

That is not exactly correct; it is just not the responsibility of the Chevrotain library. See the code snippet below for how to avoid the duplication.


const orgCreateToken = require("chevrotain").createToken

// Shared vocabulary object; the wrapper below registers every
// TokenType it creates here, keyed by the token's name.
const tokenVocabulary = {}

function createToken(config) {
   const newToken = orgCreateToken(config)
   tokenVocabulary[config.name] = newToken
   return newToken
}

createToken({ name: "A", pattern: /a/ })
createToken({ name: "B", pattern: /b/ })

// to make things concise...
const t = tokenVocabulary

// in parsing rule
$.CONSUME(t.A)

bd82 commented 6 years ago

Also, if you are fond of the class syntax you can keep using it to define the TokenTypes even after they are hard-deprecated.

All you need to do is process your classes (functions with static properties) and invoke the createToken API with the appropriate arguments.

Something like

class MyTokenClass {
   static PATTERN = /abc/
}

const MyToken = createToken({
   // Function.name property won't work on IE11
   name: MyTokenClass.name, 
   pattern: MyTokenClass.PATTERN
})

This could even be done automatically, perhaps by programmatically processing the module's exports object?
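
A rough sketch of that idea, assuming the classes live in a separate tokenClasses module (the module path and the resulting vocabulary object are illustrative, not part of Chevrotain):

const { createToken } = require("chevrotain")
// Hypothetical module exporting classes shaped like MyTokenClass above.
const tokenClasses = require("./tokenClasses")

const tokenVocabulary = {}
for (const [exportName, klass] of Object.entries(tokenClasses)) {
   // Use the export key rather than Function.name, which is
   // unavailable on IE11 as noted above.
   tokenVocabulary[exportName] = createToken({
      name: exportName,
      pattern: klass.PATTERN
   })
}

module.exports = tokenVocabulary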

ghost commented 6 years ago

@bd82 Thanks for the examples on how to avoid the redundancy, that was really helpful. 👍

And yes, I guess users could transform classes themselves to call the createToken API, though the problem at hand isn't avoided.

This change invites users to build APIs around createToken to reduce the implied redundancy. Personally, I think the usability/redundancy with proper class syntax was in a pretty good state and didn't require much boilerplate or transformers.

If I were to avoid repeated calls to createToken, I would store the definition objects in an array and call createToken on each of them, which would create tokens that I could store in a vocabulary (see the sketch below).
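
For example, something along these lines (a sketch only; the definitions array and vocabulary names are mine, not part of the Chevrotain API):

const { createToken } = require("chevrotain")

// Plain data describing the tokens, defined once in an array.
const tokenDefinitions = [
   { name: "A", pattern: /a/ },
   { name: "B", pattern: /b/ }
]

// Build the vocabulary by calling createToken on each definition.
const vocabulary = {}
for (const def of tokenDefinitions) {
   vocabulary[def.name] = createToken(def)
}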

Is a "global vocabulary" that chevrotain automatically creates a bad idea?

bd82 commented 6 years ago

Is a "global vocabulary" that chevrotain automatically creates a bad idea?

Yes, because there could be multiple vocabularies: for example, two parsers for different languages implemented using Chevrotain and running in the same process, or multiple versions of the same parser with slightly different sets of tokens (Java 5 vs Java 8).
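
For illustration (a contrived sketch, not part of Chevrotain; the token names are made up), two independent vocabularies can coexist in one process only if each parser keeps its own:

const { createToken } = require("chevrotain")

// Vocabulary for a hypothetical SQL-like grammar.
const sqlTokens = {
   Select: createToken({ name: "Select", pattern: /SELECT/ })
}

// Vocabulary for a hypothetical JSON grammar in the same process.
const jsonTokens = {
   True: createToken({ name: "True", pattern: /true/ })
}

// Each parser is constructed with its own vocabulary; a single global
// vocabulary would force both sets of tokens into one shared namespace.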

I think the usability/redundancy with proper class syntax was in a pretty good state and didn't require much boilerplate or transformers.

It was also a trap causing wrong assumptions, e.g.: are tokens instances of a TokenType? In fact it was a relic from the time Chevrotain really did support tokens using prototype inheritance, and it required certain hacks during lexer/parser initialization to keep working...

As Chevrotain becomes more and more generic and gains more capabilities (e.g. custom APIs), the existing supported APIs must be kept as simple as possible, even at the cost of breaking changes; the alternative is reduced development momentum.

I think the example grammars may benefit from introducing the pattern described above, so end users will be exposed to it, yet it would not be part of the library itself...