bblfsh / sdk

Babelfish driver SDK
GNU General Public License v3.0

Proposal: remove @token in favor of positional info #318

Open dennwc opened 6 years ago

dennwc commented 6 years ago

Currently, there is confusion about the @token field in the AST.

On the one hand, it is expected to store the raw string representation of the node, exactly as it is written in the source file.

On the other hand, some drivers assume that @token should store the keyword for composite nodes (for example, func in func Foo()).

Also, both assumptions are false in the Semantic UAST: nodes will store the unescaped value of the node in a distinct field (for example, the literal "\"\n" will be stored as a " character followed by a line break).
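To make the mismatch concrete, here is a minimal sketch of the raw and unescaped forms of that literal. It assumes Go-style string escaping and uses the standard library's strconv.Unquote as a stand-in for whatever a driver does; the helper name unescapeLiteral is hypothetical and not part of the SDK.

```go
package main

import (
	"fmt"
	"strconv"
)

// unescapeLiteral is a hypothetical helper: it turns a raw, Go-style
// string literal (as written in the source file) into the value it denotes.
// Assumption: the language's escaping rules match Go's.
func unescapeLiteral(raw string) (string, error) {
	return strconv.Unquote(raw)
}

func main() {
	// The raw text in the source file: quote, backslash, quote,
	// backslash, n, quote.
	raw := `"\"\n"`

	val, err := unescapeLiteral(raw)
	if err != nil {
		panic(err)
	}

	// The two forms differ in both content and length, so a single
	// @token field cannot hold both.
	fmt.Printf("raw:   %d bytes: %s\n", len(raw), raw)
	fmt.Printf("value: %d bytes: %q\n", len(val), val)
}
```

The raw form is what a "source text" reading of @token would hold, while the unescaped form is what the Semantic UAST field would hold.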

This issue cannot be solved in a language-agnostic way until we are able to store multiple trees.

Thus, I propose to drop this concept in favor of fixing the positional information for nodes. Users will still be able to get the raw node representation by taking a substring of the source file, and can still fetch a specific unescaped value by field name.
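As a rough illustration of the substring approach, the sketch below recovers a node's raw text from byte offsets. The Positions type and the rawText helper are hypothetical, not the SDK's actual types; the sketch assumes that positions are byte offsets into the source and that the end offset is exclusive.

```go
package main

import (
	"errors"
	"fmt"
)

// Positions is a hypothetical stand-in for a node's positional info.
// Assumption: Start and End are byte offsets, with End exclusive.
type Positions struct {
	Start int
	End   int
}

// rawText returns the slice of the source covered by pos, or an error
// if the offsets do not fit inside the source.
func rawText(src string, pos Positions) (string, error) {
	if pos.Start < 0 || pos.End < pos.Start || pos.End > len(src) {
		return "", errors.New("position out of range")
	}
	return src[pos.Start:pos.End], nil
}

func main() {
	src := "package p\n\nfunc Foo() {}\n"

	// Offsets for the whole function declaration, computed by hand
	// for this particular source string.
	decl, err := rawText(src, Positions{Start: 11, End: 24})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", decl)

	// The keyword is just a narrower range over the same source,
	// so no separate token field is needed to recover it.
	kw, err := rawText(src, Positions{Start: 11, End: 15})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", kw)
}
```

This only works if drivers report accurate offsets, which is why the proposal pairs dropping @token with fixing positional information.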

Related issues:

juanjux commented 5 years ago

I think this is a case where having something is better than having nothing.

IMO we should define what to do where drivers diverge and fix the ones that do.

dennwc commented 5 years ago

The point is that there is no correct answer to this problem for Semantic UAST.

For keywords, the token loses its meaning in the Semantic UAST; for values, it breaks the expectation of being a raw code string, since values will be unescaped.