chipsalliance / verible

Verible is a suite of SystemVerilog developer tools, including a parser, style-linter, formatter and language server
https://chipsalliance.github.io/verible/

Alternative: syntax-tree-free formatting #1210

Open fangism opened 2 years ago

fangism commented 2 years ago

Today, the formatter can only operate on syntax that the parser supports, which is a subset of all possible uses (and abuses) of preprocessing directives and macros. Basing the formatter on a syntax tree was the initial approach, taken to reach a minimal working product quickly with limited development resources. However, our use of the TokenPartitionTree was a conscious design choice that leaves room for alternative parsing methods.

Prior art: clang-format does not parse source code in the traditional sense; it works from the output of the lexer, i.e. tokens.

The job of an alternative parser is to build a TokenPartitionTree, thereby allowing complete reuse of all the algorithmic formatting code, expressing grouping and indentation rules.
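To make the idea concrete, here is a minimal sketch of building a partition tree from tokens alone, with no syntax tree involved. It is a toy model in Python, not Verible's C++ API: `Partition`, `build_partitions` and `render` are hypothetical names, tokens are plain strings, and the only grouping rules assumed are that `begin`/`end` open and close an indented group and `;` ends a line. Spacing between tokens is ignored.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Partition:
    """Hypothetical stand-in for one node of a token partition tree."""
    indent: int
    tokens: List[str] = field(default_factory=list)
    children: List["Partition"] = field(default_factory=list)


def build_partitions(tokens: List[str], indent_width: int = 2) -> Partition:
    """Group a flat token list into partitions using only the tokens."""
    root = Partition(indent=0)
    stack = [root]   # open groups, innermost last
    line = None      # partition for the line currently being collected

    def flush() -> None:
        nonlocal line
        if line is not None:
            stack[-1].children.append(line)
            line = None

    for tok in tokens:
        if tok == "end":
            flush()
            if len(stack) > 1:
                stack.pop()
            # 'end' is placed at the indentation of the enclosing group.
            stack[-1].children.append(Partition(stack[-1].indent, ["end"]))
            continue
        if line is None:
            line = Partition(stack[-1].indent)
        line.tokens.append(tok)
        if tok == ";":
            flush()
        elif tok == "begin":
            flush()
            group = Partition(stack[-1].indent + indent_width)
            stack[-1].children.append(group)
            stack.append(group)
    flush()
    return root


def render(p: Partition) -> List[str]:
    """Flatten the tree into indented lines (tokens joined by one space)."""
    lines = []
    if p.tokens:
        lines.append(" " * p.indent + " ".join(p.tokens))
    for child in p.children:
        lines.extend(render(child))
    return lines
```

In this sketch, a token sequence such as `always_comb begin a = b ; end` yields one partition per line, with the body nested one level deeper. Nothing in it needs the input to be a complete, parseable construct; the grouping comes entirely from the token stream.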

Benefits:

Costs:

mglb commented 2 years ago

I thought about incrementally moving in the direction of "syntax-tree-assisted" formatting, i.e. formatting directly from the token list like you said, but still querying a token's syntax-tree context when needed or suitable. The context queries would make converting the current code easier.

Later we can make the syntax tree context optional (again, incrementally).

Rough example of an incremental change I thought about: SetIndentationsAndCreatePartitions could call a (hypothetical) CreatePartitionsForTokensRange(tokens_range_of_a_subtree) for the kSystemTFCall case. CreatePartitionsForTokensRange() would go over the identifier, (, comments, etc. For argument tokens it would fall back to tree-based partitioning (TreeUnwrapper::Visit(const SyntaxTreeNode& node)), then continue with ,/comments/), and so on.
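A rough sketch of that hybrid walk, again as a toy Python model rather than Verible code. Everything here is an assumption made for illustration: tokens are plain strings, a partition is just a list of tokens, and `partition_argument` is a hypothetical callback standing in for the tree-based fallback. The token walker handles the call name, `(`, `,`, comments between arguments, and the closing `)`; each argument's tokens are handed to the callback.

```python
from typing import Callable, List

Tokens = List[str]


def is_comment(tok: str) -> bool:
    return tok.startswith("//") or tok.startswith("/*")


def create_partitions_for_tokens_range(
    tokens: Tokens,
    partition_argument: Callable[[Tokens], List[Tokens]],
) -> List[Tokens]:
    """Partition one call's tokens; arguments go to the fallback callback."""
    partitions: List[Tokens] = []

    # Header: everything up to and including the opening '('.
    # (Raises ValueError if the range contains no '('.)
    open_idx = tokens.index("(")
    partitions.append(tokens[: open_idx + 1])

    arg: Tokens = []   # tokens of the argument currently being collected
    depth = 0          # parenthesis nesting inside the current argument

    def flush_argument() -> None:
        if arg:
            # Fallback: let the (tree-based) partitioner handle the argument.
            partitions.extend(partition_argument(list(arg)))
            arg.clear()

    for tok in tokens[open_idx + 1:]:
        if depth == 0 and tok == ")":
            flush_argument()
            partitions.append([tok])
            break
        if depth == 0 and tok == ",":
            flush_argument()
            partitions[-1].append(tok)   # keep ',' with what precedes it
        elif depth == 0 and not arg and is_comment(tok):
            partitions.append([tok])     # comment between arguments
        else:
            if tok == "(":
                depth += 1
            elif tok == ")":
                depth -= 1
            arg.append(tok)
    return partitions
```

Passing `lambda arg: [arg]` as the callback keeps each argument as a single partition, which is enough to see where the token walker hands off to the fallback and where it resumes.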