Open fangism opened 2 years ago
I thought about moving incrementally toward "syntax-tree-assisted" formatting, i.e. formatting directly from the token list like you said, but still querying a token's syntax-tree context when needed or suitable. These context queries would make converting the current code easier.
Later we can make the syntax tree context optional (again, incrementally).
Rough example of an incremental change I thought about: `SetIndentationsAndCreatePartitions` could call a (hypothetical) `CreatePartitionsForTokensRange(tokens_range_of_a_subtree)` for the `kSystemTFCall` case. `CreatePartitionsForTokensRange()` would go over the identifier, `(`, comments, etc. For argument tokens it would fall back to tree-based partitioning (`TreeUnwrapper::Visit(const SyntaxTreeNode& node)`), then continue with `,`/comments/`)`, and so on.
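To make the idea above more concrete, here is a minimal, self-contained sketch of what such a token-range pass might look like. Everything in it is an assumption for illustration: `Token`, `TokenKind`, `Partition`, and `TreeFallback` are simplified stand-ins rather than Verible's real types, and the fallback callback merely stands in for the existing tree-based partitioning of an argument subtree. Comments are only split out when they sit between arguments; a comment inside an argument stays with that argument's range.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins (not Verible's actual types).
enum class TokenKind { kIdentifier, kOpenParen, kCloseParen, kComma, kComment, kOther };

struct Token {
  TokenKind kind;
  std::string text;
};

// A partition covers the half-open token index range [begin, end).
struct Partition {
  std::size_t begin;
  std::size_t end;
  int indent;
};

// Stand-in for tree-based partitioning of a single argument's tokens.
using TreeFallback = std::function<void(std::size_t begin, std::size_t end,
                                        int indent, std::vector<Partition>* out)>;

std::vector<Partition> CreatePartitionsForTokensRange(
    const std::vector<Token>& tokens, std::size_t begin, std::size_t end,
    int indent, const TreeFallback& partition_argument) {
  assert(begin <= end && end <= tokens.size());
  std::vector<Partition> out;
  if (begin == end) return out;

  // Header partition: everything up to and including the first '('.
  std::size_t i = begin;
  while (i < end && tokens[i].kind != TokenKind::kOpenParen) ++i;
  if (i == end) {  // No call syntax found: keep the range as one partition.
    out.push_back({begin, end, indent});
    return out;
  }
  ++i;
  out.push_back({begin, i, indent});

  const int arg_indent = indent + 1;
  std::size_t arg_begin = i;
  int depth = 1;  // Parenthesis nesting depth; 1 means directly inside the call.
  auto flush_argument = [&](std::size_t arg_end) {
    // Argument tokens are delegated to the (tree-based) fallback.
    if (arg_begin < arg_end) partition_argument(arg_begin, arg_end, arg_indent, &out);
  };

  for (; i < end && depth > 0; ++i) {
    switch (tokens[i].kind) {
      case TokenKind::kOpenParen:
        ++depth;
        break;
      case TokenKind::kCloseParen:
        if (--depth == 0) {  // Closing paren of the call itself.
          flush_argument(i);
          out.push_back({i, i + 1, indent});
          arg_begin = i + 1;
        }
        break;
      case TokenKind::kComma:
        if (depth == 1) {  // Top-level argument separator.
          flush_argument(i);
          out.push_back({i, i + 1, arg_indent});
          arg_begin = i + 1;
        }
        break;
      case TokenKind::kComment:
        if (depth == 1 && arg_begin == i) {  // Comment between arguments.
          out.push_back({i, i + 1, arg_indent});
          arg_begin = i + 1;
        }
        break;
      default:
        break;
    }
  }
  // Tokens after the call (or leftovers of an unbalanced range).
  if (arg_begin < end) out.push_back({arg_begin, end, indent});
  return out;
}
```

In this sketch the token pass owns the punctuation and comments, while each argument range is handed to the callback, so the tree-dependent part is isolated behind one interface and could later be swapped out incrementally.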
Today, the formatter is limited to operating on syntax that the parser supports, which is only a subset relative to all possible uses (and abuses) of preprocessing directives and macros. Basing a formatter on a syntax tree was the initial approach taken to get to a minimal working product quickly with limited development resources. However, our use of the `TokenPartitionTree` was a conscious design choice to allow for the possibility of using alternative parsing methods. Prior art: clang-format does not parse source code in the traditional sense; it uses the result of the lexer, i.e. tokens. The job of an alternative parser is to build a `TokenPartitionTree`, thereby allowing complete re-use of all the algorithmic formatting code, expressing grouping and indentation rules.
Benefits:
Costs: