Enter-tainer / typstyle

Beautiful and reliable typst code formatter
https://enter-tainer.github.io/typstyle
Apache License 2.0

Research on Code Formatters Across Programming Languages #103

Open Enter-tainer opened 3 months ago

Enter-tainer commented 3 months ago

Research and study code formatting tools for various mainstream programming languages. Goals include, but are not limited to:

  1. Understand the working principles of each tool
  2. Compare features, pros, and cons of different formatters
  3. Learn from their designs and implementations
  4. Examine how they handle special cases and edge conditions

After completing the research, summarize the findings and consider how to apply them to typstyle.

Enter-tainer commented 3 months ago

Ruff

Ruff has a ruff_formatter crate, which is forked from rome_formatter, so Ruff looks heavily inspired by Rome. It has a formatting IR that is similar to the one in pretty.rs.

At first glance, Ruff's design is similar to typstyle's: parse the source code into an AST, transform the AST into formatter IR, then print the IR into a string. Notably, it has special handling for comments, such as finding the source range of every comment. I haven't figured out how that is used yet.
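The parse, AST to IR, IR to string pipeline can be sketched in a few lines. This is a minimal illustration, not Ruff's or typstyle's code: Python's standard `ast` module stands in for the parser, only names and calls are handled, and the IR types (`Text`, `Line`, `Indent`, `Group`) and the functions `to_ir`, `print_doc`, and `format_source` are hypothetical names made up for this example.

```python
import ast
from dataclasses import dataclass
from typing import List, Union


# --- A tiny, hypothetical formatter IR ---
@dataclass
class Text:
    value: str


@dataclass
class Line:
    flat: str = " "  # what to print when the enclosing group fits on one line


@dataclass
class Indent:
    body: List["Doc"]


@dataclass
class Group:
    body: List["Doc"]


Doc = Union[Text, Line, Indent, Group]


def to_ir(node: ast.AST) -> Doc:
    """Stage 2: transform an AST node into IR (only names and calls here)."""
    if isinstance(node, ast.Name):
        return Text(node.id)
    if isinstance(node, ast.Call):
        parts: List[Doc] = []
        for i, arg in enumerate(node.args):
            if i > 0:
                parts += [Text(","), Line(" ")]
            parts.append(to_ir(arg))
        return Group([to_ir(node.func), Text("("),
                      Indent([Line(""), *parts]), Line(""), Text(")")])
    raise NotImplementedError(type(node).__name__)


def flat_text(docs: List[Doc]) -> str:
    """Render IR as if nothing were broken, to measure its width."""
    out = []
    for d in docs:
        if isinstance(d, Text):
            out.append(d.value)
        elif isinstance(d, Line):
            out.append(d.flat)
        else:
            out.append(flat_text(d.body))
    return "".join(out)


def print_doc(doc: Doc, width: int) -> str:
    """Stage 3: print IR. A group is kept flat if it fits, else its lines break.

    Simplification: the fit check ignores whatever follows the group.
    """
    out: List[str] = []
    column = 0

    def emit(d: Doc, indent: int, broken: bool) -> None:
        nonlocal column
        if isinstance(d, Text):
            out.append(d.value)
            column += len(d.value)
        elif isinstance(d, Line):
            if broken:
                out.append("\n" + " " * indent)
                column = indent
            else:
                out.append(d.flat)
                column += len(d.flat)
        elif isinstance(d, Indent):
            for child in d.body:
                emit(child, indent + 4, broken)
        else:  # Group: decide once for all of its direct lines
            fits = column + len(flat_text(d.body)) <= width
            for child in d.body:
                emit(child, indent, not fits)

    emit(doc, 0, False)
    return "".join(out)


def format_source(source: str, width: int) -> str:
    tree = ast.parse(source, mode="eval").body  # Stage 1: parse
    return print_doc(to_ir(tree), width)
```

With these definitions, `format_source("f( a,b )", 80)` normalizes the spacing on one line, while a narrow width makes the outer call break and leaves inner calls that still fit untouched.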

Ruff categorizes comments into three types: leading, dangling, and trailing. Python doesn't have inline comments, so Ruff's life is easier than typstyle's.
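To make the three categories concrete, here is a rough sketch of how a comment could be assigned to a node from source ranges alone. The placement rules below are an assumption made for illustration (own-line comments prefer the following node, end-of-line comments prefer the preceding node, and a comment with no neighbor dangles on the enclosing node); Ruff's real placement logic has many more special cases. `Node` and `classify` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    name: str
    start: int  # byte offset where the node starts
    end: int    # byte offset just past the node
    children: List["Node"] = field(default_factory=list)


def classify(source: str, comment_start: int, enclosing: Node) -> Tuple[str, str]:
    """Return (placement, node name) for a comment inside `enclosing`."""
    # A comment is "own line" if only whitespace precedes it on its line.
    line_start = source.rfind("\n", 0, comment_start) + 1
    own_line = source[line_start:comment_start].strip() == ""

    preceding = following = None
    for child in enclosing.children:  # assumed sorted by position
        if child.end <= comment_start:
            preceding = child
        elif child.start > comment_start and following is None:
            following = child

    if own_line:
        candidates = [("leading", following), ("trailing", preceding)]
    else:
        candidates = [("trailing", preceding), ("leading", following)]
    for placement, node in candidates:
        if node is not None:
            return placement, node.name
    # No sibling to attach to, e.g. a comment inside empty brackets.
    return "dangling", enclosing.name


SOURCE = "[\n    # first\n    a,  # after a\n    b,\n]"
a = Node("a", SOURCE.index("a,"), SOURCE.index("a,") + 1)
b = Node("b", SOURCE.index("b,"), SOURCE.index("b,") + 1)
outer = Node("list", 0, len(SOURCE), [a, b])

EMPTY = "[  # nothing here\n]"
empty_list = Node("list", 0, len(EMPTY))
```

Here `# first` becomes a leading comment of `a`, `# after a` becomes a trailing comment of `a`, and `# nothing here` dangles on the empty list because there is no child to attach it to.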

It also has a contribution.md describing challenges in formatting comments.

It looks like Ruff has a format struct for each AST node struct, and ast_node.format() transforms an AST node into the corresponding format struct. All these format structs implement a format trait, which defines how to produce formatter IR for each struct.
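The "one format struct per AST node, all implementing one trait" pattern might look roughly like this when transliterated to Python. Everything here is hypothetical: the `Format` protocol plays the role of the trait, `format_node` plays the role of `ast_node.format()`, and the IR is reduced to plain tuples so the example stays short.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple, Union

IrElement = Tuple[str, ...]  # hypothetical IR: ("text", "a"), ("space",), ...


@dataclass
class Name:
    id: str


@dataclass
class BinOp:
    left: Union["BinOp", Name]
    op: str
    right: Union["BinOp", Name]


class Format(Protocol):
    """Stand-in for the format trait: write IR for one node into `out`."""

    def fmt(self, out: List[IrElement]) -> None: ...


@dataclass
class FormatName:
    node: Name

    def fmt(self, out: List[IrElement]) -> None:
        out.append(("text", self.node.id))


@dataclass
class FormatBinOp:
    node: BinOp

    def fmt(self, out: List[IrElement]) -> None:
        format_node(self.node.left).fmt(out)
        out.append(("space",))
        out.append(("text", self.node.op))
        out.append(("soft_line",))  # a place where the printer may break
        format_node(self.node.right).fmt(out)


# One formatting rule per AST node type.
RULES = {Name: FormatName, BinOp: FormatBinOp}


def format_node(node: Union[BinOp, Name]) -> Format:
    """Wrap an AST node in its format struct (like ast_node.format())."""
    return RULES[type(node)](node)


ir: List[IrElement] = []
format_node(BinOp(Name("a"), "+", Name("b"))).fmt(ir)
```

The benefit of this layout is that the formatting logic for each node type lives in one small place, and parents never need to know how their children are formatted.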

I feel like Ruff's formatter is better than pretty.rs. Maybe we should investigate further and consider switching to it or to the Rome formatter. But before we do that, we should make sure it is more powerful than the current one.

Enter-tainer commented 3 months ago

Prettier

I'm more familiar with Prettier. From what I've seen before, Prettier implements Wadler's pretty printer and supports some more convenient extensions.

How it handles comments remains unclear. It has a massive file (~1k LOC) for handling comments: https://github.com/prettier/prettier/blob/main/src/language-js/comments/handle-comments.js It looks like it manually handles all cases for each AST node type.

Interestingly, it can also format Markdown. I wonder if it does hard line wrapping and how it handles edge cases like https://github.com/Enter-tainer/typstyle/issues/75#issuecomment-2171275781 Judging from the code, it looks like it was written by Japanese or Chinese contributors. The idea is interesting: newlines and spaces are interchangeable (conditionally), and breaking is not allowed in certain cases. After reading Prettier's implementation, I feel like it is possible to do hard line wrapping. https://github.com/prettier/prettier/blob/main/src/language-markdown/print-whitespace.js
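A rough sketch of the "newlines and spaces are conditionally interchangeable" idea follows. It is not Prettier's algorithm, and the rules are simplifying assumptions: a newline between two wide East Asian characters is dropped instead of becoming a space, a break is allowed between any two such characters, and a break is forbidden before a small, incomplete set of closing punctuation marks. `join_lines`, `wrap`, and `NO_BREAK_BEFORE` are hypothetical names.

```python
import unicodedata
from typing import List

# Assumption: a tiny, incomplete set of characters that must not start a line.
NO_BREAK_BEFORE = set("，。、！？）」")


def is_cjk(ch: str) -> bool:
    # Assumption: "wide"/"fullwidth" characters need no space between them.
    return unicodedata.east_asian_width(ch) in ("W", "F")


def char_width(ch: str) -> int:
    return 2 if is_cjk(ch) else 1


def join_lines(lines: List[str]) -> str:
    """Undo hard wrapping: newline -> space, except between two CJK chars."""
    result = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if result and not (is_cjk(result[-1]) and is_cjk(line[0])):
            result += " "
        result += line
    return result


def split_units(text: str) -> List[List[str]]:
    """Split into unbreakable units as [separator before unit, unit text]."""
    units: List[List[str]] = []
    sep, prev = "", ""
    for ch in text:
        if ch.isspace():
            sep = " "
        elif units and sep == "" and (
            ch in NO_BREAK_BEFORE or not (is_cjk(ch) or is_cjk(prev))
        ):
            units[-1][1] += ch  # forbidden break, or the middle of a Latin word
        else:
            units.append([sep, ch])
            sep = ""
        prev = ch
    return units


def wrap(text: str, width: int) -> List[str]:
    """Greedy hard wrapping that only breaks between units."""
    lines: List[str] = []
    current, current_width = "", 0
    for sep, unit in split_units(text):
        unit_width = sum(char_width(c) for c in unit)
        if current and current_width + len(sep) + unit_width > width:
            lines.append(current)
            current, current_width = "", 0
        if not current:
            sep = ""  # a separator at a line start is replaced by the newline
        current += sep + unit
        current_width += len(sep) + unit_width
    if current:
        lines.append(current)
    return lines
```

The property worth checking for any such scheme is the round trip: `join_lines(wrap(text, width))` should give back the original paragraph, otherwise reformatting would change the rendered output.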

Enter-tainer commented 2 months ago

Rustfmt

Rustfmt doesn't use a formatting IR. Instead, it directly visits each AST node and produces strings. This gives it more flexibility. For example, rustfmt can set different line width limits for different constructs. This is impossible in Prettier or Ruff.
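For contrast, a direct AST-to-string formatter with per-construct width limits could be sketched as below. This only illustrates the approach and is not rustfmt's code: the node types, the `fmt` function, and the limits in `MAX_WIDTH` are all hypothetical, and the values are not rustfmt's defaults.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical per-construct limits; not rustfmt's real option names or defaults.
MAX_WIDTH = {"call": 30, "array": 12}


@dataclass
class Name:
    id: str


@dataclass
class Call:
    func: str
    args: List["Expr"]


@dataclass
class Array:
    items: List["Expr"]


Expr = Union[Name, Call, Array]


def fmt(node: Expr, indent: int = 0) -> str:
    """Visit a node and return its formatted string directly (no IR)."""
    if isinstance(node, Name):
        return node.id
    if isinstance(node, Call):
        open_, close, items, kind = node.func + "(", ")", node.args, "call"
    else:
        open_, close, items, kind = "[", "]", node.items, "array"

    # Try the single-line layout against this construct's own limit.
    one_line = open_ + ", ".join(fmt(item, indent) for item in items) + close
    if "\n" not in one_line and len(one_line) <= MAX_WIDTH[kind]:
        return one_line

    # Otherwise fall back to one item per line.
    inner = " " * (indent + 4)
    body = "".join(f"{inner}{fmt(item, indent + 4)},\n" for item in items)
    return f"{open_}\n{body}{' ' * indent}{close}"


names = [Name("alpha"), Name("beta"), Name("gamma")]
call_text = fmt(Call("f", names))
array_text = fmt(Array(names))
```

The same three items stay on one line inside the call but are split vertically inside the array, because each construct consults its own limit. Since every node returns a finished string, the parent can inspect it freely, which is where the extra flexibility comes from.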

I think we cannot learn a lot from it. Maybe it is too hard and boring to make a rustfmt-ish formatter.