astrale-sharp / typstfmt

Apache License 2.0
253 stars 25 forks source link

[bug] Line length checking is not accurate #86

Open Andrew15-5 opened 1 year ago

Andrew15-5 commented 1 year ago

This is a sub-issue from #82 (see examples there). In https://github.com/astrale-sharp/typstfmt/commit/a9e6249019111783380be88390e86af3441076ab line length can exceed the configured line limit. Specifically, if the text (content block) is used, then it can be chopped into smaller lines which won't alter the output.

EDIT by @astrale-sharp The first line of a nested block also doesn't respect max_line_length (it's unaware of it's parent)

astrale-sharp commented 1 year ago

Ex. 1

#very-long-long-long-long-long-function-name[Lorem ipsum dolor sit amet, consectetur adipiscing
  elit, sed do eiusmod tempor incididunt ut labore
  et dolore magnam aliquam quaerat voluptatem. Ut
  enim aeque doleamus animo, cum corpore dolemus,
  fieri.]

Ex. 2

/ Term : Some interesting words that are veeeeeeeeeeeeeeeeeery long
astrale-sharp commented 1 year ago

Ex 1 can be solved with your smart proposition which i'm feeling more on board with now even tho it make me feel uneasy still. It also involves a hack: the parent tries to understand the child had to broke by scanning the child length and comparing it to max length minus 7 (it's used elsewhere too, see the todos in the code)

The alternative to the hack would be to have more context, maybe context could contain a hashmap of span -> info and info could hold a needed to break flag

astrale-sharp commented 1 year ago

Example 2 happens because \ Term : is not part of the markup, so the checking of the length only concerns it's sibling, it could be solved by checking if the parent is a term item and adding the length of the left siblings the the first line of size comparison

I'm going to take care of this in a few days but in the meanwhile if someone wants to do it, say so and I'll let you, I welcome PRs!

When i start working on this I'll say so

Andrew15-5 commented 1 year ago

If ex. 2 is indeed a Typst term, then the slash must be forward, not backward.

astrale-sharp commented 1 year ago

Edited, thank you :)

astrale-sharp commented 1 year ago

One reason I'm hesitant to take the provided solution is this: f(arg)[too long][short][too long again][ long but can be broken cause spaced ] What should be done then, everything in parenthesis except the last one? I mean probably cause it's prettier that way but I feel bad modifying that much code of the users

Andrew15-5 commented 1 year ago

The question is hard, because:

But regardless, as I said, the KISS principle is good as it can improve readability. There is already a nice example https://github.com/typst/typst/issues/311#issuecomment-1666575619:

show heading: it =>  {
    it
    par()[#text(size:0.5em)[#h(0.0em)]]
}

Specifically, par()[#text(size:0.5em)[#h(0.0em)]] can be greatly simplified to par(text(size:0.5em, h(0.0em))).

I already mentioned (somewhere) that I want text to be as packed as possible, like any Markdown file should be. Another example is (AGPL) license file: https://www.gnu.org/licenses/agpl-3.0.txt.

So, f(arg)[too long][short][too long again][ long but can be broken cause spaced ] IMO should be formatted as:

f(arg)[too long][short][too long again][ long but
  can be broken cause spaced ]

(limit is 50 columns)

...is what I would say, if I didn't think for a while. To eliminate multistep formatting that in nested cases can slow down the formatting time (maybe not), A more general rule should be applied. Like a rule priority. Something like this:

  1. if there is a whitespace(s) after [ + the line exceeds the limit + the [ does not exceed the limit, then put change (all) whitespace(s) with a (single) newline after the [
  2. if in the previous rule the newline was added + there is a whitespace(s) before the ] which is a pair to the [ from the previous rule, then change (all) whitespace(s) with a (single) newline before ]

Hmm... I don't know how it could be implemented, but there is still 2 steps in my rules. Ok, I'll finish the idea. After first 2 rules we will get:

f(arg)[too long][short][too long again][
  long but can be broken cause spaced
]

Of course somewhere between the rules we have to add 1 indent, because 1st line ends with [ and 3rd line starts with the paired ].

  1. And now we only need to chop the content (2nd line) if it exceeds the limit.

But it doesn't exceed, so we finished formatting.

Andrew15-5 commented 1 year ago

Here are similar examples (limit: 50, indent: 2):

#f(arg)[too long][short][too long again][ long but can be broken cause spaced]

#f(arg)[too long][short][too long again][ long abc long abc long long long long long long long long long abc long long long long long long long long long]

// According to my rules:

#f(arg)[too long][short][too long again][
  long but can be broken cause spaced]

#f(arg)[too long][short][too long again][
  long abc long abc long long long long long long
  long long long abc long long long long long long
  long long long]

// But this kinda looks better, especially if the content is very long:

#f(arg)[too long][short][too long again][ long but
  can be broken cause spaced]

#f(arg)[too long][short][too long again][ long abc
  long abc long long long long long long long long
  long abc long long long long long long long long
  long]

So maybe change rules to this:

  1. if there is whitespace(s) after [ and before ] + the [ does not exceed the limit, then put the content which is inside the [] on a separate line.
  2. if there is whitespace(s) only after [, then break line at the most right whitespace.

I think the rules are more of a flowchart written linear (in one line), so maybe it's easier to follow a flowchart (with nested if statements and things like that).

Andrew15-5 commented 1 year ago

Here are all 4 variants:

Whitespace after [ and before ] `[ ` & ` ]`: ```typ #f(arg)[too long][short][too long again][ long but can be broken cause spaced ] #f(arg)[too long][short][too long again][ long abc long abc long long long long long long long long long abc long long long long long long long long long ] ``` ```typ #f(arg)[too long][short][too long again][ long but can be broken cause spaced ] #f(arg)[too long][short][too long again][ long abc long abc long long long long long long long long long abc long long long long long long long long long ] ```
Whitespace only after [ `[ ` & `]`: ```typ #f(arg)[too long][short][too long again][ long but can be broken cause spaced] #f(arg)[too long][short][too long again][ long abc long abc long long long long long long long long long abc long long long long long long long long long] ``` ```typ #f(arg)[too long][short][too long again][ long but can be broken cause spaced] #f(arg)[too long][short][too long again][ long abc long abc long long long long long long long long long abc long long long long long long long long long] ```
Whitespace only before ] `[` & ` ]` ```typ #f(arg)[too long][short][too long again][long but can be broken cause spaced ] #f(arg)[too long][short][too long again][long abc long abc long long long long long long long long long abc long long long long long long long long long ] ``` ```typ #f(arg)[too long][short][too long again][long but can be broken cause spaced ] #f(arg)[too long][short][too long again][long abc long abc long long long long long long long long long abc long long long long long long long long long ] ```
No whitespaces `[` & `]`: ```typ #f(arg)[too long][short][too long again][long but can be broken cause spaced] #f(arg)[too long][short][too long again][long abc long abc long long long long long long long long long abc long long long long long long long long long] ``` ```typ #f(arg)[too long][short][too long again][long but can be broken cause spaced] #f(arg)[too long][short][too long again][long abc long abc long long long long long long long long long abc long long long long long long long long long] ```
Andrew15-5 commented 1 year ago

Here are all 4 variants if 2nd to last argument exceeds the limit:

Whitespace after [ and before ] `[ ` & ` ]`: ```typ #f(arg)[too long][short][ long but can be broken cause spaced ][too long again] #f(arg)[too long][short][ long abc long abc long long long long long long long long long abc long long long long long long long long long ][too long again] ``` ```typ #f(arg)[too long][short][ long but can be broken cause spaced ][too long again] #f(arg)[too long][short][ long abc long abc long long long long long long long long long abc long long long long long long long long long ][too long again] ```
Whitespace only after [ `[ ` & `]`: ```typ #f(arg)[too long][short][ long but can be broken cause spaced][too long again] #f(arg)[too long][short][ long abc long abc long long long long long long long long long abc long long long long long long long long long][too long again] ``` ```typ #f(arg)[too long][short][ long but can be broken cause spaced][too long again] #f(arg)[too long][short][ long abc long abc long long long long long long long long long abc long long long long long long long long long][too long again] ```
Whitespace only before ] `[` & ` ]` ```typ #f(arg)[too long][short][long but can be broken cause spaced ][too long again] #f(arg)[too long][short][long abc long abc long long long long long long long long long abc long long long long long long long long long ][too long again] ``` ```typ #f(arg)[too long][short][long but can be broken cause spaced ][too long again] #f(arg)[too long][short][long abc long abc long long long long long long long long long abc long long long long long long long long long ][too long again] ```
No whitespaces `[` & `]`: ```typ #f(arg)[too long][short][long but can be broken cause spaced][too long again] #f(arg)[too long][short][long abc long abc long long long long long long long long long abc long long long long long long long long long][too long again] ``` ```typ #f(arg)[too long][short][long but can be broken cause spaced][too long again] #f(arg)[too long][short][long abc long abc long long long long long long long long long abc long long long long long long long long long][too long again] ```

One thing that could be a problem is something like this:

#f(arg)[too long][short][long abc long abc long
  long long long long long long long long abc long
  long long long long long long long long][too
  long again]

Maybe an easier approach is to only format things until the ]:

#f(arg)[too long][short][long abc long abc long
  long long long long long long long long abc long
  long long long long long long long long][too long again]

This way we can reapply rules to the last still exceeding the limit line. I think all the rules will work very nicely the second time too.

But this again makes formatting in multiple steps. But since Rust's binaries are superfast, it may not be a problem. What do you think, @astrale-sharp?

astrale-sharp commented 1 year ago

So, I don't think perf would be an issue, what I'm unsure about is that we would then be reimplementing something very similar to how content block format themselves, but the parent would take care of it. It's an exception and isn't pretty but I guess we don't have a choice here

astrale-sharp commented 1 year ago

in lorem ipsum #loooooooooooooooong_var, long_var might go above the length and yet the line will be broken after it, not before