c4rlo / vimhelp

Google App Engine based project which regularly generates HTML versions of the Vim help files
https://vimhelp.org
MIT License

Fix concealed text tab alignment bug #23

Closed ychin closed 1 year ago

ychin commented 1 year ago

Vim docs use concealed text for a few kinds of markup: tag links (e.g. |:edit|), code blocks (e.g. `some code`), and tags (e.g. *some_tag*). The begin/end characters are hidden from view for convenience, but when Vim calculates tab widths, the concealed characters still count towards the tab stop.

Previously, vimhelp first called line.expandtabs() to convert the hard tabs to spaces, and only then removed the concealed characters. Because of that, the alignment is no longer correct unless you add the removed characters back in as whitespace. That is somewhat difficult to do, because concealed characters followed by a normal space do not need to be filled back in. You only want to do that when they are followed by a tab.
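To make the misalignment concrete, here is a minimal sketch of the old order of operations. The `TAG_LINK` pattern and the `expand_then_strip` helper are hypothetical simplifications for illustration, not vimhelp's actual code.

```python
import re

# Hypothetical, simplified pattern for a tag link such as |:edit|.
TAG_LINK = re.compile(r"\|([^|\s]+)\|")


def expand_then_strip(line: str) -> str:
    """Old order: expand tabs first, then drop the concealed bars."""
    return TAG_LINK.sub(r"\1", line.expandtabs())


plain = "abcde\tx"    # no concealed characters
linked = "|abc|\tx"   # two concealed bars before the tab

# In Vim, "x" lines up in both rows. Here it does not, because the
# spaces were computed while the bars were still part of the line.
print(repr(expand_then_strip(plain)))   # 'abcde   x'
print(repr(expand_then_strip(linked)))  # 'abc   x'
```

The second row ends up two columns short, which is exactly the number of concealed characters removed after the tabs had already been expanded.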

To fix this, do not call expandtabs() and instead keep the hard tabs. This is fine because HTML/CSS uses 8 chars for tabs by default. We detect the case where a concealed tag is followed by a hard tab ("\t") and then manually inject two spaces after it. This fixes all the alignment issues.
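A rough sketch of that idea, assuming a simplified tag-link pattern (the names `LINK_THEN_TAB` and `strip_link_keep_tab` are made up for this example). Here `expandtabs()` is used only as a stand-in for the browser's 8-column tab rendering:

```python
import re

# Hypothetical pattern: a |tag link| immediately followed by a hard tab.
LINK_THEN_TAB = re.compile(r"\|([^|\s]+)\|\t")


def strip_link_keep_tab(line: str) -> str:
    """Drop the two bars and pad with two spaces so the tab stop is unchanged."""
    return LINK_THEN_TAB.sub(lambda m: m.group(1) + "  \t", line)


raw = "|abcdef|\tx"
html_text = strip_link_keep_tab(raw)

# The text after the tab lands on the same column as in the raw help file.
assert html_text == "abcdef  \tx"
assert html_text.expandtabs().index("x") == raw.expandtabs().index("x")
```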

This is similar to the fix in Neovim's doc generation tool, which had a similar bug: https://github.com/neovim/neovim/pull/20690

Also, as a drive-by, add README documentation for how to generate a static site. Feel free to omit that if you would like to author it differently.

ychin commented 1 year ago

Here are some before/after screenshots:

index.txt:

[before and after screenshots]

vi_diff.txt:

[before and after screenshots]

terminal.txt:

[before and after screenshots]

quickref.txt:

[before and after screenshots]

ychin commented 1 year ago

OK, I found some edge cases in my original solution. If you have a tag (with concealed chars) followed by some regular text before it hits a tab, my original code would not work, because it required the tab to immediately follow the tag.

I fixed the code to be more general. It now counts the number of concealed characters seen on the line so far, and when it encounters a tab it adds that many spaces before the tab. I confirmed that the before/after in the above comment still holds, but now some other edge cases are fixed as well (you can see that in each case the concealed characters are followed by some regular text before the tab):
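The counting approach can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the `CONCEALED` pattern is a simplified stand-in for the real regexes, and resetting the count after each tab (done here by handling each tab-separated segment on its own) is this sketch's interpretation.

```python
import re

# Hypothetical, simplified markers: |link|, *tag*, and `code`.
CONCEALED = re.compile(r"\|([^|\s]+)\||\*([^*\s]+)\*|`([^`\t]+)`")


def _inner(m: re.Match) -> str:
    # Exactly one alternative matched; return its visible text.
    return m.group(1) or m.group(2) or m.group(3)


def strip_concealed(line: str) -> str:
    """Remove concealed markers, padding before each tab to keep alignment."""
    parts = line.split("\t")
    out = []
    for i, part in enumerate(parts):
        visible = CONCEALED.sub(_inner, part)
        if i < len(parts) - 1:
            # A tab follows: add back one space per concealed character.
            visible += " " * (len(part) - len(visible)) + "\t"
        out.append(visible)
    return "".join(out)


# Regular text between the tag and the tab no longer matters.
raw = "|abc| de\tx"
assert strip_concealed(raw) == "abc de  \tx"
assert strip_concealed(raw).expandtabs().index("x") == raw.expandtabs().index("x")
```

Because each segment is padded back to its original length, every tab starts at the same column as in the source file, whether or not the concealed text is directly adjacent to it.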

index.txt:

[before and after screenshots]

eval.txt:

[before and after screenshots]

c4rlo commented 1 year ago

Thank you for your contribution!

I have refactored the code a little bit and made a few tweaks, including to the regexes to ensure they don't rely on a prior tabs-to-spaces expansion that is no longer happening.

c4rlo commented 1 year ago

This is now live.