TeXworks / texworks

Main codebase for TeXworks, a simple interface for working with TeX documents
https://tug.org/texworks/
GNU General Public License v2.0

U+2028 LINE SEPARATOR in editor #1028

Open jlaurens opened 1 year ago

jlaurens commented 1 year ago

Bug description:

Here is the source

Screenshot 2023-10-01 at 12 01 06

Here is the output

Screenshot 2023-10-01 at 12 01 16

At the end of the first line, there is a U+2028 line separator. It causes TeX line numbering and TeXworks line numbering to be different.

Originally, the U+2028 line separator was entered accidentally, but it may exist in the tex source for some good reason.

Steps to reproduce the problem:

Expected behavior:

General information:
TeXworks version: Version 0.6.8 ("github") [r.6b1c6ab, ]
TeXworks obtained from:
Operating system:

Additional information:

stloeffler commented 1 year ago

Thanks for reporting. Unfortunately, this can't be fixed (easily) on the TeXworks side, as the line numbering relies on Qt's internal text layout for identifying "text blocks" (=lines). The Qt routines treat U+000A, U+000D, and U+2029 as "block/paragraph separators", while U+2028 is not considered to start a new "block/paragraph". Short of bypassing (and therefore rewriting) Qt's entire text layout code, there is not much I can do. You can of course raise the issue with the Qt devs, though.
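To make the mismatch concrete, here is a minimal Python sketch that models the rule as described in this comment. It is not Qt code: the function names are hypothetical, and treating CR LF as a single break is an assumption of the sketch. It counts "blocks" when only U+000A, U+000D, and U+2029 split the text, and compares that with a count that also breaks at U+2028.

```python
import re

# Separators that, per the comment above, start a new block/paragraph.
# Assumption of this sketch: a CR LF pair counts as one break.
BLOCK_BREAK = "\r\n|[\n\r\u2029]"

# Same set, but additionally breaking at U+2028 LINE SEPARATOR.
LINE_BREAK = "\r\n|[\n\r\u2028\u2029]"


def count_blocks(text: str) -> int:
    """Count blocks when U+2028 does NOT start a new block."""
    return len(re.split(BLOCK_BREAK, text))


def count_lines(text: str) -> int:
    """Count lines when U+2028 is also treated as a line break."""
    return len(re.split(LINE_BREAK, text))


if __name__ == "__main__":
    sample = "first line\u2028still the first block\nsecond block"
    print(count_blocks(sample), count_lines(sample))
```

Any tool that counts U+2028 as a line break would report one more line than the block-based count for each U+2028 in the file, which matches the kind of offset reported in this issue.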

jlaurens commented 1 year ago

One solution is to change any U+2028 into U+000D. There is another, similar problem with some Unicode space characters that do not have the correct TeX category code: the editor displays a normal space, whereas TeX sees an "other" character.
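A rough sketch of what this suggestion could look like outside the editor, in Python. The function names are hypothetical and this is not TeXworks code. The first function performs the proposed U+2028 to U+000D replacement. The second only reports space-like characters, using the Unicode general category `Zs` as an assumed stand-in for "Unicode space characters other than U+0020"; it does not consult TeX category codes.

```python
import unicodedata

LINE_SEPARATOR = "\u2028"


def replace_line_separators(text: str, replacement: str = "\r") -> str:
    """Replace every U+2028 with `replacement` (U+000D by default)."""
    return text.replace(LINE_SEPARATOR, replacement)


def find_unusual_spaces(text: str) -> list[tuple[int, str, str]]:
    """List (index, code point, name) for each Zs character other than U+0020."""
    hits = []
    for index, ch in enumerate(text):
        if ch != " " and unicodedata.category(ch) == "Zs":
            name = unicodedata.name(ch, "UNKNOWN")
            hits.append((index, f"U+{ord(ch):04X}", name))
    return hits


if __name__ == "__main__":
    source = "a\u2028b\u00a0c d"
    print(repr(replace_line_separators(source)))
    print(find_unusual_spaces(source))
```

Note that the replacement changes the file contents, so the reporting function may be the safer starting point when the characters could be intentional.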

I'll see what I can do in some distant future.

stloeffler commented 8 months ago

As you wrote in your original post: U+2028 "may exist in the tex source for some good reason". So I don't like arbitrarily changing characters (in fact, TeXworks goes to some lengths to preserve the Unicode BOM while avoiding visual artifacts).

I think the "proper" way of fixing this would be to somehow "hack" into the existing text document layout - or even implement a completely custom one (which would also avoid problems such as #469), but that's probably a monumental undertaking...