geany / geany

A fast and lightweight IDE
https://www.geany.org
GNU General Public License v2.0

Smaller horizontal auto-scroll jump when typing #1952

Open ghost opened 5 years ago

ghost commented 5 years ago

Currently, with line-wrap disabled, when the typed characters reach the right-hand-side margin of the window, the window view jumps / scrolls by a distance of about half the width of the window. That is too much of a jump; would it be possible to make it smaller, ideally the size of the set indentation, or just the (average) width of a character? Maybe as an option for the user to set (though I don't see how it would harm others).

Notepad++, for example, sets the auto-scroll to 4 char-widths (at least when using monospaced); which is better, although no option to change that. Textadept sets it to 2 char-widths (incidentally, the default indent size is also 2 there). MS's Notepad (for what it's worth) sets it to 1 char-width. I did not check others.

Looking at https://www.scintilla.org/ScintillaDoc.html#ScrollingAndAutomaticScrolling , it seems it would be possible?...


I will illustrate this; below, | represents the window margins. Say you type one line, press "enter", then type a second after an indentation:

|Aaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaa |
|        Caaaaaaaaaaaaadaaaaaaaaaaaaaaaaaaaaa|

Now when I type one more character... whoa Nelly, too big jump!! and you see this (too bad I can't write here in monospaced font, at least in code blocks)

|baaaaaaaaaaaaaaaaaaaa                        |
|daaaaaaaaaaaaaaaaaaaaa                       |

If it were to jump by at most an indentation size, you would see this:

|aaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaa      |
|Caaaaaaaaaaaaadaaaaaaaaaaaaaaaaaaaaa         |

which is better, as you still see the whole line you are editing.


EDIT: This is useful if you sometimes work under the following group of constraints:

elextr commented 5 years ago

Auto scrolling is done by the Scintilla editing component Geany uses. It has a number of policies available and currently Geany sets CARET_JUMPS and CARET_EVEN for x policy.

It would probably be reasonable for someone to provide a pull request that makes some/all settings available via a user setting, probably in Preferences->Editor->Display, next to Lines visible around the cursor.

ghost commented 5 years ago

...or in filetypes.common, would be faster to implement I guess

elextr commented 5 years ago

No, filetypes.common is not a general catch-all for settings, it's not even visible in the user interface, and there is also pushback against dumping things in Various too (just to pre-warn you :).

ghost commented 5 years ago

I read https://www.scintilla.org/ScintillaDoc.html#ScrollingAndAutomaticScrolling and I understood that, for this particular Issue, it would suffice to expose the CARET_JUMPS variable. If CARET_EVEN is left as is (value 1), then when the typed character goes out of visibility / reaches the limit, the display will:

For value 0, the size of "one position" is probably determined by https://www.scintilla.org/ScintillaDoc.html#SCI_LINESCROLL , in particular the columns field for this Issue.

elextr commented 5 years ago

Ok, so only a checkbox needed probably.

SCI_LINESCROLL() I think is an instruction to scroll the screen, not part of autoscroll.
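
As a rough sketch of what such a checkbox would toggle: the caret policy is a set of bit flags OR-ed together and passed to SCI_SETXCARETPOLICY. The numeric values below are the ones recalled from Scintilla.h and should be checked against the headers you build with; `x_caret_policy` is a hypothetical helper for illustration, not Geany code.

```python
# Flag values as recalled from Scintilla.h (assumption: verify against your headers).
CARET_SLOP = 0x01
CARET_STRICT = 0x04
CARET_EVEN = 0x08
CARET_JUMPS = 0x10


def x_caret_policy(jumps: bool) -> int:
    """Hypothetical helper: build the x policy with CARET_EVEN always set."""
    policy = CARET_EVEN
    if jumps:
        policy |= CARET_JUMPS
    return policy


current = x_caret_policy(jumps=True)    # the combination Geany is described as setting
proposed = x_caret_policy(jumps=False)  # the combination the checkbox would select
```

In C the resulting value would be the first argument of SCI_SETXCARETPOLICY, with the slop as the second.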

ghost commented 5 years ago

I thought autoscroll would use SCI_LINESCROLL() as a way to decide how much to scroll when CARET_JUMPS=0, i.e. the size of "one position" in my comment above. Or maybe the Scintilla devs, by "one position", meant a character width, in https://www.scintilla.org/ScintillaDoc.html#SCI_SETXCARETPOLICY , see 2nd line, last column, in the table with: slop | strict | jumps | even ?

elextr commented 5 years ago

You need to understand the Scintilla terminology.

In this document, 'character' normally refers to a byte even when multi-byte characters are used.

Positions within the Scintilla document refer to a character or the gap before that character.

There are places where the caret can not go where two character bytes make up one character.

All of which means "move the caret one position" is to the next legal byte location the caret can occupy. Approximately one display character.
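
To make "legal byte location" concrete for a UTF-8 buffer, here is a minimal sketch that lists the byte offsets which do not fall inside a multi-byte sequence. It only illustrates the idea; it is not how Scintilla itself computes positions, and `caret_positions` is a hypothetical name.

```python
from typing import List


def caret_positions(buf: bytes) -> List[int]:
    """Byte offsets in a UTF-8 buffer that do not split a multi-byte sequence."""
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return [
        i
        for i in range(len(buf) + 1)
        if i == len(buf) or (buf[i] & 0xC0) != 0x80
    ]


buf = "a\u00e7o".encode("utf-8")  # 'a', precomposed c-cedilla (2 bytes), 'o'
positions = caret_positions(buf)  # offset 2 is skipped: it is inside the 2-byte sequence
```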

elextr commented 5 years ago

PS and nothing to do with the command SCI_LINESCROLL() which is for the application to use to scroll the display manually.

ghost commented 5 years ago

Thank you for the explanations. That's sort of what I guessed in my question with "Or maybe..." in my reply above.

'character' normally refers to a byte even when multi-byte characters are used

That's definitely confusing, even logically self-contradictory. They should have said "byte" if they mean byte. So positions are between bytes, and that will be between visible characters, unless there are multi-byte characters.

PS and nothing to do with the command SCI_LINESCROLL() which is for the application to use to scroll the display manually.

Oh, I see; that is for when clicking on the ends of right-side or bottom ribbons containing the scrollbars, to make the view scroll.

I had come to suspect that SCI_LINESCROLL() was used by autoscroll, because I was wondering how Notepad++ or Textadept (both based on Scintilla) made their autoscroll scroll by 4 and 2 characters, respectively. But never mind now, knowing that "by one position" means by 1 "character", which would be good for me.

Lots to learn...

elextr commented 5 years ago

I suspect that the terminology is taken from the original Windows edit control Scintilla originally emulated.

And that was probably designed well before multi-byte characters, hence the code page crap that is still in Scintilla.

It may seem unusual in this age of Unicode, but the world didn't suddenly wake up to Unicode; it evolved to it, with many false steps along the way, each leaving its legacy scars on applications like Scintilla.

And even Unicode isn't perfect: even if you stored each code point in the same number of bytes (i.e. no UTF-8 or UTF-16 encoding), there are still combinations of two code points that map to only one glyph. For example, c̦ is two code points; if you copy it to Geany, deleting forward removes the c but not the cedilla (and vice versa if you backspace), and it takes two forward cursor movements to move over it.

Geany always uses UTF-8 encoding in the buffer, so it only meets the weird world of other encodings at load or save time. But that does mean variable length code points and issues like the above that make screen positions and positions in the buffer hard to relate.
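
A small sketch of how glyphs, code points and UTF-8 bytes diverge, using a c with a combining cedilla as the example. The helper name `sizes` is made up for illustration.

```python
from typing import Tuple

precomposed = "\u00e7"  # c-cedilla as one code point
combined = "c\u0327"    # 'c' followed by COMBINING CEDILLA, drawn as one glyph


def sizes(s: str) -> Tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes) for a string."""
    return len(s), len(s.encode("utf-8"))


# Removing the first code point of the pair leaves only the combining mark,
# which mirrors the delete-forward behaviour described above.
after_delete = combined[1:]
```

Both strings look like one character on screen, yet they differ in code point count and byte count, which is why screen columns and buffer positions do not map onto each other directly.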

ghost commented 5 years ago

there are still combinations of two code points that map to only one glyph (eg c̦

?!? what the .... Wikipedia (emphases mine):

Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. In text processing, Unicode takes the role of providing a unique code point—a number, not a glyph—for each character

Whether or not it is truly consistent depends on their interpretation of "character", because this section https://en.wikipedia.org/wiki/Unicode#Ready-made_versus_composite_characters talks about "main characters" and "diacritical marks" combining to make what they call "abstract character".

I personally believe it's a bad approach: not only because it makes things harder for the computing industry, but because it is in principle inconsistent with the treatment of most, if not all, characters (e.g. A is made of 3 bars, B of 1 bar and 2 semi-circles or partial circles...). Thus (almost) any visible character can be regarded as a combination of some small, primitive "marks" (and historically probably evolved that way).

Not a perfect standard at all.


But my practical take-away is, still, that a character (and I mean a visible character, including the example of c̦) on the screen is represented by 1 or, for "complex" characters, several bytes. And the caret moves between "legal positions"/display characters, which is between tuples of 1 or more bytes. The computation of those "legal positions" must take into account all that mess of standards and encodings...

elextr commented 5 years ago

"diacritical marks" combining to make what they call "abstract character".

https://www.unicode.org/charts/PDF/U0300.pdf and https://www.unicode.org/charts/PDF/U1AB0.pdf and https://www.unicode.org/charts/PDF/U1DC0.pdf and https://www.unicode.org/charts/PDF/U20D0.pdf and https://www.unicode.org/charts/PDF/UFE20.pdf

Note that what they may be combined with isn't defined; that's language dependent. Some of the commonest combinations are also single code points in the standard; indeed this ç has the single code point U+00E7 and is treated as a single thing by Geany/Scintilla. But I don't think all legal combinations have a single code point, and then there are the symbolic ones.

The caret will only move between code points, so it's consistent that it takes two steps through the two code point version of c̦. Left to itself Scintilla will not put the caret within multiple bytes defining a code point, but the user program can still address those positions.
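
The difference between combinations that have a single code point and those that do not can be checked with NFC normalization. A minimal sketch, assuming (to my knowledge) that c with a combining comma below has no precomposed form in Unicode:

```python
import unicodedata

cedilla_pair = "c\u0327"  # c + COMBINING CEDILLA
comma_pair = "c\u0326"    # c + COMBINING COMMA BELOW

# NFC composes a pair into a single code point when the standard defines one.
composed_cedilla = unicodedata.normalize("NFC", cedilla_pair)
composed_comma = unicodedata.normalize("NFC", comma_pair)
```

The cedilla pair collapses to U+00E7, so the caret would cross it in one step; the comma-below pair stays as two code points and still takes two steps.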

The computation of those "legal positions" must take into account all that mess of standards and encodings...

The Unicode CLDR contains all the data about code points and the semantics, combining, bi-di, zero wide, narrow, and dual wide. But one thing it does not define is visual glyphs.
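
Some of these per-code-point properties can be inspected from Python's standard `unicodedata` module, which is built from the Unicode Character Database. A small sketch; the sample characters are my own choice and not from the discussion above:

```python
import unicodedata

samples = {
    "combining cedilla": "\u0327",
    "latin a": "a",
    "hebrew alef": "\u05d0",
    "cjk ideograph": "\u4e2d",
}

props = {
    name: (
        unicodedata.combining(ch),         # canonical combining class, 0 for base characters
        unicodedata.bidirectional(ch),     # bidirectional category
        unicodedata.east_asian_width(ch),  # width class (narrow, wide, ...)
    )
    for name, ch in samples.items()
}
```

None of these properties says what the glyph looks like; that part is left to the font and the rendering engine.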

To be fair to Unicode, it's messy because human languages are messy, damn those humans, why can't they all just speak numbers like us bots. :grin:

Anyhow this has gotten slightly away from your original issue, which as I said just needs the Scintilla setting to be supported by Geany. All you need is Glade 3.8 to modify the UI but still support GTK2, but most distros don't provide it, so you need to compile your own 3.8.5 from https://ftp.gnome.org/pub/GNOME/sources/glade3/3.8/.

ghost commented 5 years ago

I suspect most implementations prefer to work with the 1-code point versions of those combined characters; so willy-nilly those standards committees will be pushed in the right direction.

The caret will only move between code points...

Nice, some relief here; now those tables matching Unicode code points to the corresponding code units in, say, UTF-8 encoding should make it simpler.

... why can't they all just speak numbers like us bots. :grin:

:laughing: