Miniconsole won't autowrap for languages with no word-splitters like Chinese.

icerxkx commented 3 years ago

Brief summary of issue / Description of requested feature:

Autowrap only wraps on spaces. But Chinese has no spaces between words, so it should wrap whenever the line is long enough.

Steps to reproduce the issue / Reasons for adding feature:

Error output / Expected result of feature

Also, I think the text width is not calculated correctly: the second-to-last message should wrap at 3, but it currently wraps at 4.

Extra information, such as Mudlet version, operating system and ideas for how to solve / implement:

vadi2 commented 3 years ago

It's also an issue that really long English words overflow as well and aren't broken up.

SlySven commented 3 years ago

Marking this as high priority because, although it is not fatal to the application, it is very bad for some locales and might render it unusable in those: long lines that cannot be scrolled will mean that parts of the text cannot be seen. (I can't recall if we ever got horizontal scrolling implemented in the TConsoles class!)

As to solving it, I imagine we will have to visit the three methods in TBuffer where a local variable (QString) lineBreaks is utilised and rewrite them to use QTextBoundaryFinder in QTextBoundaryFinder::Line mode, finding the longest runs in each (QStringList*) TBuffer::lineBuffer where the total of the base graphemes' individual widths (as determined by widechar_wcwidth((unsigned int) unicodeBaseCodePoint): 1 or 2, plus special handling for horizontal tabs) does not exceed the "wrapAt" setting...
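To make the idea concrete, here is a minimal sketch in standard C++ (no Qt) of wrapping by display width rather than by spaces alone. It is not Mudlet's code: `displayWidth()` is a hypothetical, much-reduced stand-in for `widechar_wcwidth()`, `wrapLine()` is an invented name, and the break rule (after a space, or between two code points when either is wide) is only a rough approximation of what `QTextBoundaryFinder::Line` would report. It works on code points, not grapheme clusters, and ignores tabs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for widechar_wcwidth(): a few East Asian ranges count
// as two columns, everything else as one. A real width table is far larger.
int displayWidth(char32_t cp)
{
    const bool wide = (cp >= 0x1100 && cp <= 0x115F)   // Hangul Jamo
                   || (cp >= 0x2E80 && cp <= 0xA4CF)   // CJK radicals .. Yi (roughly)
                   || (cp >= 0xAC00 && cp <= 0xD7A3)   // Hangul syllables
                   || (cp >= 0xF900 && cp <= 0xFAFF)   // CJK compatibility ideographs
                   || (cp >= 0xFF00 && cp <= 0xFF60);  // fullwidth forms
    return wide ? 2 : 1;
}

// Splits `text` into lines whose display width does not exceed `wrapAt`
// (assumed to be at least 2). Prefers the most recent break opportunity and
// falls back to a hard break inside an over-long run with none.
std::vector<std::u32string> wrapLine(const std::u32string& text, int wrapAt)
{
    constexpr std::size_t npos = std::u32string::npos;
    std::vector<std::u32string> lines;
    std::u32string current;
    int width = 0;                // columns used by `current`
    std::size_t keepLen = npos;   // length of `current` kept if we break
    std::size_t nextStart = npos; // index in `current` where the next line starts

    auto widthOf = [](const std::u32string& s) {
        int total = 0;
        for (char32_t c : s) total += displayWidth(c);
        return total;
    };

    for (char32_t cp : text) {
        const int w = displayWidth(cp);
        // A space that would land past the margin ends the line and is dropped.
        if (cp == U' ' && width + w > wrapAt) {
            if (!current.empty()) lines.push_back(current);
            current.clear();
            width = 0;
            keepLen = nextStart = npos;
            continue;
        }
        // A break is allowed between two code points when either one is wide.
        if (!current.empty() && (w == 2 || displayWidth(current.back()) == 2)) {
            keepLen = nextStart = current.size();
        }
        while (!current.empty() && width + w > wrapAt) {
            if (keepLen == npos) keepLen = nextStart = current.size(); // hard break
            std::u32string tail = current.substr(nextStart);
            current.erase(keepLen);
            if (!current.empty()) lines.push_back(current);
            current = tail;
            width = widthOf(current);
            keepLen = nextStart = npos;
        }
        current.push_back(cp);
        width += w;
        // Breaking after a space drops the space itself from the line end.
        if (cp == U' ') {
            keepLen = current.size() - 1;
            nextStart = current.size();
        }
    }
    if (!current.empty()) lines.push_back(current);
    return lines;
}
```

With this sketch a run of six Chinese characters wrapped at 4 columns yields three lines of two characters each, "hello world" wrapped at 7 still breaks at the space, and a ten-letter word with no spaces wrapped at 4 is hard-broken into 4 + 4 + 2, which also covers the long-English-word overflow mentioned earlier in the thread.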

SlySven commented 3 years ago

As a related issue, I am also considering revising the code in TBuffer::translateToPlainText(...) that detects and processes the End-of-Line condition. We do not actually store the character(s) we get that end each line, and we hard-wrap the received text (so it cannot be undone). However, because of cc08f815b3442d86415aa73e6724a48495a75d1d (part of #3625) and 03d5eb9558816c6b23a0a8636f07480757c2f302 (originally introduced by Heiko back in 2010), we eat all 0x04 and 0xFF bytes that make it through the telnet protocol handling and treat those as End-of-line marks respectively, even though some encodings (not just CP437) need those characters to be handled normally.

FTR: Whilst 0xFF is NOT used in UTF-8 it is used in many other Extended (256 character) ASCII encodings, e.g:

ÿ = U+00FF {LATIN SMALL LETTER Y WITH DIAERESIS} (used for a few French place-names, see https://en.wikipedia.org/wiki/Diaeresis_(diacritic)#French) - ISO 8859-1, -9, -14, -15, -16
˙ = U+02D9 {DOT ABOVE} - ISO 8859-2, -3, -4
џ = U+045F {CYRILLIC SMALL LETTER DZHE} - ISO 8859-5
ĸ = U+0138 {LATIN SMALL LETTER KRA} - ISO 8859-10
’ = U+2019 {RIGHT SINGLE QUOTATION MARK} - ISO 8859-13
ˇ = U+02C7 {CARON} - MACINTOSH (MACROMAN)

As it is, I think it might be possible to avoid removing the special processing of those two characters.

We use them internally as follows:

0x04 - we eat this one to hide Achaea's sending of it when MXP is enabled/disabled, according to cc08f815b3442d86415aa73e6724a48495a75d1d.
0xFF - it was not (immediately, to me) clear why this should be ignored if it is received. Obviously, at the time it was done Mudlet was only working on ASCII, so this byte would not be expected to be encountered outside of a Telnet command. However, commits that touched this area of the codebase (3e127f554bb23f8e86fbe64c560bccfc7bd096d0 and 03d5eb9558816c6b23a0a8636f07480757c2f302, by Heiko, also in 2010) use it to flag that the line is a prompt one (one that does NOT end with a Line-Feed/Carriage-Return)...
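For context on how a 0xFF byte can legitimately survive Telnet handling: RFC 854 defines IAC as byte 255 and requires a literal data byte 255 to be sent as two consecutive IACs. The sketch below is illustrative only and is not taken from Mudlet; `unescapeIac()` and `decodeLatin1()` are hypothetical helper names, and it assumes every real Telnet command sequence has already been stripped from the input.

```cpp
#include <cstddef>
#include <string>

constexpr unsigned char IAC = 0xFF; // Telnet "Interpret As Command" (RFC 854)

// Hypothetical helper: collapse each escaped pair IAC IAC into one literal
// 0xFF data byte. Assumes real Telnet commands were already removed.
std::string unescapeIac(const std::string& raw)
{
    std::string data;
    data.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        const auto byte = static_cast<unsigned char>(raw[i]);
        if (byte == IAC && i + 1 < raw.size()
            && static_cast<unsigned char>(raw[i + 1]) == IAC) {
            ++i; // skip the first half of the pair; the second is kept below
        }
        data.push_back(raw[i]);
    }
    return data;
}

// ISO 8859-1 maps every byte to the code point of the same value, so a
// surviving 0xFF data byte decodes to U+00FF rather than marking a line end.
char32_t decodeLatin1(unsigned char byte)
{
    return static_cast<char32_t>(byte);
}
```

Under these assumptions, any 0xFF left after unescaping is ordinary text for the negotiated encoding, which is why reusing that byte value as an in-band prompt or end-of-line marker collides with the encodings listed above.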

SlySven commented 1 year ago

Reopened because the PR that was supposed to fix it had to be reverted, as it introduced other issues.

ZookaOnGit commented 1 week ago

The PR suggests this was fixed. Is it still an issue? Reopen if so.

SlySven commented 6 days ago

Yes, and it hasn't been helped since we had to revert another PR that "improved" (re)wrapping in user windows. TBH and IMHO, fixing the latter could also lead to improving things in this issue. I had recently started looking into both but had not really got going on it before I got pulled away to other things.

I'll assign this to myself to try and get back to it.
