Mudlet / Mudlet

⚔️ A cross-platform, open source, and super fast MUD client with scripting in Lua
https://mudlet.org
GNU General Public License v2.0

Miniconsole won't autowrap for languages with no word-splitters like Chinese. #5564

Open icerxkx opened 3 years ago

icerxkx commented 3 years ago

Brief summary of issue / Description of requested feature:

Autowrap only wraps on spaces. Chinese has no spaces between words, so the text should wrap wherever the line is long enough.

Steps to reproduce the issue / Reasons for adding feature:

Error output / Expected result of feature

[Screenshot: 1]

Also, I think the text width is not calculated correctly: the second-to-last message should wrap at 3, but it currently wraps at 4.

Extra information, such as Mudlet version, operating system and ideas for how to solve / implement:

vadi2 commented 3 years ago

It's also an issue that really long English words overflow as well and aren't broken up.

SlySven commented 3 years ago

Marking this as high priority: although it is not fatal to the application, it is very bad for some locales and might render it unusable in those. Long lines that cannot be scrolled (I can't recall if we ever got horizontal scrolling implemented in the TConsoles class!) will mean that parts of the text cannot be seen...

As to solving it, I imagine we will have to visit the three methods in TBuffer where a local variable (QString) lineBreaks is utilised and rewrite them to use QTextBoundaryFinder set to work in QTextBoundaryFinder::Line mode, finding the longest runs in each (QStringList*) TBuffer::lineBuffer where the sum of the base graphemes' individual widths (as determined by widechar_wcwidth((unsigned int) unicodeBaseCodePoint): 1 or 2, plus special handling for horizontal tabs) does not exceed the "wrapAt" setting...
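The width-based wrapping idea above can be sketched without Qt. This is only an illustration, not Mudlet's code: `displayWidth` is a hypothetical, heavily simplified stand-in for `widechar_wcwidth`, `wrapLine` and `lineWidth` are made-up names, break opportunities are approximated as "prefer the last space, otherwise break between any two code points" rather than taken from `QTextBoundaryFinder`, and combining characters and horizontal tabs are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for widechar_wcwidth(): 2 columns for a few common
// East Asian wide ranges, 1 otherwise. A real width table covers far more.
int displayWidth(char32_t cp)
{
    const bool wide = (cp >= 0x1100 && cp <= 0x115F)   // Hangul Jamo
                   || (cp >= 0x2E80 && cp <= 0xA4CF)   // CJK radicals .. Yi (simplified)
                   || (cp >= 0xAC00 && cp <= 0xD7A3)   // Hangul syllables
                   || (cp >= 0xF900 && cp <= 0xFAFF)   // CJK compatibility ideographs
                   || (cp >= 0xFF00 && cp <= 0xFF60);  // fullwidth forms
    return wide ? 2 : 1;
}

// Total column width of a run of code points.
int lineWidth(const std::u32string& s)
{
    int total = 0;
    for (char32_t cp : s) {
        total += displayWidth(cp);
    }
    return total;
}

// Greedy wrap: no output line is wider than wrapAt columns. Breaks at the
// last space when there is one, otherwise between code points, so CJK text
// and over-long words still wrap.
std::vector<std::u32string> wrapLine(const std::u32string& text, int wrapAt)
{
    assert(wrapAt >= 2); // must be able to fit one wide character
    const std::size_t npos = std::u32string::npos;
    std::vector<std::u32string> lines;
    std::u32string current;
    int width = 0;
    std::size_t lastSpace = npos; // index in `current` of the last space

    for (char32_t cp : text) {
        const int w = displayWidth(cp);
        if (cp == U' ' && width + w > wrapAt) {
            // The overflowing character is a space: break here and drop it.
            lines.push_back(current);
            current.clear();
            width = 0;
            lastSpace = npos;
            continue;
        }
        while (width + w > wrapAt) {
            std::u32string rest;
            if (lastSpace != npos) {
                // Carry everything after the last space onto the next line.
                rest = current.substr(lastSpace + 1);
                current.erase(lastSpace);
            }
            if (!current.empty()) {
                lines.push_back(current);
            }
            current = rest;
            width = lineWidth(current);
            lastSpace = npos;
        }
        if (cp == U' ') {
            lastSpace = current.size();
        }
        current.push_back(cp);
        width += w;
    }
    if (!current.empty()) {
        lines.push_back(current);
    }
    return lines;
}
```

With `wrapAt` set to 4, six Chinese characters (two columns each) come back as three lines of two characters, and `wrapLine(U"abcdefghij", 4)` splits the unbroken word into `abcd`, `efgh`, `ij`, which also covers the long-English-word case mentioned earlier.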

SlySven commented 3 years ago

As a related issue, I am also considering revising the code in TBuffer::translateToPlainText(...) that detects and processes the End-of-Line condition. We do not actually store the character(s) we receive that end each line and hard-wrap the received text (so the wrapping cannot be undone). However, because of cc08f815b3442d86415aa73e6724a48495a75d1d (part of #3625) and what was originally introduced by 03d5eb9558816c6b23a0a8636f07480757c2f302 (by Heiko back in 2010), we eat all 0x04 and 0xFF bytes that make it through the telnet protocol handling and treat them as End-of-Line marks, even though some encodings (not just CP437) need those characters to be handled normally.
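To illustrate why swallowing 0xFF loses data in a single-byte encoding, here is a minimal sketch using ISO 8859-1 (Latin-1), where byte 0xFF is the letter ÿ (U+00FF). The function name `latin1ToUtf8` is made up for this example and is not part of Mudlet; it only shows that 0xFF decodes to an ordinary printable character there.

```cpp
#include <string>

// Convert ISO 8859-1 bytes to UTF-8. Every Latin-1 byte maps directly to
// the Unicode code point with the same value, so 0xFF becomes U+00FF.
std::string latin1ToUtf8(const std::string& bytes)
{
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out.push_back(static_cast<char>(b)); // ASCII passes through unchanged
        } else {
            // Code points U+0080..U+00FF need two bytes in UTF-8.
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return out;
}
```

Here `latin1ToUtf8("\xFF")` yields the two bytes 0xC3 0xBF, the UTF-8 encoding of ÿ. If the 0xFF byte is consumed as an End-of-Line mark before decoding, that letter never reaches the buffer.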

FTR: Whilst 0xFF is NOT used in UTF-8, it is used in many other Extended (256 character) ASCII encodings, e.g.:

As it is, I think it might be possible to avoid removing the special processing of those two characters.

(We use them internally to):

SlySven commented 1 year ago

Reopened because the PR that was supposed to fix it had to be reverted, as it introduced other issues.

ZookaOnGit commented 1 week ago

The PR suggests a fix. Is this still an issue? Reopen if so.

SlySven commented 6 days ago

Yes, and it hasn't helped that we had to revert another PR that "improved" (re)wrapping in user windows. TBH and IMHO, fixing the latter could also improve things for this issue. I had recently started looking into both but had not really got going before I got pulled away to other things.

I'll assign this to myself to try and get back to it.