Open icerxkx opened 3 years ago
It's also an issue that really long English words overflow as well and aren't broken up.
Marking this as high priority because, although it is not fatal to the application, it is very bad for some locales and might render it unusable in those (long lines that cannot be scrolled {I can't recall if we ever got horizontal scrolling implemented in the TConsole class!} will mean that parts of the text cannot be seen)...
As to solving it, I imagine we will have to visit the three methods in TBuffer where a local variable (QString) lineBreaks is utilised and rewrite them to use QTextBoundaryFinder when it is set to work in QTextBoundaryFinder::Line mode, finding the longest runs in each (QStringList*) TBuffer::lineBuffer where the sum of the base graphemes' individual widths (as determined by widechar_wcwidth((unsigned int) unicodeBaseCodePoint) {1 or 2}, plus special handling for horizontal tabs) does not exceed the "wrapAt" setting...
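To make the width-based idea concrete, here is a minimal sketch in standard C++ only. It is not Mudlet's code and it does not use QTextBoundaryFinder: displayWidth() is a hypothetical, very rough stand-in for widechar_wcwidth(), wrapByWidth() is an invented name, the only "preferred" break opportunity it knows about is a plain space, and tabs and combining characters are ignored. It only illustrates the rule that a line is broken as soon as the summed column widths would exceed the wrap limit, whether or not a space is available.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for widechar_wcwidth(): 2 columns for a few common
// East Asian wide ranges, 1 column otherwise. A real width table covers far more.
int displayWidth(char32_t cp)
{
    const bool wide = (cp >= 0x1100 && cp <= 0x115F)   // Hangul Jamo
                   || (cp >= 0x2E80 && cp <= 0xA4CF)   // CJK radicals through Yi (approximate)
                   || (cp >= 0xAC00 && cp <= 0xD7A3)   // Hangul syllables
                   || (cp >= 0xF900 && cp <= 0xFAFF)   // CJK compatibility ideographs
                   || (cp >= 0xFF00 && cp <= 0xFF60);  // fullwidth forms
    return wide ? 2 : 1;
}

// Wrap so that no line is wider than wrapAt columns. Prefer breaking at the
// last space on the line; if there is none, break right before the code point
// that would overflow (this is what lets unspaced CJK text and very long
// words wrap at all).
std::vector<std::u32string> wrapByWidth(const std::u32string& text, int wrapAt)
{
    const std::size_t npos = std::u32string::npos;
    std::vector<std::u32string> lines;
    std::u32string line;
    int width = 0;
    std::size_t lastSpace = npos;

    for (char32_t cp : text) {
        const int w = displayWidth(cp);
        if (width + w > wrapAt && !line.empty()) {
            std::u32string rest;
            // A space at index 0 is not a useful break point, so require > 0.
            if (cp != U' ' && lastSpace != npos && lastSpace > 0) {
                rest = line.substr(lastSpace + 1);  // partial word moves down
                line.erase(lastSpace);              // drop the space and the tail
            }
            lines.push_back(line);
            line = rest;
            width = 0;
            for (char32_t c : line) {
                width += displayWidth(c);
            }
            lastSpace = npos;  // the carried-over tail contains no space
            if (cp == U' ') {
                continue;      // the space that caused the wrap is consumed
            }
        }
        if (cp == U' ') {
            lastSpace = line.size();
        }
        line.push_back(cp);
        width += w;
    }
    if (!line.empty()) {
        lines.push_back(line);
    }
    return lines;
}
```

With a wrap limit of 4 columns, six Chinese characters with no spaces come out as three lines of two characters each, and a ten-letter word with no spaces is split into pieces of at most four letters. A version built on real line-break opportunities would replace the "last space" bookkeeping with whatever positions the boundary finder reports.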
As a related issue, I am also considering revising the code in TBuffer::translateToPlainText(...) that detects and processes the End-of-Line condition. We do not actually store the character(s) that end each line, so the received text is hard-wrapped (and that cannot be undone). However, because of cc08f815b3442d86415aa73e6724a48495a75d1d (part of #3625) and (what was originally introduced by 03d5eb9558816c6b23a0a8636f07480757c2f302 by Heiko back in 2010) we eat all 0x04 and 0xFF bytes that make it through the telnet protocol handling and treat those as End-of-line marks respectively - even though some encodings (not just CP437) need those characters to be handled normally.
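A small sketch of why that matters, in standard C++ only. Both function names are invented for illustration and neither is Mudlet's code: dropMarkerBytes() is a deliberately simplified model of "these two byte values never reach the text", and decodeLatin1() relies on the fact that every ISO 8859-1 byte maps to the Unicode code point of the same value.

```cpp
#include <string>

// ISO 8859-1: each byte value is also its Unicode code point.
std::u32string decodeLatin1(const std::string& bytes)
{
    std::u32string out;
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char32_t>(b));
    }
    return out;
}

// Hypothetical, simplified model of a filter that unconditionally removes
// 0x04 and 0xFF from the incoming byte stream, whatever the encoding is.
std::string dropMarkerBytes(const std::string& bytes)
{
    std::string out;
    for (unsigned char b : bytes) {
        if (b != 0x04 && b != 0xFF) {
            out.push_back(static_cast<char>(b));
        }
    }
    return out;
}
```

Feeding the ISO 8859-1 bytes for the place-name "L'Haÿ-les-Roses" (where ÿ is the single byte 0xFF) through decodeLatin1() gives 15 code points with U+00FF in fifth position; running the same bytes through dropMarkerBytes() first leaves only 14, so the letter is silently lost.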
FTR: Whilst 0xFF is NOT used in UTF-8 it is used in many other Extended (256 character) ASCII encodings, e.g.:
ÿ = U+00FF {LATIN SMALL LETTER Y WITH DIAERESIS} (used for a few French place-names, see https://en.wikipedia.org/wiki/Diaeresis_(diacritic)#French) - ISO 8859-1, -9, -14, -15, -16
˙ = U+02D9 {DOT ABOVE} - ISO 8859-2, -3, -4
џ = U+045F {CYRILLIC SMALL LETTER DZHE} - ISO 8859-5
ĸ = U+0138 {LATIN SMALL LETTER KRA} - ISO 8859-10
’ = U+2019 {RIGHT SINGLE QUOTATION MARK} - ISO 8859-13
ˇ = U+02C7 {CARON} - MACINTOSH (MACROMAN)
As it is, I think it might be possible to avoid removing the special processing of those two characters.
(We use them internally to):
Reopened because PR that was supposed to fix it had to be reverted because it introduced other issues.
The PR suggests this was fixed. Is it still an issue? Reopen if so.
Yes - and it hasn't been helped by our having to revert another PR that "improved" (re)wrapping in user windows. TBH and IMHO fixing the latter could also lead to improving things in this issue - I had recently started looking into both but had not really got going before I got pulled away to other things.
I'll assign this to myself to try and get back to it.
Brief summary of issue / Description of requested feature:
autowrap will only wrap on space. But for Chinese, there is no space between words. It should wrap whenever the line is long enough.
Steps to reproduce the issue / Reasons for adding feature:
Error output / Expected result of feature
And also the text width is not calculated correctly, I think. The message before last should wrap at 3, but now at 4.
Extra information, such as Mudlet version, operating system and ideas for how to solve / implement: