Adjust string length for fullwidth characters when `end` is undefined

chalk / slice-ansi

Slice a string with ANSI escape codes

MIT License

47 stars 21 forks source link

Adjust string length for fullwidth characters when `end` is undefined #27

Closed kevva closed 4 years ago

kevva commented 4 years ago

Fixes #26. Closes #29.

IssueHunt Summary

### Referenced issues This pull request has been submitted to: - [#26: Lose characters when fullwidth characters exist and end is not specified](https://issuehunt.io/repos/43191689/issues/26) --- IssueHunt has been backed by the following sponsors. [Become a sponsor](https://issuehunt.io/membership/members)

sindresorhus commented 4 years ago

@TiagoDanin Can you help review?

Qix- commented 4 years ago

Can we add a few cases where string.normalize() would change the input string, too? I want to make sure that won't break anything.

kevva commented 4 years ago

Good catch, that actually breaks stuff. I wrote the code that included the normalization, but now that I think of it, I don't see why we need it. It'll only modify the string to use a different character than the one inputted. Also, all the tests passes without normalization as well.

'\u006E\u0303test' === '\u006E\u0303test'.normalize();
//=> false

It doesn't break anything, except that if you're going to compare the output with your input some characters might differ.

Qix- commented 4 years ago

I agree, we shouldn't care about it. It's another linear sweep of the string, anyway, so it's a win-win. Let's remove it ^^

kevva commented 4 years ago

I'll do it in a separate PR though since it's unrelated to this.

TiagoDanin commented 4 years ago

I found a big problem. The support fullwidth characters ( #4, #9) does not work with ANSI combination.

DeepinScreenshot_select-area_20190920220205

\u001B[31m古古test\u001B[39m does not return the \u001B[31m

UPDATE: Sugestion

if (!astralRegex({exact: true}).test(character) && isFullwidthCodePoint(character.codePointAt())) {
            ++visible;
            ++begin;
            ++stringEnd

            if (typeof end !== 'number') {
                ++stringEnd;
            }
        }

This should fix.

Bus this break the test 'supports fullwidth characters', I believe that here "안녕하세" (https://github.com/chalk/slice-ansi/blob/9df7d275bbb72b198fbb86818ae277949b16a244/test.js#L42) has only 4 (안(c548) 녕(b155) 하(d558) 세(c138)) characters and not 8

kevva commented 4 years ago

@TiagoDanin, could you provide a test example?

TiagoDanin commented 4 years ago

test.failing('fullwidth characters + ANSI codes', t => {
    t.is(sliceAnsi('\u001B[31m古古test\u001B[39m', 3), '\u001B[31m古test\u001B[39m');
    t.is(sliceAnsi('\u001B[31m古古test\u001B[39m', 3, 4), '\u001B[31m古\u001B[39m');
});

kevva commented 4 years ago

@TiagoDanin, that's no bug, you're starting to slice in the middle of a full width character. This works:

t.is(sliceAnsi('\u001B[31m古古test\u001B[39m', 2), '\u001B[31m古test\u001B[39m');
t.is(sliceAnsi('\u001B[31m古古test\u001B[39m', 2, 4), '\u001B[31m古\u001B[39m');

This is also how I'd expect this library to work. If we were to treat full width characters as "single" characters, I'd add an option for it but it's out of scope of this issue.