JS implementation for String.copyTo doesn't copy some esoteric Characters correctly

ASzc commented 8 years ago

I have been working on ceylon/ceylon-sdk#449 and found that the string 𐀀􏿽􏿽 encodes correctly on the JVM but only partially correctly on JS: the result is too short and garbled near the end. Single characters, and longer strings of more common characters, work fine.

I've traced this to CharacterBuffer.array, which is an Array<Character> populated from a String by a call to String.copyTo.

Printing the String literal and printing CharacterBuffer.array after copyTo:

JVM:

𐀀􏿽􏿽
{ 𐀀, 􏿽, 􏿽 }

JS:

𐀀􏿽􏿽
{ 𐀀, 􏿽, � }

If I swap out string.copyTo(array); for:

for (i->c in string.indexed) {
    array.set(i, c);
}

then JS prints the same result as the JVM:

𐀀􏿽􏿽
{ 𐀀, 􏿽, 􏿽 }

The native JS implementation looks equivalent to the above Ceylon, but perhaps there's some subtle error?
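To make the "subtle error" idea concrete, here is a purely hypothetical JavaScript sketch. It is not the actual ceylon.language native code, and the names copyBuggy and copyCorrect are made up for illustration. It assumes the test string is U+10000 followed by two U+10FFFD. In UTF-16, U+10FFFD is the surrogate pair DBFF DFFD, and DBFF is the very last high surrogate, so a range check written with < instead of <= treats it as a lone code unit. That single off-by-one reproduces the observed pattern: the first two Characters come out right and the third is a stray low surrogate.

```javascript
// Hypothetical reconstruction, NOT the real ceylon.language native code.
// Both functions walk a JS string (UTF-16 code units) and collect
// `count` code points, which is what a native copyTo has to do.

// Assumed test string: U+10000 followed by two U+10FFFD.
const s = "\u{10000}\u{10FFFD}\u{10FFFD}";

// Buggy variant: the high-surrogate range check uses `<` instead of `<=`,
// so a high surrogate of exactly 0xDBFF is treated as a BMP character
// and the loop advances by only one code unit.
function copyBuggy(str, count) {
  const out = [];
  let i = 0;
  while (out.length < count && i < str.length) {
    out.push(str.codePointAt(i));
    const unit = str.charCodeAt(i);
    i += unit >= 0xd800 && unit < 0xdbff ? 2 : 1; // off by one at 0xDBFF
  }
  return out;
}

// Correct variant: advance by two code units whenever the decoded
// code point lies outside the Basic Multilingual Plane.
function copyCorrect(str, count) {
  const out = [];
  let i = 0;
  while (out.length < count && i < str.length) {
    const cp = str.codePointAt(i);
    out.push(cp);
    i += cp > 0xffff ? 2 : 1;
  }
  return out;
}

const hex = (cps) => cps.map((cp) => cp.toString(16).toUpperCase());
console.log(hex(copyBuggy(s, 3)));   // [ '10000', '10FFFD', 'DFFD' ]
console.log(hex(copyCorrect(s, 3))); // [ '10000', '10FFFD', '10FFFD' ]
```

The correct variant decides how far to step from the decoded code point rather than from a hand-written surrogate range, which is roughly what the Character-by-Character loop over string.indexed gets for free on the Ceylon side.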

ASzc commented 8 years ago

@chochos I see you wrote the native JS for this; any thoughts?

chochos commented 8 years ago

Probably something to do with code points.
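A quick way to see the code point angle in plain JavaScript, again assuming the string is U+10000 followed by two U+10FFFD: JS string length and indexing count UTF-16 code units, while a Ceylon String is a sequence of Characters, each a full code point. Any native loop that mixes the two kinds of index can land in the middle of a surrogate pair. This is only an illustration of the encoding, not of the ceylon.language code.

```javascript
// Assumed test string: U+10000 followed by two U+10FFFD.
const s = "\u{10000}\u{10FFFD}\u{10FFFD}";

// JS strings are sequences of UTF-16 code units, so each of these
// supplementary characters occupies two slots (a surrogate pair).
console.log(s.length); // 6

// String iteration (spread, for...of) walks code points instead,
// which matches Ceylon's view of a String as a sequence of Characters.
const codePoints = [...s].map((ch) => ch.codePointAt(0));
console.log(codePoints.length); // 3

// The raw code units in hex: three surrogate pairs.
const units = Array.from({ length: s.length }, (_, i) =>
  s.charCodeAt(i).toString(16).toUpperCase()
);
console.log(units.join(" ")); // D800 DC00 DBFF DFFD DBFF DFFD

// Indexing by code unit can land on the second half of a pair and
// yield a lone low surrogate, which typically shows up as U+FFFD when printed.
console.log(s.codePointAt(3).toString(16).toUpperCase()); // DFFD
```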

ceylon/ceylon.language#794