Closed benjie closed 6 years ago
Awesome thank you! I've dealt with this gotcha in the past and I think there's a solution that shouldn't require adding Unicode.js - I'm just on my phone right now so I can't test to confirm but using Array.from(string).length ? (Documented at https://mathiasbynens.be/notes/javascript-unicode)
I considered that from this post but it's an ES6 feature so I figured we'd have to add a substitute implementation anyway and I feared there may be unicode strings that differ in length in Array.from
vs strlen
. Ultimately I figured it'd be safer to use DraftJS's internal implementation. (There must be a reason they don't use Array.from
? Maybe it's just that it requires polyfilling?)
I just tried a couple emoji that return length 2 from Array.from, and they also have length 2 with strlen:
console.log(Array.from('❤️').length); // 2
console.log(strlen('❤️')); // 2
console.log(Array.from('👍🏻').length); // 2
console.log(strlen('👍🏻')); // 2
This isn't an exhaustive test though, and I'm not sufficiently knowledgeable about Unicode to make this determination. Happy to make the change if you're confident though!
The example that post gives with length 8 / 7 also works fine with strlen / Array.from:
const emoji = '\ud83d\udc69\u200d\u2764\ufe0f\u200d\ud83d\udc8b\u200d\ud83d\udc69';
console.log(emoji); // 👩❤️💋👩
console.log(Array.from(emoji).length); // 8
console.log(strlen(emoji)); // 8
Sorry @benjie I don't mean to leave you hanging, just have limited free time these days 🙃 This is my first time opening my laptop in weeks! And it will only last about 5 minutes...
but it's an ES6 feature so I figured we'd have to add a substitute implementation anyway
I think we're fine without that; the library already uses a few ES6 features and has a disclaimer in the readme about the need for a polyfill!
So yeah I'm happy to just use Array.from! Keeps things smaller and simpler and I'm pretty sure it should work in all cases 🤞 (I am also not a unicode expert but that's just based on my limited experience on other projects where I saw this come up)
Cool; I’ll get it done over the next week hopefully! 👍 No worries about delay; I saw your message in the readme 😄
Closing this so we can easily restore fbjs
implementation if it turns out we need it. Opening a new PR with the simplified code: #48
Hey, great project!
I discovered an issue where when you perform
markdownToDraftJS
on markdown that contains emoji, all the DraftJS inline styles become offset, e.g.:Testing 👍 _italic_ words words words **bold** words words words
(
Testing \ud83d\udc4d _italic_ words words words **bold** words words words
)becomes:
I've submitted this PR to fix this issue, when reviewing I recommend reviewing the commits one at a time - note that the second commit is just a direct copy from
fbjs
'sUnicodeUtils.js
which I then tweak in the next commit to make any changes explicit.I include below a longer explanation of the issue, mostly so I remember exactly how I figured all this out.
The reason for this issue is that DraftJS treats length of the
👍
as1
(it is 1 character) whereas JavaScript strings treat it has having length2
due to the surrogate pair ("\ud83d\udc4d".length === 2
/"👍".length === 2
).To see this in their tests, check out the test for decodeInlineStyleRanges:
and the resulting snapshot:
Here you can see that because
BOLD
andITALIC
both overlap the emoji, even though their lengths are 4 and 8 respectively the each affect 5 and 9 elements in the resulting array respectively.This is because in
decodeInlineStyleRanges
, they usesubstr
fromUnicodeUtils
to parse the ranges.To solve this, we need to use
strlen(content)
fromUnicodeUtils
to calculate the string lengths rather than usingcontent.length
. FrustratinglyUnicodeUtils
is in thefbjs
package which explicitly says to not depend on it because it may break. For this reason (and since it's MIT licensed so shouldn't be an issue) I've copied over the one fileUnicodeUtils.js
into this project and extremely lightly edited it so it doesn't add any dependencies. I've also added a failing test to reproduce the issue which I've then made pass usingUnicodeUtils.strlen
.