Adapt line folding to cover surrogate pairs

HeikoTheissen commented 3 years ago

https://github.com/sedenardi/node-caldav-adapter/blob/62a34c2c9204db37f789d08c378318127cb4090d/src/common/eventBuild.ts#L84

Consider adapting this line folding method as in sebbo2002/ical-generator#223.

sedenardi commented 3 years ago

Very interesting. Thanks for linking that PR so I could learn from your blog post (along with Wikipedia to learn why there are only 17 planes, rather than 256 like 2 hex digits would suggest).

I'm more than happy to use this logic in this library, given that it follows the RFC and is used in ical-generator, which this library relies on to generate iCal objects. Before I copy/paste your implementation in the linked PR, I've got a couple questions:

What are the unicode boundaries you're checking in if (ch >= '\ud800' && ch <= '\udbff') (I didn't see anything in your post or wiki that would tip me off)
Why are you checking unicode value range rather than just checking length of the character like you do in your post ('𝌆'.length == 2)
Would you be open to making your function a standalone library? I feel like many CalDAV/iCal libraries would benefit from using the same octet checking, and your logic wouldn't just exist in two random libraries.

Thanks for your help.

HeikoTheissen commented 3 years ago

Very interesting. Thanks for linking that PR so I could learn from your blog post

I am not the author of this blog post, I only learned form that.

What are the unicode boundaries you're checking in if (ch >= '\ud800' && ch <= '\udbff') (I didn't see anything in your post or wiki that would tip me off)

The blog says "The first code unit of a surrogate pair is always in the range from 0xD800 to 0xDBFF".

Why are you checking unicode value range rather than just checking length of the character like you do in your post ('𝌆'.length == 2)

'𝌆' is Unicode character. But in order to get a single Unicode character from a string, I cannot use string[offset] or string.charAt(offset), because that gives me a Javascript character, which might just be the first of a surrogate pair. And to check whether this is the case, I check the Unicode value range. Makes sense?

Would you be open to making your function a standalone library? I feel like many CalDAV/iCal libraries would benefit from using the same octet checking, and your logic wouldn't just exist in two random libraries.

Good idea, but it is probably more efficient if you or @sebbo2002 publish this library alongside your existing ones. I do not claim any copyright on these few lines of code.

sedenardi commented 3 years ago

Apologies, I should've checked the author before attributing it to you! Regardless, thank you for surfacing it to me.

The blog says "The first code unit of a surrogate pair is always in the range from 0xD800 to 0xDBFF".

Doh, that's why you shouldn't read HEX char codes late at night.

'𝌆' is Unicode character. But in order to get a single Unicode character from a string, I cannot use string[offset] or string.charAt(offset), because that gives me a Javascript character, which might just be the first of a surrogate pair. And to check whether this is the case, I check the Unicode value range. Makes sense?

Yes. If I have a string const str = abc𝌆ef, str[3].length will equal 1 because it's the Javascript character.

Good idea, but it is probably more efficient if you or @sebbo2002 publish this library alongside your existing ones. I do not claim any copyright on these few lines of code.

Sounds good. I'll include the code in this library for now, and keep an eye on whether @sebbo2002 himself decides to publish it separately or not. It may be too niche to really need to be an entirely separate package anyway.

sedenardi commented 3 years ago

@HeikoTheissen Published to npm as version 4.1.0. Thanks again for your contribution.

forwardemail / caldav-adapter

Adapt line folding to cover surrogate pairs #10