Closed yosion-p closed 3 years ago
I think I got it,
the merged output of iconv-lite encoding must be equal to Node's encoding of the whole string (
Buffer.from(S, "utf8")
).
sorry,I thought incomplete surrogate pairs were invalid and could be filtered out. I will correct all outputs.
But about that:
a loop over the whole string is pretty inefficient. Moreover, you don't need to touch internal surrogates - Node processes them correctly already. We only need to adjust boundaries, as these are what Node can't adjust as it doesn't know about them.
I don't really understand, Because if I don't loop through the string, I won't know which character in the string needs to be processed.
I was wondering if there was something wrong with the test method,not Encoder.
Just like I thought before,
If UT into multiple strings in the input,IconvLiteEncoderStream.prototype._transform
will be invoked many times, back to characters be cut by Encoder rather than a complete surrogates pair.
If we modify InternalEncoder, When the surrogates pair was cut, The first call doesn't return anything, only the second one returns a full surrogates pair,
That's why I need to go through a loop and check for characters one by one to see if it is a surrogate.
I think I can solve this problem in InternalEncoderUtf8.prototype.end()
,
I will try.
Hi bro,It's me, again.🤣
I did it by modifying the code of InternalEncoderUtf8.prototype.write
.
In a word, less is more.
have a look😃
Aha i tried to make it short, but didn't work it out.
nit: typo in work "surrogates"
Almost there, thank you for persistence!
Victory lies ahead!
Hi Buddy,
I see all the tests have the same structure ....
Now i realize actually you made it very clear, it's me misunderstand it, I'm terribly sorry…
make sure the function does not throw if str is empty.
About what you said,
I have my own interpretation,
I added one condition judgment was also in the first line of InternalEncoderUtf8.prototype.write
, then I removed.
I think we still need to consider the incoming string is empty, as internal. js line: 81 InternalEncoder.prototype.write
,
otherwise UT will report an error, like I just submitted for the first time.
Here's what I thought I'd do:
if (str) {
if (this.lowSurrogate) {
str = this.lowSurrogate + str;
this.lowSurrogate = '';
}
var charCode = str.charCodeAt(str.length - 1);
if (55296 < charCode && charCode <= 56319) {
this.lowSurrogate = str[str.length - 1];
str = str.slice(0, str.length - 1);
}
}
return Buffer.from(str, this.enc);
But it has nested conditional judgments, so, I did something like this:
if (!str) return Buffer.from('', this.enc);;
if (this.lowSurrogate) {
str = this.lowSurrogate + str;
this.lowSurrogate = '';
}
var charCode = str.charCodeAt(str.length - 1);
if (55296 < charCode && charCode <= 56319) {
this.lowSurrogate = str[str.length - 1];
str = str.slice(0, str.length - 1);
}
return Buffer.from(str, this.enc);
Although there are two Buffer.from
in it.
I agree, we need to return an empty buffer, not undefined, when str is empty.
I've made some final minor adjustments (e.g. it's actually a high surrogate, not low :) and will merge. Thank you for your contribution!
Thank you buddy! It's a milestone for me the first time my code got merged in an open source project.
Even though my pointcut and idea are right since the beginning, but the process was kind of tortuous.
Maybe I need to pay more attention to the format and performance of the code,
and try to get your full point.lol
In short, thanks a lot for your patience and encouragement, respect!
Best Regards, Yosion.
At 2021-09-19 01:55:57, "Alexander Shtuchkin" @.***> wrote:
I agree, we need to return an empty buffer, not undefined, when str is empty.
I've made some final minor adjustments (e.g. it's actually a high surrogate, not low :) and will merge. Thank you for your contribution!
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android.
You did great, keep it up! It all comes with practice, don't worry too much.
Hi bro,It's me, again.🤣 I have completed the
InternalEncoder
for utf8 with surrogates after I read what you meant(#275 ).275 looked a bit messy, so I submitted a brand new one,
I didn't know much before, but now I know more. I think surrogates should be in pairs always. Thank you for your patience.