ashtuchkin / iconv-lite

Convert character encodings in pure javascript.
MIT License

Provide performant iconv.encodeInto API (previously iconv.byteLength) #308

Open viktor-podzigun opened 1 year ago

viktor-podzigun commented 1 year ago

Currently one can use:

iconv.encode(str, encoding).length

but it's slow because it creates an intermediate buffer that is immediately discarded.

It would be better to expose a similar but more efficient API:

iconv.byteLength(str, encoding)
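
For the encodings Node.js supports natively, the difference between the two approaches can be sketched with built-in `Buffer` calls. This is only an analogy: `iconv.byteLength` is the API proposed in this issue and is not known to exist in iconv-lite, and `byteLengthViaEncode` and `byteLengthDirect` are hypothetical helper names.

```javascript
// Analogy using Node.js built-ins only; iconv-lite is not involved here.

// Allocating approach: encode the whole string, then discard the buffer.
// This mirrors iconv.encode(str, encoding).length.
function byteLengthViaEncode(str, encoding) {
  return Buffer.from(str, encoding).length;
}

// Direct approach: ask only for the length, without getting a buffer back.
// This mirrors what the proposed iconv.byteLength(str, encoding) would do.
function byteLengthDirect(str, encoding) {
  return Buffer.byteLength(str, encoding);
}

const sample = "héllo, wörld";
console.log(byteLengthViaEncode(sample, "utf8"), byteLengthDirect(sample, "utf8"));
console.log(byteLengthViaEncode(sample, "utf16le"), byteLengthDirect(sample, "utf16le"));
```

Both functions return the same number for a given string and encoding; only the second avoids handing back a throwaway buffer.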

ashtuchkin commented 1 year ago

It would require significant effort, as all the logic of all codecs would have to be duplicated and adjusted.

What's your use case?

viktor-podzigun commented 1 year ago

I see. Would it at least be possible, for now, to provide an additional method similar to TextEncoder.encodeInto:

iconv.encodeInto(str, encoding, buf)

I have a use case where I need to get the byte lengths of a bunch of strings in a specific encoding.
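
As a rough illustration of the calling pattern being asked for, here is a sketch built on the standard `TextEncoder.encodeInto`, which only supports UTF-8. It counts bytes by encoding into a single reusable scratch buffer instead of allocating a new one per string. The `iconv.encodeInto(str, encoding, buf)` signature above is the proposal and is not assumed to exist; `utf8ByteLength` and `scratch` are hypothetical names.

```javascript
// Sketch: count UTF-8 bytes by encoding into one reusable scratch buffer.
// TextEncoder handles UTF-8 only, so this shows the pattern, not
// multi-encoding support.
const encoder = new TextEncoder();
const scratch = new Uint8Array(4096); // reused across calls; needs >= 4 bytes

function utf8ByteLength(str) {
  let total = 0;
  let offset = 0;
  while (offset < str.length) {
    // read = UTF-16 code units consumed, written = bytes produced.
    // encodeInto stops before a code point that does not fit, so a
    // surrogate pair is never split across two calls.
    const { read, written } = encoder.encodeInto(str.slice(offset), scratch);
    if (read === 0) {
      throw new Error("scratch buffer too small to make progress");
    }
    total += written;
    offset += read;
  }
  return total;
}

console.log(utf8ByteLength("héllo"));
```

The scratch buffer is overwritten on every call, so its contents are never read; only the `written` counts are summed.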

ashtuchkin commented 1 year ago

This one is easier but still a significant change. I'll look into supporting it in the next version.

To help me prioritize and design it, could you provide more data about your use case? For example: what's the average string length, is there anything special in the distribution of their lengths, which codec are you planning to use (at least single character vs double character), and how many conversions per second do you have per node process? Maybe also a high-level overview of what you're trying to do (if possible, of course).

This additional method would help in some use cases and not in others, so I just want to make sure it'll actually make a difference in your case.

viktor-podzigun commented 1 year ago

That would be great, thank you @ashtuchkin for your quick reply and support!

I would need it for the file viewer that I'm developing as part of the FAR.js app. To show file content in wrap mode, I need to translate strings that were read from the file back to their lengths in bytes to be able to do proper scrolling. The encoding could be any supported one. I think it's a pretty common use case.
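
A minimal sketch of that kind of bookkeeping, assuming the viewer keeps a byte offset for the top of the visible text and needs one offset per wrapped row. The actual FAR.js implementation is not described in this thread, so `lineByteOffsets` and `byteLengthOf` are hypothetical; `Buffer.byteLength` stands in for whatever measures a string in the file's encoding.

```javascript
// Sketch: map wrapped rows back to byte offsets in the file.
// byteLengthOf is a stand-in for an encoding-aware length function.
function lineByteOffsets(lines, startOffset, byteLengthOf) {
  const offsets = [];
  let position = startOffset;
  for (const line of lines) {
    offsets.push(position); // byte offset where this row starts
    position += byteLengthOf(line);
  }
  return { offsets, end: position }; // end = offset just past the last row
}

const wrapped = ["first row ", "sécond row ", "third"];
const result = lineByteOffsets(wrapped, 100, (s) => Buffer.byteLength(s, "utf8"));
console.log(result.offsets, result.end);
```

Every scroll step calls `byteLengthOf` once per row, which is why the cost of measuring a string matters for this use case.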

Until now I've been using Buffer encodings, but I plan to switch to your nice iconv-lite lib to support more encodings.

But it's not urgent; I can use iconv.encode(str, encoding).length for now (even though scrolling with the mouse may be a bit slow).

Thanks again!

ashtuchkin commented 1 year ago

That makes sense, thank you for the context!