ashtuchkin / iconv-lite

Convert character encodings in pure javascript.
MIT License
3.04k stars 282 forks source link

feat: Add user callback for handling invalid characters. #283

Open yosion-p opened 2 years ago

yosion-p commented 2 years ago

hi buddy,this is work-in progress. It might not be easy to get the user to write a callback function, but I think we can at least do it first.

I have noticed that there are many issues about callback(#53) at present, It may be difficult to solve all the problems, but we can take them apart step by step, So now I have taken the first step (at least we can solve #210 for now), and the first step now seems easy(or too easy🤣).

First, we declare and tell the user what custom rules can be processed:

iconv.invalidChar;
iconv.ignore = '';
// ...other rules
// Sorry, I don't know how to describe it here, maybe TS would be better

Then, the options argument can be passed in when the user calls public API, like:

iconv.decode(Buffer, encoding,{
  invalidChar: function(){/**/},
  ignore: someRules,
  // ...other rules
})

Finally, we pick it up at decode:

iconv.decode = function decode(buf, encoding, options) {
// ...
if(options && options.invalidChar)  // ...
// ...
}

and the encoding is written similarly.

In short, we accept some special rules that user-written callback functions.

yosion-p commented 2 years ago

HI,bro @ashtuchkin Do you mind if I try to do that?

ashtuchkin commented 2 years ago

Hey, I think it's a good intention, but it's also a pretty complex topic (that's why I didn't implement it back in the days) and I don't have enough time these days to actively review or discuss design. Maybe in a couple months when the load on my day job decreases..

I don't think your current approach would work because '�' character can be in the input and we don't want to call the callback in that case.

yosion-p commented 2 years ago

Hi bro, sincerely speaking, since Github is an open source community, i'd love to contribute, no matter it's easy or hard.

I'm just back from the holiday and got the email. But I'm a little confused.

If we're just trying to solve #210 , this would work, because the user wants to return as custom expectations.

Now that I'm back, I'm thinking about this PR(or a related set of questions).

because '�' character can be in the input and we don't want to call the callback in that case.

May I just throw an error instead of calling the callback when '�' character be inputed?

ashtuchkin commented 2 years ago

i'd love to contribute, no matter it's easy or hard.

I understand that, and I think it's a great intention and a great way to learn. I appreciate that you're doing this. The problem is that it would require time commitment from my side to take this PR to a level at which I would feel comfortable merging it. This issue is more complex than your previous pull request. I don't have a good design in my head that would resolve it in a good way, so to even start code reviewing I would need to spend time thinking about it / designing it. I don't think an easy solution exists here that would not harm baseline performance and would avoid adding unexpected behavior.

May I just throw an error instead of calling the callback when '�' character be inputed?

No, that would be unexpected behavior for the library users and would potentially break existing use cases.

yosion-p commented 2 years ago

More than a month ago, I analyzed almost 40 issues and sorted them into an excel. I think we can mainly focus on solving these issues:

  1. Callback related: #210, 127, 83, 81, 73, 55, 53
  2. Coding related: #257, 231, 218, 215, 201, 145, 129, 109, but for some scattered coding, you want to build an external NPM package to solve it, right? (#253)
  3. There are issues might be related to the bottom layer of node or Angular: 230, 213, 118, I think we might not be able to solve them.
  4. Features, such as uint8array, eg #253 mentioned above.

what do u think?

For the second point, about #253, I think this issue is of great significance. (In case you are distracted by multiple messages, I will not leave messages at 253.) If I can decouple an independent NPM package, many related coding problems can be solved. And completing 253 won't affect your next release.

Initially, I'll expose two interfaces (or methods) that allow iconv calls and adds encoding,

Create a test harness that would help authors make sure their codecs work on the whole range of supported environments.

I'm not particularly good at reporting this, but I don't think testing is a particularly important work in the early stage.

Recently, I have plenty of free time, I'm keen to contribute for this great project, I learned a lot of knowledge about coding and made related notes which i think is great. I don't know when will you release the next version, I'd also love to write UTs and contribute on others besides assist solving the current issues.

keidarcy commented 2 years ago

Hope we have this callback to handle invalid characters