andrewplummer / Sugar

A Javascript library for working with native objects.
https://sugarjs.com/
MIT License

Hyphens and hankaku() #418

Closed DarrenCook closed 7 years ago

DarrenCook commented 10 years ago

I wanted automatic conversion of Japanese postcodes from zenkaku (fullwidth) to hankaku (halfwidth). This kind of works: value = value.hankaku('all');

It converts １２３４５６７ to 1234567. But with 123ー4567 it does not handle the hyphen the way I expect: it turns "ー" (U+30FC) into "ｰ" (U+FF70) rather than "-". (Yes, I know that is hard to see, but I expected an ASCII hyphen and instead got a hankaku katakana hyphen.)
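Because these glyphs are nearly indistinguishable on screen, a quick way to see what a string really contains is to print its code points. This is a minimal sketch in plain JavaScript; the `codePoints` helper is hypothetical and not part of Sugar, and the sample string reconstructs the output described in the report.

```javascript
// Hypothetical helper (not part of Sugar): list each character's code point
// so that look-alike glyphs can be told apart.
function codePoints(str) {
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// The result described in the report: ASCII digits around a
// HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK (written as an escape).
console.log(codePoints('123\uFF704567').join(' '));
// U+0031 U+0032 U+0033 U+FF70 U+0034 U+0035 U+0036 U+0037

// The ASCII hyphen that was expected instead.
console.log(codePoints('-').join(' ')); // U+002D
```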

I'm not sure that is something that can be fixed; perhaps we just need to clarify the documentation (which I can volunteer to do). But before touching anything I wanted to get other people's thoughts.

DarrenCook commented 10 years ago

For context, here is the jQuery-Validation method I've added for validating Japanese postcodes. It allows either xxx-xxxx or xxxxxxx format, and any mix of ascii and zenkaku:

$.validator.addMethod('jppostcode', function (value, element) {
  value = value.hankaku('all');          // Part of Sugar.js: zenkaku -> hankaku
  value = value.replace(/\uFF70/g, '-'); // Convert hankaku katakana hyphen (U+FF70) to ASCII hyphen
  $(element).val(value);                 // Update the field on the form
  return /^\d{3}-?\d{4}$/.test(value);
}, "Please enter a valid Japanese 7-digit postcode");

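For comparison, the same check can be written without relying on Sugar's `hankaku()`. This is a sketch under the assumption that every hyphen look-alike mentioned in this thread (ASCII hyphen, MINUS SIGN, FULLWIDTH HYPHEN-MINUS, and both prolonged sound marks) should be accepted as the separator; `normalizeJpPostcode` and `isJpPostcode` are hypothetical helpers, not part of Sugar or jQuery Validation.

```javascript
// Hyphen look-alikes a user might type between the two postcode groups:
// ASCII hyphen, MINUS SIGN, prolonged sound mark, fullwidth hyphen-minus,
// halfwidth prolonged sound mark.
const HYPHEN_LIKE = /[\u002D\u2212\u30FC\uFF0D\uFF70]/g;

// Hypothetical helper: fullwidth digits (U+FF10..U+FF19) -> ASCII digits,
// then every hyphen look-alike -> ASCII hyphen.
function normalizeJpPostcode(value) {
  return value
    .replace(/[\uFF10-\uFF19]/g, (ch) =>
      String.fromCharCode(ch.charCodeAt(0) - 0xFEE0))
    .replace(HYPHEN_LIKE, '-');
}

// Same pattern as the validator above: xxx-xxxx or xxxxxxx.
function isJpPostcode(value) {
  return /^\d{3}-?\d{4}$/.test(normalizeJpPostcode(value));
}

console.log(isJpPostcode('\uFF11\uFF12\uFF13\u30FC\uFF14\uFF15\uFF16\uFF17')); // true
console.log(isJpPostcode('123\u22124567')); // true (MINUS SIGN as separator)
console.log(isJpPostcode('12-34567'));      // false
```

This only normalizes and validates; writing the cleaned value back to the field would still be done with $(element).val(...) as in the validator above.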
andrewplummer commented 10 years ago

This one is rough...

On the one hand, I know totally what you mean, and what you're trying to do, but on the other hand let's have a look at the definitions for these 2 glyphs:

ー (U+30FC)   KATAKANA-HIRAGANA PROLONGED SOUND MARK
ｰ (U+FF70)   HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK

I think it would be odd to always convert a "prolonged sound mark" to anything other than a "halfwidth prolonged sound mark", as that is clearly meant to be its equivalent. However, in the context of numbers, half-width numbers are identical to ASCII numbers, so I think using an ASCII hyphen there is reasonable: in that context the character is clearly not intended to represent the "prolonged sound mark" in any of its forms.
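Unicode's own compatibility mappings support this reading. The sketch below uses the standard String.prototype.normalize('NFKC'), which is not something Sugar is shown to use in this thread: NFKC folds the halfwidth prolonged sound mark back to the fullwidth one rather than to an ASCII hyphen, while fullwidth digits and FULLWIDTH HYPHEN-MINUS fold to ASCII. MINUS SIGN has no compatibility mapping at all.

```javascript
// Hex code points of a string, used to inspect NFKC results.
const hex = (s) =>
  Array.from(s, (c) => c.codePointAt(0).toString(16).toUpperCase());

console.log(hex('\uFF70'.normalize('NFKC'))); // [ '30FC' ] halfwidth mark -> fullwidth mark, not '-'
console.log(hex('\uFF11'.normalize('NFKC'))); // [ '31' ]   FULLWIDTH DIGIT ONE -> '1'
console.log(hex('\uFF0D'.normalize('NFKC'))); // [ '2D' ]   FULLWIDTH HYPHEN-MINUS -> '-'
console.log(hex('\u2212'.normalize('NFKC'))); // [ '2212' ] MINUS SIGN is left unchanged
console.log(hex('\u30FC'.normalize('NFKC'))); // [ '30FC' ] prolonged sound mark is left unchanged
```

So a purely "standard" normalization would neither produce an ASCII hyphen from the prolonged sound mark nor touch MINUS SIGN; both need an explicit, context-specific rule.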

Fortunately we have that context in the "mode" you pass in here, so what if we did the zenkaku-hyphen-to-ASCII-hyphen conversion only in mode "n" and not in the others?

andrewplummer commented 10 years ago

Ok more information to make things more complicated... I had a quick bash at the different input methods on OSX and Windows XP (going to assume Win 7 etc is the same for now)

When inputting numerals:
XP:   － (U+FF0D)  FULLWIDTH HYPHEN-MINUS
OSX:  − (U+2212)  MINUS SIGN

When inputting hiragana:
XP:   ー (U+30FC)  KATAKANA-HIRAGANA PROLONGED SOUND MARK
OSX:  ー (U+30FC)  KATAKANA-HIRAGANA PROLONGED SOUND MARK

I added tests that confirm FULLWIDTH HYPHEN-MINUS is already being converted across the board. This leaves the question of whether or not PROLONGED SOUND MARK and/or MINUS SIGN should be as well.
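FULLWIDTH HYPHEN-MINUS converts "for free" because the Halfwidth and Fullwidth Forms block places U+FF01 to U+FF5E at a fixed offset of 0xFEE0 from ASCII U+0021 to U+007E. A minimal sketch of that arithmetic follows; `fullwidthToAscii` is hypothetical and Sugar's own implementation may differ. It also shows why the other two characters need explicit rules: neither U+30FC nor U+2212 lies in that range.

```javascript
// Fullwidth ASCII variants U+FF01..U+FF5E map to U+0021..U+007E by a fixed offset.
const FULLWIDTH_OFFSET = 0xFEE0;

function fullwidthToAscii(str) {
  return str.replace(/[\uFF01-\uFF5E]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - FULLWIDTH_OFFSET));
}

// U+FF0D (FULLWIDTH HYPHEN-MINUS) - 0xFEE0 = U+002D (ASCII hyphen)
console.log(fullwidthToAscii('\uFF11\uFF12\uFF13\uFF0D\uFF14\uFF15\uFF16\uFF17')); // 123-4567

// PROLONGED SOUND MARK (U+30FC) and MINUS SIGN (U+2212) are outside the range.
console.log(fullwidthToAscii('123\u30FC4567') === '123\u30FC4567'); // true
console.log(fullwidthToAscii('1\u22122') === '1\u22122');           // true
```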

The above shouldn't necessarily imply that input methods should dictate (or have any relation to) what gets converted, but it is interesting...

andrewplummer commented 10 years ago

Ok, I'm going ahead with these changes. MINUS SIGN will now be converted to a hankaku hyphen in all modes, and KATAKANA-HIRAGANA PROLONGED SOUND MARK will be converted to a hyphen in number mode only (or, alternatively, when number mode is first in the mode list).
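The rule above can be sketched as a small, one-way replacement table layered on top of the ordinary width conversion. This is an illustration only: `hyphenMappings` and `applyHyphenRules` are hypothetical, the mode strings other than "n" are assumptions, and Sugar's actual code may be structured differently.

```javascript
// Hypothetical sketch of the rule; not Sugar's actual implementation.
const MINUS_SIGN = '\u2212';           // MINUS SIGN
const PROLONGED_SOUND_MARK = '\u30FC'; // KATAKANA-HIRAGANA PROLONGED SOUND MARK

// One-way replacements that apply for a given mode string (e.g. 'n', 'np', 'k', 'all').
function hyphenMappings(mode) {
  const map = new Map([[MINUS_SIGN, '-']]); // MINUS SIGN: converted in every mode
  if (mode.charAt(0) === 'n') {             // number mode, or number mode listed first
    map.set(PROLONGED_SOUND_MARK, '-');
  }
  return map;
}

// Applies only these extra hyphen rules; the ordinary zenkaku -> hankaku width
// conversion (e.g. U+30FC -> U+FF70 in katakana mode) is assumed to run elsewhere.
function applyHyphenRules(str, mode) {
  const map = hyphenMappings(mode);
  return Array.from(str, (ch) => (map.has(ch) ? map.get(ch) : ch)).join('');
}

console.log(applyHyphenRules('123\u30FC4567', 'n')); // 123-4567
console.log(applyHyphenRules('1\u22122', 'k'));      // 1-2
```

Because the table only maps from MINUS SIGN and the prolonged sound mark to "-", the conversion is one-way: nothing is ever turned into either character. Under this sketch, a mode string such as "all" does not start with "n", so the prolonged sound mark is left for the normal katakana conversion.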

This should be able to achieve what you want...

DarrenCook commented 10 years ago

Hello Andrew, thanks! I've not been commenting as I'm still not sure what is correct behaviour. Your change makes it nice for postcodes. But, are there any situations where it changes the behaviour someone would expect, for the worse? (I tried, but couldn't think of any after a few minutes.)

andrewplummer commented 10 years ago

I'm trying to work this out myself. MINUS SIGN is a somewhat obscure mathematical symbol, so I suppose it's unlikely that converting it will cause any issues unless it's actually in a mathematical formula, and in that case why would you be converting to hankaku anyway? The mapping only works one way, so characters will never be turned into a MINUS SIGN.

The PROLONGED SOUND MARK is trickier, as it also does double duty as a fullwidth hyphen in some cases (mostly postal codes, which is the issue here). This is why I've limited it to number mode and made it a one-way mapping, so it will only ever be converted to a hyphen when the mode is explicitly "numbers".

I think this should work, but now that I've laid it out, if you can think of any possible issues please let me know.

andrewplummer commented 7 years ago

Ok this is out now, so closing!