dodo / node-unicodetable

unicode lookup table
MIT License

Return the raw symbol in the output #3

Closed: mathiasbynens closed this issue 12 years ago

mathiasbynens commented 12 years ago

It would be useful to have the raw symbol in the output. For example, here’s the current output for U+1D306:

> require('unicode/category/So')[0x1d306]
{ value: '1D306',
  name: 'TETRAGRAM FOR CENTRE',
  category: 'So',
  class: '0',
  bidirectional_category: 'ON',
  mapping: '',
  decimal_digit_value: '',
  digit_value: '',
  numeric_value: '',
  mirrored: 'N',
  unicode_name: '',
  comment: '',
  uppercase_mapping: '',
  lowercase_mapping: '',
  titlecase_mapping: '' }

May I suggest adding a symbol property that contains a string with the raw symbol?

Since String.fromCharCode() takes UTF-16 code units, it only works directly for BMP code points; supplementary code points like U+1D306 have to be split into a surrogate pair first. You could use Punycode.js (which is bundled with Node.js) for that:

var punycode = require('punycode');
// `String.fromCharCode` replacement that doesn’t make you enter the surrogate halves separately
punycode.ucs2.encode([0x1d306]); // '\uD834\uDF06'
punycode.ucs2.encode([119558]); // '\uD834\uDF06' (same code point, written in decimal)
// Of course, it works for BMP code points too:
punycode.ucs2.encode([0x61]); // 'a'

Or:

var ucs2encode = require('punycode').ucs2.encode;
// …
ucs2encode([0x1d306]); // '\uD834\uDF06'
ucs2encode([119558]); // '\uD834\uDF06'
ucs2encode([0x61]); // 'a'
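
Under the hood, encoding a supplementary code point means splitting it into a UTF-16 surrogate pair. A minimal sketch of that arithmetic using only String.fromCharCode; the function name ucs2encodeManual is hypothetical and not part of Punycode.js:

```javascript
// Hypothetical hand-rolled equivalent of ucs2.encode:
// turns an array of code points into a JavaScript (UTF-16) string.
function ucs2encodeManual(codePoints) {
  var output = '';
  for (var i = 0; i < codePoints.length; i++) {
    var value = codePoints[i];
    if (value > 0xFFFF) {
      // Supplementary code point: subtract 0x10000 and split the
      // remaining 20 bits into a high and a low surrogate.
      value -= 0x10000;
      output += String.fromCharCode(0xD800 + (value >>> 10));
      value = 0xDC00 + (value & 0x3FF);
    }
    output += String.fromCharCode(value);
  }
  return output;
}

console.log(ucs2encodeManual([0x1d306]) === '\uD834\uDF06'); // true
console.log(ucs2encodeManual([0x61])); // prints: a
```

For U+1D306, 0x1D306 − 0x10000 = 0xD306, whose top 10 bits (0x34) give the high surrogate 0xD834 and whose low 10 bits (0x306) give the low surrogate 0xDF06.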

Edit: GitHub won’t let me post the U+1D306 symbol (or any other supplementary symbol, for that matter).
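
To illustrate the suggested symbol property: the symbol can be derived from the hexadecimal value field that each entry already has. A minimal sketch, assuming a hypothetical helper named withSymbol and an engine that provides String.fromCodePoint and Object.assign (both ES2015 additions, so not available in the Node.js versions discussed here; there, the ucs2encode function above would take the place of String.fromCodePoint):

```javascript
// Hypothetical helper: returns a copy of a table entry with a `symbol`
// property computed from its hexadecimal `value` field.
function withSymbol(entry) {
  var codePoint = parseInt(entry.value, 16);
  // String.fromCodePoint emits a surrogate pair for supplementary code points.
  return Object.assign({}, entry, { symbol: String.fromCodePoint(codePoint) });
}

// Trimmed-down entry shaped like the output shown above.
var entry = withSymbol({ value: '1D306', name: 'TETRAGRAM FOR CENTRE', category: 'So' });
console.log(entry.symbol === '\uD834\uDF06'); // true
console.log(entry.symbol.length); // 2 (two UTF-16 code units)
```

Copying instead of mutating keeps the original entry untouched, which matters if the same object is shared between several lookup tables.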

dodo commented 12 years ago

I would be happy to include this, but it looks like this is only available since Node v0.8.0 :(

Could you maybe include it in the npm package so older versions get it as well?

Edit: by this I mean the ucs2 support in punycode :P

mathiasbynens commented 12 years ago

Punycode.js is bundled with Node.js v0.6.2+. If you want to support older Node.js versions, use npm to install the punycode module first.

> could you maybe include it into the npm package so older versions get it as well?

Not sure what you mean… The ucs2.encode method is available in the npm package. At least, it’s supposed to be! Perhaps I misunderstood?

Note: Older Punycode.js versions (like the ones in Node.js v0.6.x) had utf16.encode instead of ucs2.encode (with the exact same functionality). Easy fix:

var punycode = require('punycode'),
    ucs2encode = punycode[ punycode.ucs2 ? 'ucs2' : 'utf16' ].encode;

// or…

var punycode = require('punycode'),
    ucs2encode = (punycode.ucs2 || punycode.utf16).encode;
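
The same feature check can be wrapped so it also copes with Node.js versions older than v0.6.2, where punycode is neither bundled nor installed from npm. A sketch, assuming a hypothetical loadUcs2Encode function with a local fallback encoder:

```javascript
// Hypothetical loader: prefer punycode's encoder (ucs2 in newer releases,
// utf16 in older ones) and fall back to a local encoder otherwise.
function loadUcs2Encode() {
  try {
    var punycode = require('punycode');
    var ns = punycode.ucs2 || punycode.utf16;
    if (ns && typeof ns.encode === 'function') {
      return ns.encode;
    }
  } catch (e) {
    // The module could not be loaded; use the fallback below.
  }
  // Fallback: split supplementary code points into surrogate pairs.
  return function (codePoints) {
    return codePoints.map(function (value) {
      if (value > 0xFFFF) {
        value -= 0x10000;
        return String.fromCharCode(0xD800 + (value >>> 10), 0xDC00 + (value & 0x3FF));
      }
      return String.fromCharCode(value);
    }).join('');
  };
}

var ucs2encode = loadUcs2Encode();
console.log(ucs2encode([0x61])); // prints: a
```

Callers get the same function signature either way, so the rest of the install script does not need to know which branch was taken.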

dodo commented 12 years ago

The note was what I was asking for.

Sorry for the confusion.

dodo commented 12 years ago

I'm getting some range errors:

RangeError: UCS-2(encode): illegal value 55296
RangeError: UCS-2(encode): illegal value 56191
RangeError: UCS-2(encode): illegal value 56192
RangeError: UCS-2(encode): illegal value 56319
RangeError: UCS-2(encode): illegal value 56320
RangeError: UCS-2(encode): illegal value 57343

At the moment, I'm only printing them out on install.

mathiasbynens commented 12 years ago

Ah, I forgot I made Punycode.js throw an error on surrogates that appear on their own (lone surrogates). As this check is not really necessary for Punycode.js to work, I can get rid of it if you want.
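
The six values in those errors are exactly the boundaries of the surrogate ranges that UnicodeData.txt lists as First/Last pairs: 55296 (U+D800), 56191 (U+DB7F), 56192 (U+DB80), 56319 (U+DBFF), 56320 (U+DC00) and 57343 (U+DFFF). Surrogate code points never represent a character on their own, so one option is to skip them before encoding instead of catching the error. A sketch, assuming a hypothetical isSurrogate check and a symbolFor helper that returns an empty string for surrogates (String.fromCodePoint is ES2015):

```javascript
// Hypothetical guard: U+D800..U+DFFF are reserved for UTF-16 surrogates
// and never represent a character by themselves.
function isSurrogate(codePoint) {
  return codePoint >= 0xD800 && codePoint <= 0xDFFF;
}

// Hypothetical helper: raw symbol for a code point, or '' for surrogates.
function symbolFor(codePoint) {
  if (isSurrogate(codePoint)) {
    return '';
  }
  return String.fromCodePoint(codePoint);
}

// The code points from the RangeError messages above are all skipped.
var reported = [55296, 56191, 56192, 56319, 56320, 57343];
console.log(reported.every(isSurrogate)); // true
console.log(symbolFor(0x1d306) === '\uD834\uDF06'); // true
```

Returning an empty string is just one policy; leaving the symbol property off for those entries would work equally well.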

dodo commented 12 years ago

As long as you're OK with /me catching an error, it's fine, I guess :)

mathiasbynens commented 12 years ago

FWIW, Punycode v1.1.0 (just published on npm) gets rid of the additional checks for orphaned surrogates, meaning you won’t need the try-catch anymore.

dodo commented 12 years ago

I ♥ Open Source