chaklim / hkscs_unicode_converter

Convert Unicode characters to HKSCS-2016
MIT License

Correct conversions #12

Open nathanhammond opened 8 months ago

nathanhammond commented 8 months ago

This PR includes patches for three issues that surfaced when converting JPTableFull (which uses HKSCS encoding) to current Unicode code points.

The issues are resolved commit-by-commit:

  1. Reverts the precomputation of the data files. Precomputation should run as a build action; its output should not be checked into the repo. Further, the build should pre-resolve every possible input to its output in a single flat map: {"5041": "5041"}. Data beyond those key-value pairs is not needed.
  2. HKSCS-9447 has a typo in the source mapping that produces a duplicate key. Correcting that mapping corrects the output.
  3. Big5 contains two duplicate characters, and both copies get mapped to Unicode code points. This change converts to the preferred Unicode character rather than the compatibility character.
  4. The numerals are now correctly mapped to the ideographs, not the Hangzhou numerals.
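
As a rough illustration of the list above, here is a minimal Python sketch of a build step that collapses mapping rows into one flat map, rejects conflicting duplicate keys, and swaps compatibility or numeral targets for preferred ideographs. It is not the code in this PR: `build_flat_map`, `PREFERRED`, and the sample rows are hypothetical names and data, and the code point pairs in `PREFERRED` are assumptions taken from Unicode decomposition data rather than from the repo's source tables.

```python
import json

# Hypothetical override table: code point to avoid -> preferred ideograph.
# These pairs are illustrative assumptions, not copied from the PR.
PREFERRED = {
    "FA0C": "5140",  # CJK compatibility ideograph -> unified ideograph
    "FA0D": "55C0",  # CJK compatibility ideograph -> unified ideograph
    "3038": "5341",  # Hangzhou numeral ten -> ideograph for ten
}


def build_flat_map(rows):
    """Collapse (source, target) hex pairs into a single source -> target dict."""
    flat = {}
    for source, target in rows:
        source, target = source.upper(), target.upper()
        # Resolve to the preferred character before storing the pair.
        target = PREFERRED.get(target, target)
        previous = flat.get(source)
        if previous is not None and previous != target:
            # Two different outputs for one input usually means a typo upstream.
            raise ValueError(f"duplicate key {source}: {previous} vs {target}")
        flat[source] = target
    return flat


# Hypothetical sample rows; a real build would read the source mapping files.
rows = [("5041", "5041"), ("fa0c", "FA0C"), ("3038", "3038")]
flat = build_flat_map(rows)
print(json.dumps(flat, sort_keys=True))
```

An explicit override table is used here instead of blanket Unicode normalization, since a normalization form such as NFKC would also rewrite characters that the converter should leave alone.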

Fixes #4.