danielpclark / language_cards

Command line flash card game for learning languages. MVC, I18n, and YAML based. Japanese & Chinese flash cards available.
MIT License
23 stars 3 forks

Foreign characters may cause misalignment in UI #10

Open danielpclark opened 7 years ago

danielpclark commented 7 years ago

When other locales are added, it's important to note that not all characters have the same display width. Japanese characters are double width in most (maybe all) cases, which will throw the right-aligned text off by that extra amount. Figure out a way to calculate the actual display width of a string for alignment.

Create tests for the new alignment logic: take a string containing a word whose characters are wider than the standard width, and assert that the alignment is based on the display width the method reports for that word. For a given display width, the output should align and balance left, center, and right text without exceeding that width because of unusually sized characters.
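A minimal sketch of what that could look like, assuming a hand-written table of wide code point ranges. The names WIDE_RANGES, char_width, display_width, and pad are hypothetical and not part of language_cards, and the ranges only approximate the Unicode East Asian Width data, so treat this as an illustration rather than a finished implementation.

```ruby
# Hypothetical helpers, not taken from the language_cards code base.
# These ranges are a rough approximation of "wide" characters only.
WIDE_RANGES = [
  0x1100..0x115f, # Hangul Jamo initial consonants
  0x2e80..0x303e, # CJK radicals and punctuation (includes 、)
  0x3041..0x33ff, # hiragana, katakana, CJK compatibility
  0x3400..0x4dbf, # CJK extension A
  0x4e00..0x9fff, # CJK unified ideographs (kanji / hanzi)
  0xac00..0xd7a3, # Hangul syllables
  0xf900..0xfaff, # CJK compatibility ideographs
  0xff01..0xff60, # fullwidth forms
  0xffe0..0xffe6  # fullwidth signs
].freeze

# Columns a single character is assumed to occupy in a terminal.
def char_width(char)
  WIDE_RANGES.any? { |range| range.cover?(char.ord) } ? 2 : 1
end

# Total columns for a string (ignores combining marks and control chars).
def display_width(str)
  str.each_char.sum { |c| char_width(c) }
end

# Pad with spaces based on display width instead of character count.
# Strings already wider than the target are returned unchanged.
def pad(str, width, align = :left)
  gap = [width - display_width(str), 0].max
  case align
  when :right  then (" " * gap) + str
  when :center then (" " * (gap / 2)) + str + (" " * (gap - gap / 2))
  else              str + (" " * gap)
  end
end

puts "[#{pad('日本語', 10, :right)}]"
puts "[#{'日本語'.rjust(10)}]" # built-in rjust counts characters, so this overshoots
```

A test along the lines described above could then assert that display_width(pad(word, width, alignment)) equals the requested width for a word made of double-width characters.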

danielpclark commented 7 years ago

The unicode-display_width gem may be a possible solution, but its "Marshal" data may be incompatible across Ruby versions.

This Rust crate unicode_width would work well with a Ruby gem wrapped around it.

This S.O. answer has Japanese-related ordinal (code point) info: http://stackoverflow.com/a/15651264/1500195

Excerpt of that:

# -*- coding: utf-8 -*-

def is_halfwidth_katakana(c)
  return (c.ord >= 0xff61 and c.ord <= 0xff9f)
end

def is_fullwidth_katakana(c)
  return (c.ord >= 0x30a0 and c.ord <= 0x30ff)
end

def is_halfwidth_roman(c)
  return (c.ord >= 0x21 and c.ord <= 0x7e)
end

def is_fullwidth_roman(c)
  return (c.ord >= 0xff01 and c.ord <= 0xff60)
end

def is_hiragana(c)
  return (c.ord >= 0x3041 and c.ord <= 0x309f)
end

def is_kanji(c)
  return (c.ord >= 0x4e00 and c.ord <= 0x9fcc)
end

text = "Hello World、こんにちは、半角カタカナ、全角カタカナ、fullwidth 0-9\n"

text.split("").each do |c|
  if is_halfwidth_katakana(c)
    type = "halfwidth katakana"
  elsif is_fullwidth_katakana(c)
    type = "fullwidth katakana"
  elsif is_halfwidth_roman(c)
    type = "halfwidth roman"
  elsif is_fullwidth_roman(c)
    type = "fullwidth roman"
  elsif is_hiragana(c)
    type = "hiragana"
  elsif is_kanji(c)
    type = "kanji"
  end

  printf("%c (%x) %s\n",c,c.ord,type)
end

From a quick check, standard characters of width 1 are often 1 byte in UTF-8, while characters of width 2 are often 3 bytes.
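A quick way to repeat that check, as a sketch: the column counts below are entered by hand (they are an assumption about typical terminal rendering, not something Ruby reports), and SAMPLES is a hypothetical name. It suggests that byte size is only a rough hint, since halfwidth katakana take 3 bytes in UTF-8 but are normally drawn one column wide.

```ruby
# Character => columns it is typically rendered in (hand-entered assumption).
SAMPLES = {
  "a"      => 1, # ASCII letter
  "\u00e9" => 1, # e with acute accent (precomposed)
  "\uff71" => 1, # halfwidth katakana A
  "あ"     => 2, # hiragana
  "日"     => 2  # kanji
}.freeze

SAMPLES.each do |char, columns|
  # String#bytesize reports the encoded length, which is UTF-8 here.
  puts format("%s  U+%04X  bytes=%d  columns=%d",
              char, char.ord, char.bytesize, columns)
end
```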