Closed by dper 10 years ago
If we're thinking of this in an exclusive sense (i.e., remove all of the following and the remainder should probably be OK), then the things to remove are: /[[:ascii:]]/, /[[:blank:]]/, /[[:cntrl:]]/, and /[[:punct:]]/.
At the moment, this is implemented.
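For reference, an exclusive filter along those lines might look roughly like the sketch below. This is not the code in the repository; `EXCLUDED` and `strip_excluded` are names made up for illustration, and the sketch assumes the input is a UTF-8 string.

```ruby
# encoding: utf-8

# Sketch only: EXCLUDED and strip_excluded are hypothetical names.
# Union of the four character classes listed above.
EXCLUDED = Regexp.union(
  /[[:ascii:]]/,
  /[[:blank:]]/,
  /[[:cntrl:]]/,
  /[[:punct:]]/
)

# Return a copy of text with every excluded character removed.
def strip_excluded(text)
  text.gsub(EXCLUDED, '')
end

puts strip_excluded("abc 漢字, 123")
```

Note that kana are not covered by any of the four classes, so they pass through this step unchanged.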
Evan asks whether I can look up the characters by "code plane".
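A rough sketch of what that could mean in Ruby, with hypothetical helper names `plane` and `han_only`: a character's Unicode plane is its code point divided by 0x10000. A plane on its own is coarse (plane 0 holds kana and punctuation as well as most kanji), so the sketch also shows the `\p{Han}` script property as one possible inclusive alternative. Neither helper is taken from the project.

```ruby
# encoding: utf-8

# Sketch only: plane and han_only are hypothetical helpers.
# A Unicode plane is the code point divided by 0x10000.
def plane(char)
  char.ord >> 16
end

# Keep only characters whose Unicode script is Han.
def han_only(text)
  text.scan(/\p{Han}/).join
end

puts plane("漢")          # U+6F22 is in plane 0
puts plane("\u{20B9F}")   # U+20B9F is in plane 2
puts han_only("漢字とかな。abc")
```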
Right now, characters that get through the filter described above aren't a problem. Any characters that aren't in kanjidic2 are ignored. That means the only actually problematic entries are things like Japanese punctuation symbols.
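The "ignore anything not in kanjidic2" step can be pictured as a simple membership check. In the sketch below, `KNOWN_KANJI` is a hypothetical stand-in for whatever set of characters is actually loaded from kanjidic2, and `known_kanji` is an illustrative name, not a function from the project.

```ruby
# encoding: utf-8
require 'set'

# Hypothetical stand-in for the set of characters loaded from kanjidic2.
KNOWN_KANJI = Set.new(%w[漢 字])

# Keep only characters that have an entry; everything else is ignored.
def known_kanji(text)
  text.each_char.select { |c| KNOWN_KANJI.include?(c) }
end

p known_kanji("漢字とかな")
```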
Given the current conditions, the number of problematic characters is very low. Even if a few problematic characters slip through the cracks and get into a deck, they can be removed manually.
If this ends up being more serious than I think it is, we can reopen the issue.
The input should be filtered more. Only kanji are desired. All other characters should be removed.
See http://www.ruby-doc.org/core-1.9.3/Regexp.html for details on regular expressions in Ruby.