dper / kanjiforanki

Takes a list of kanji and generates Anki flash cards for each of them.
MIT License

Filter input better #5

Closed dper closed 10 years ago

dper commented 10 years ago

The input should be filtered more. Only kanji are desired. All other characters should be removed.

See http://www.ruby-doc.org/core-1.9.3/Regexp.html for details on regular expressions in Ruby.

dper commented 10 years ago

If we approach this by exclusion (i.e., remove everything matching the following patterns and assume the remainder is probably OK), then the things to remove are: /[[:ascii:]]/, /[[:blank:]]/, /[[:cntrl:]]/, /[[:punct:]]/.

This filter is now implemented.
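
For illustration, the exclusion approach could look roughly like the sketch below. This is not the project's actual code; the constant `UNWANTED` and the helper `strip_unwanted` are hypothetical names chosen for the example.

```ruby
# Character classes listed in the comment above, combined into one bracket
# expression. [[:ascii:]] already covers ASCII blanks, control characters,
# and punctuation; the other classes can additionally match non-ASCII
# characters in Unicode strings.
UNWANTED = /[[:ascii:][:blank:][:cntrl:][:punct:]]/

# Hypothetical helper: returns the input with every unwanted character removed.
def strip_unwanted(text)
  text.gsub(UNWANTED, '')
end
```

Calling `strip_unwanted("abc 日本, 語!\n")` should leave only the three kanji. Anything that is neither ASCII nor matched by the other classes (kana, for example) still passes through.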

dper commented 10 years ago

Evan asks whether I can look up the characters by "code plane".
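
One possible reading of this suggestion is an inclusive filter: instead of listing what to remove, keep only characters that Unicode classifies as Han. The sketch below assumes that interpretation and uses Ruby's `\p{Han}` script property; `han_only` is a hypothetical helper name, not something from the project.

```ruby
# Hypothetical helper: keeps only characters whose Unicode script is Han,
# dropping kana, Latin letters, digits, whitespace, and punctuation.
def han_only(text)
  text.scan(/\p{Han}/).join
end
```

For example, `han_only("日本語のテスト。abc")` should return `"日本語"`. Matching on the script property avoids hard-coding code point ranges, though it is a broader test than membership in kanjidic2.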

dper commented 10 years ago

Right now, characters that pass through the filter described above are not a problem, because any character that isn't in kanjidic2 is ignored. That means the only actually problematic entries are things like Japanese punctuation symbols.
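
The "ignored if not in kanjidic2" behaviour can be pictured as a simple membership check. The sketch below is only an illustration: `KNOWN_KANJI` is a tiny hypothetical stand-in for the set of characters loaded from kanjidic2, and `keep_known` is an invented helper name.

```ruby
require 'set'

# Hypothetical stand-in for the characters defined in kanjidic2.
# The real dictionary contains many thousands of entries.
KNOWN_KANJI = Set.new(%w[日 本 語])

# Hypothetical helper: keeps only the characters found in the known set,
# preserving their original order.
def keep_known(characters, known = KNOWN_KANJI)
  characters.select { |c| known.include?(c) }
end
```

With this stand-in set, `keep_known(%w[日 あ 本 猫])` would return `["日", "本"]`, since the other two characters are not in the set.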

dper commented 10 years ago

Given the current conditions, the number of problematic characters is very low. Even if a few problematic characters slip through the cracks and get into a deck, they can be removed manually.

If this ends up being more serious than I think it is, we can reopen the issue.