batterseapower / pinyin-toolkit

A plugin for the Anki Spaced Repetition System (http://ichi2.net/anki/)
http://batterseapower.github.com/pinyin-toolkit/

Automatic Space Trimming From Hanzi Output #96

Open Nick3C opened 15 years ago

Nick3C commented 15 years ago

To facilitate input from other Chinese language sites such as LingQ and LiveMocha, which insist on (incorrectly) putting spaces into Chinese character text (their systems don't properly support languages without spaces), we should trim spaces from the output.

e.g. (from LingQ): 我 并 不是 在 担心 我 哥哥 would be converted to 我并不是在担心我哥哥

However, because the spaces improve dictionary lookup accuracy (each space-delimited chunk is looked up as a single word only), it is probably best to do this at the final stage rather than before passing the text to the dictionary lookup. That lets users force a word by putting spaces around it where there are two possibilities, while still getting properly formatted output that can be pasted as plain text into the Expression field afterwards.
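
A minimal sketch of the trimming step in Python, assuming it runs only on the final output after dictionary lookup has already seen the spaced text. The function name `strip_spaces_between_hanzi` is hypothetical and not part of the existing plugin code, and the character range covers only the basic CJK Unified Ideographs block:

```python
import re

# Whitespace (ASCII space or full-width space U+3000) that sits directly
# between two characters in the basic CJK Unified Ideographs block.
# Assumption: this range is enough for illustration; extension blocks
# and punctuation are not handled.
_SPACE_BETWEEN_HANZI = re.compile(
    "(?<=[\u4e00-\u9fff])[ \u3000]+(?=[\u4e00-\u9fff])"
)


def strip_spaces_between_hanzi(text):
    """Remove spaces between hanzi, leaving all other spacing untouched."""
    return _SPACE_BETWEEN_HANZI.sub("", text)


# Hypothetical usage: the spaced text is what the dictionary lookup would
# see, and the stripped text is what would be written back to the field.
spaced = "我 并 不是 在 担心 我 哥哥"
print(strip_spaces_between_hanzi(spaced))
```

Only spaces with a hanzi on both sides are removed, so spacing in mixed or non-Chinese text (for example "hello world") is left alone.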

Nick3C commented 15 years ago

This is very easy to implement now that we have expression over-writing for simp/trad.

batterseapower commented 15 years ago

I'm not convinced this is the right thing to do though. I'm not very happy any time we overwrite user-supplied data with our own idea of what is good. Maybe a config option to turn off space removal would be sufficient to placate me though :-)

Nick3C commented 15 years ago

This is one of those things where we save the user from themselves. It is just a bad idea to learn with spaces because when you see the phrase together you will struggle to recall it in the different format.

No objection to a toggle though. Perhaps we also want to allow a single space in the entry too (if it is being used to separate something).
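
A rough sketch of how such a toggle might look, in Python. Everything here is an assumption for illustration: the function name `format_hanzi_output` and the options `remove_spaces` and `keep_double_space` are hypothetical, and reading "allow a single space" as "a deliberate run of two or more spaces is kept as one space" is only one possible interpretation:

```python
import re

# Runs of ASCII or full-width spaces between two basic CJK ideographs.
_GAP = re.compile("(?<=[\u4e00-\u9fff])([ \u3000]+)(?=[\u4e00-\u9fff])")


def format_hanzi_output(text, remove_spaces=True, keep_double_space=False):
    """Optionally strip spaces between hanzi.

    remove_spaces=False leaves the user's text exactly as entered.
    keep_double_space=True (hypothetical option) collapses a run of two or
    more spaces to a single space instead of deleting it, so a user can
    still mark a deliberate separation.
    """
    if not remove_spaces:
        return text

    def shrink(match):
        if keep_double_space and len(match.group(1)) >= 2:
            return " "
        return ""

    return _GAP.sub(shrink, text)


print(format_hanzi_output("我 并 不是", remove_spaces=False))
print(format_hanzi_output("你好  再见 吧", keep_double_space=True))
```

With the toggle off, user-supplied data is never overwritten, which addresses the concern raised above; the default shown here (removal on) is likewise just an assumption.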