akira-kurogane / furigana-injector

Automatically exported from code.google.com/p/furigana-injector

Fail on 方花(ほうげ). Chrome. #53

Status: Closed (GoogleCodeExporter closed this 9 years ago)

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Click on the extension icon to add furigana over 方花 (example: here 
http://ja.wikipedia.org/wiki/%E6%9D%B1%E6%96%B9%E8%8A%B1%E6%98%A0%E5%A1%9A_%E3%80%9C_Phantasmagoria_of_Flower_View. 
or here http://translate.google.com/#ja|ja|%E6%96%B9%E8%8A%B1)
2. Click on the icon once more to remove the furigana
3. Click on the icon a third time to put the furigana back over the kanji

What is the expected output? What do you see instead?
The first time, the furigana over 方花 is ほうげ, but the second time it is 
ほうはな. (In Firefox it is always ほうげ, which is at least a consistent 
answer.)
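
The three steps above amount to a round-trip check: add, remove, add again, then compare the first and last results. Below is a minimal Python sketch of that check. The names `toggle_is_consistent`, `toy_add`, `toy_remove` and `lossy_remove` are hypothetical stand-ins invented for illustration and are not part of the extension's code. The "lossy" variant only simulates a removal step that fails to restore the original text; it is not a diagnosis of this bug.

```python
from typing import Callable


def toggle_is_consistent(page: str,
                         add: Callable[[str], str],
                         remove: Callable[[str], str]) -> bool:
    """Return True if adding furigana twice (with a removal in between)
    produces the same annotated text both times."""
    first = add(page)          # step 1: add furigana
    restored = remove(first)   # step 2: remove furigana
    second = add(restored)     # step 3: add furigana again
    return first == second


# Hypothetical stand-ins: readings are written inline as "kanji(reading)".
def toy_add(text: str) -> str:
    return text.replace("方花", "方花(ほうげ)")


def toy_remove(text: str) -> str:
    return text.replace("(ほうげ)", "")


# A removal step that does NOT restore the original text exactly
# (here it leaves a stray space between the two kanji).
def lossy_remove(text: str) -> str:
    return toy_remove(text).replace("方花", "方 花")
```

With these stand-ins, `toggle_is_consistent("東方花映塚", toy_add, toy_remove)` returns `True`, while the same call with `lossy_remove` returns `False`, which is the kind of inconsistency reported here.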

Original issue reported on code.google.com by akira%ya...@gtempaccount.com on 25 Dec 2010 at 4:24

GoogleCodeExporter commented 9 years ago
Just to save time: "東方花映塚" is the name of a computer game. The best 
word-boundary detection that could conceivably be done using standard 
dictionaries would be 東方.花.映.塚.

Original comment by akira%ya...@gtempaccount.com on 25 Dec 2010 at 4:30

GoogleCodeExporter commented 9 years ago
It's no surprise that ほう is not being displayed if, on Chrome, 方 is one of 
the user's exclusion kanji (and it probably would be, being a basic and common 
kanji).

I can't reproduce it just by viewing the pages, but I could trick it into 
producing はな (on both Chrome and Firefox) by putting a div break between the 
方 kanji and the 花 kanji. This is because FuriganaInjector assumes new divs 
imply new text blocks, so the two are processed by Mecab separately. If 花 
occurs at the beginning of a sentence and is not the head of a dictionary word 
such as 花粉, then Mecab will assume it's the single-kanji noun, i.e. read with 
the kunyomi はな.
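
To illustrate why a block boundary changes the reading, here is a small Python sketch. It is a toy model only: `LEXICON` is a made-up three-entry dictionary, and the lookup is a greedy longest-match, which is far simpler than what Mecab actually does. The function names are hypothetical. It shows only the general effect described above: when each block is analysed separately, no dictionary word can span two blocks.

```python
from typing import List, Tuple

# Toy lexicon: a stand-in for a real dictionary, NOT Mecab's data.
LEXICON = {
    "花粉": "かふん",  # compound word
    "花": "はな",      # single-kanji noun, kunyomi
    "粉": "こな",      # single-kanji noun, kunyomi
}


def read_block(block: str) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup within a single text block."""
    result = []
    i = 0
    while i < len(block):
        # Try the longest candidate starting at position i first.
        for length in range(len(block) - i, 0, -1):
            word = block[i:i + length]
            if word in LEXICON:
                result.append((word, LEXICON[word]))
                i += length
                break
        else:
            # Character not in the lexicon: pass it through with no reading.
            result.append((block[i], ""))
            i += 1
    return result


def read_blocks(blocks: List[str]) -> List[Tuple[str, str]]:
    """Analyse each block on its own, so no word can span two blocks."""
    out = []
    for block in blocks:
        out.extend(read_block(block))
    return out
```

In this toy model, `read_blocks(["花粉"])` gives `[("花粉", "かふん")]`, while `read_blocks(["花", "粉"])` gives `[("花", "はな"), ("粉", "こな")]`: the same characters, split across two blocks, fall back to the single-kanji readings.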

Original comment by akira%ya...@gtempaccount.com on 25 Dec 2010 at 4:45

GoogleCodeExporter commented 9 years ago
Firstly, thank you a lot for fixing everything I have reported so far.

Regarding this particular issue, it indeed seems like it only happens if 方 is 
not on the "kanji to exclude" list (I actually emptied the list when I first 
installed the extension and forgot about it). The problem is not the wrong 
reading (I wouldn't expect a perfect reading every time for elements of 
fiction, book titles, etc.) but the fact that the reading differs between the 
first and the second time (from ほうげ to ほうはな).

I had someone else test this (with the kanji not excluded) and he could 
reproduce the bug.

Original comment by tvan...@googlemail.com on 31 Dec 2010 at 5:11