japanese support compatibility

kaoala7577 commented 4 years ago

hi! sorry if this is the wrong place to contact you, i couldn't find an email. so, i'm using the japanese support addon (https://ankiweb.net/shared/info/3918629684) and was wondering if you would ever think about making the two compatible? when japanese support adds furigana to an expression it shows it as kanji[kana] - however i'm not sure your pitch addon can read it, since when 歳[さい] is given as the reading input, it comes out with the pitch accent for とし and not さい. japanese support is a very helpful addon and many people use it so i'd be reluctant to use another one, but if you know of an addon that generates the furigana reading just in kana instead as a workaround, i'd be willing to switch to using that instead. this isn't urgent or anything since it usually gets the right reading anyway, but wanted to ask the question just in case. thanks!

IllDepence commented 4 years ago

Hi, this actually is the best place for feedback. :)

Your suspicion is correct. The addon does not process reading fields in the form kanji[kana] as desired (I'll quickly explain the reasons for this down below). As for making the addon compatible with this reading field format, I unfortunately wouldn't have the time to properly test any changes. But I'll keep this issue open as a reminder for myself and as an invitation to anyone else who might be interested in contributing the necessary changes.

Details

The reading field is currently used to disambiguate between different possible readings. Example: for an expression field 汚れ, the addon can't know whether the user intended the card as けがれ or よごれ. Therefore, the addon also looks at the reading field, and if it finds either of the readings, it goes with that one.
The reason this does not work with the kanji[kana] format is that, when the reading field is processed, its contents first run through a cleaning function that removes everything in brackets. I remove everything in brackets because I, personally, sometimes add extra stuff in brackets to the reading field. E.g. I'll have an expression field 足手まとい and a reading field あしてまとい　(also: 足手纏い), just to add the extra bit of info for me to see after every recognition and recall of the card, without requiring me to be able to recognize 纏 for the actual review.
What this removal means is that for e.g. 歳[さい] the cleaning function returns just 歳 which doesn't help the addon disambiguate between possible readings.
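A minimal sketch of why the bracket removal defeats disambiguation. The substitution is the one from the addon's cleaning function as quoted later in this thread; `strip_bracketed`, `pick_reading`, and the candidate lists are hypothetical names used only for illustration, not the addon's actual code.

```python
import re


def strip_bracketed(s):
    # Same substitution as the cleaning line quoted in this thread:
    # drops anything enclosed in [], () or {}.
    return re.sub(r'[\[\(\{][^\]\)\}]*[\]\)\}]', '', s).strip()


def pick_reading(reading_field, candidates):
    # Hypothetical helper: return the first candidate reading that occurs
    # in the cleaned reading field, or None if none of them does.
    cleaned = strip_bracketed(reading_field)
    for reading in candidates:
        if reading in cleaned:
            return reading
    return None


# A plain kana reading field disambiguates 汚れ as intended.
print(pick_reading('よごれ', ['けがれ', 'よごれ']))

# With kanji[kana] input the kana is stripped before the lookup,
# so no candidate can be found.
print(strip_bracketed('歳[さい]'))
print(pick_reading('歳[さい]', ['とし', 'さい']))
```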
If someone wanted to contribute code for making the addon compatible with the kanji[kana] format, I'd imagine something like this: after selecting the reading field (the user is asked "Which field contains the reading?"), the plugin could go through a sample of reading fields in the selected deck and, if all of them adhere to a <kanji or kana unicode range>[<kana unicode range>] type of pattern, the user could be asked if they're using this type of reading field and whether or not the reading field processing should be done accordingly (e.g. "It looks like your reading fields are in <expression>[<furigana>] format. Would you like the addon to use the <furigana> part to disambiguate readings?").
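The detection step could be sketched roughly as follows. This is only an illustration under assumptions: the function name, the choice of Unicode ranges (Hiragana and Katakana blocks for kana, the basic CJK Unified Ideographs block plus 々 for kanji), and the allowance for trailing kana and space-separated segments are not taken from the addon.

```python
import re

# Assumed ranges: Hiragana + Katakana blocks, basic CJK ideographs plus 々.
KANA = r'\u3040-\u309F\u30A0-\u30FF'
KANJI = r'\u4E00-\u9FFF\u3005'

# One or more segments of <kanji or kana>[<kana>], each optionally followed
# by trailing kana and optionally preceded by a single space.
FURIGANA_FIELD = re.compile(
    rf'(?: ?[{KANJI}{KANA}]+\[[{KANA}]+\][{KANA}]*)+'
)


def looks_like_furigana_format(sample_fields):
    # True only if the sample is non-empty and every field matches the pattern.
    fields = [f.strip() for f in sample_fields]
    return bool(fields) and all(FURIGANA_FIELD.fullmatch(f) for f in fields)


if looks_like_furigana_format(['歳[さい]', '食[た]べる']):
    print('Reading fields look like <expression>[<furigana>]; ask the user.')
```

Requiring every sampled field to match follows the "all of them" wording above; a single plain-kana field in the sample makes the check fail.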

kaoala7577 commented 4 years ago

unfortunately i don't know how to code in python so my attempts at doing this myself haven't worked, but would it be possible to make an alternate clean function that japanese support users could add manually? i tried duplicating the function and just changing the s = re.sub(r'[\[\(\{][^\]\)\}]*[\]\)\}]', '', s) line with something i found on google but it didn't work so i don't know if there are more checks involved somewhere that i didn't notice. this would probably break the addon for people not using this format so it would have to be a manual thing, but changing one function/adding one sounds like a lot less hassle on your part than sampling all of the reading fields to automate it. alternatively there could be a checkbox so someone could state that they're using that format without any checks, but i don't know how difficult reformatting the selection screen would be. thank you for your reply, you've been really helpful! i hoped to simplify the problem a bit by making users add the code but i might have made it a whole lot harder for you lol, what do you think?

IllDepence commented 4 years ago

If you're fine with changing the addon's code on your end, here's what should work:

Replace the line

reading_field = clean(reading_field)

with

furigana_match = re.search(r'\[([^\]]*)\]', reading_field)
if furigana_match:
    reading_field = furigana_match.group(1)

Hope this helps. :)

kaoala7577 commented 4 years ago

hm, doesn't seem to be working on my end... not sure what the issue would be, any idea? i can try removing and adding japanese support to see if there's an issue with the config somehow

kaoala7577 commented 4 years ago

oh and if it helps, this is the code i was using previously when i was messing about myself. under the def clean(s): bit:

def furigana(s):
    s = stripHTML(s)
    # Raw string avoids invalid-escape warnings; keep the input if no brackets are found.
    m = re.search(r"\[(.*?)\]", s)
    if m:
        s = m.group(1)
    return s.strip()

and then i replaced reading_field = clean(reading_field) with reading_field = furigana(reading_field). this also didn't work, but i hope it helps somewhat? thanks so much, sorry for all this trouble haha

IllDepence commented 4 years ago

Both your code and the code I suggested return b when given an input a[b]. May I ask how you're testing whether it's working or not?

If it's by looking at the output for your 歳[さい] card, I fear code changes won't do anything. The pitch accent dictionary that comes with the addon only contains 歳 as an alternative writing to 年:

年␟△歳␟年␟とし␟歳␞とし␞とし␞2␞LHL

You could test the code changes by creating two cards with 行 as the expression, one with 行[ぎょう] as the reading and another with 行[こう], and then see if you get an output as expected.
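Outside of Anki, the extraction itself can be checked with a few lines. This sketch wraps the replacement code suggested above in a hypothetical function `extract_furigana` (the name is an assumption) and runs it on the two test readings; it only shows what string reaches the disambiguation step, not the pitch accent lookup.

```python
import re


def extract_furigana(reading_field):
    # Same logic as the suggested replacement: use the bracket contents
    # if there are any, otherwise leave the field unchanged.
    furigana_match = re.search(r'\[([^\]]*)\]', reading_field)
    if furigana_match:
        return furigana_match.group(1)
    return reading_field


for field in ('行[ぎょう]', '行[こう]', 'あした'):
    print(field, '->', extract_furigana(field))
```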

kaoala7577 commented 4 years ago

sorry i didn't see the notification for this! i was trying it using 年 since i figured 歳 might be an outlier but neither seemed to work... since さい isn't a reading in the pitch accent dictionary, do you know how i would add it or is it a lost cause?

kaoala7577 commented 4 years ago

oh! i finally got around to deleting and reinstalling both japanese support and japanese pitch, changed the code, and it's suddenly working? the config on japanese support is the same and nothing about the code is different from before... that's so weird. i'm still interested in any advice on how to fix the さい problem but that's that issue fixed at least

IllDepence commented 4 years ago

Glad to hear it worked out.

If you know what the correct pitch accent pattern for 歳[さい] is, ~~you can manually change it.~~

/edit: @kaoala7577 I just updated the plugin and it now supports editing cards' annotations individually. :)

excaptor commented 3 years ago

> e.g. I'll have an expression field 足手まとい and a reading field あしてまとい　(also: 足手纏い), just to add the extra bit of info for me to see after every recognition and recall of the card, without requiring me to be able to recognize 纏 for the actual review.

Not saying you should use your own add-on differently, but a simpler and more structured way would be to keep alternative ways of writing in a separate field. This would not only allow easier parsing, which is what causes the issue with the kanji[kana] format, but also enable a more flexible representation. For example, for homophones, I use the kanji form as a hint (not shown initially on the front, but it can be shown on user request) for cards with only the reading on the front, and for homographs I use the meaning as a hint for kanji-only cards. This is easy to do, since the note has meaning and reading as separate fields, and I don't need to make edits for any new card. In your case it would be a field with alternative ways of writing.

IllDepence commented 3 years ago

Sure. If I were to start over creating my Japanese vocab deck, I'd probably consider strict field contents for the sake of processability. E.g. having fields like phrase, phrase extra info, meaning, meaning extra info, reading, pitch accent, example sentence(s), image(s). As it stands though, I have a deck that grew over the years. (:

/ninja edit:

In any case, having special handling of certain card formats of any kind be the default will probably never make everyone happy. Given that the Japanese Support add-on's format can be assumed to be somewhat common, I'd welcome contributions towards some heuristic detection and user prompt as outlined in my first comment.

excaptor commented 3 years ago

One thing worth noting: Japanese Support isn't a plugin. It is built into Anki now. I'll look into the code more thoroughly and will come up with a pull request as soon as I have time.

Here are some thoughts on how it can be approached:

1. Go over the "pronunciation" field and, wherever there are chunks of \w+\[[:kana:]+\][:kana:]* (I just made up the [:kana:] class, but you get the idea) separated by a single space, replace them with a single line of kana, with everything else removed.
2. Proceed as usual.
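A minimal sketch of step 1, assuming the Hiragana and Katakana Unicode blocks stand in for the made-up [:kana:] class. The function name `to_kana` and both regular expressions are illustrative assumptions, not code from the addon or from Anki.

```python
import re

# Matches one base[furigana] chunk, including an optional leading space;
# group 1 is the furigana between the brackets.
FURIGANA_CHUNK = re.compile(r' ?[^ \[\]]+?\[([^\]]*)\]')

# Anything outside the Hiragana and Katakana blocks (assumed [:kana:]).
NON_KANA = re.compile(r'[^\u3040-\u309F\u30A0-\u30FF]')


def to_kana(reading_field):
    # Replace every base[furigana] chunk with its furigana, then drop all
    # remaining non-kana characters (spaces, leftover kanji, markup).
    kana_only = FURIGANA_CHUNK.sub(r'\1', reading_field)
    return NON_KANA.sub('', kana_only)


print(to_kana('食[た]べる'))
print(to_kana('日本語[にほんご]を 勉強[べんきょう]する'))
```

Fields that are already plain kana pass through unchanged, so the same function could be applied to every reading field before the usual processing.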

As for initial inspection of the whole deck, I think it is a good idea, just a few things to note:

- both [:kana:]+ and [:kanji:]+\[[:kana:]+\][:kana:]* may be used in the Japanese Support format,
- the user may have something that doesn't adhere to the format but still causes no issues, and hence may still want to do the conversion.

So I'd suggest checking whether ANY card in a deck has the Japanese Support format and then offering the user an option to proceed or cancel. Optionally, the user may be advised to create a filtered deck and do the pitch accent processing there, to skip processing non-compatible cards.
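The "ANY card" check could be sketched by sorting reading fields into three groups. Everything here is an assumption for illustration: the function names, the group labels, and the Unicode ranges standing in for [:kana:] and [:kanji:].

```python
import re

# Assumed stand-ins for the made-up [:kana:] and [:kanji:] classes.
KANA = r'\u3040-\u309F\u30A0-\u30FF'
KANJI = r'\u4E00-\u9FFF\u3005'

# kanji[kana] chunks with optional trailing kana, separated by a single space.
FURIGANA = re.compile(rf'(?: ?[{KANJI}]+\[[{KANA}]+\][{KANA}]*)+')
PLAIN_KANA = re.compile(rf'[{KANA}]+')


def classify_reading_fields(fields):
    # Sort fields into furigana-format, plain-kana and everything else.
    groups = {'furigana': [], 'kana': [], 'other': []}
    for field in fields:
        text = field.strip()
        if FURIGANA.fullmatch(text):
            groups['furigana'].append(field)
        elif PLAIN_KANA.fullmatch(text):
            groups['kana'].append(field)
        else:
            groups['other'].append(field)
    return groups


def should_offer_conversion(fields):
    # Offer the conversion as soon as any field uses the furigana format.
    return bool(classify_reading_fields(fields)['furigana'])


groups = classify_reading_fields(['歳[さい]', 'あした', 'tomorrow'])
print({name: len(items) for name, items in groups.items()})
```

The size of the 'other' group could be shown in the prompt, so the user can decide whether to proceed or to move those cards out of the way first.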

IllDepence commented 3 years ago

Sounds good.

> One thing worth noting: Japanese Support isn't a plugin. It is built-in into Anki now.

Good to know. I wasn't aware of that.

IllDepence commented 1 year ago

Note to self:

While developing automated heuristic determination of a note’s format (#35), just matching for the first continuous block of Japanese characters in a field seemed like a reasonable way to identify an expression. This could also be an elegant alternative to the current heuristic cleaning of the expression field.
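A rough sketch of that idea, assuming "Japanese characters" means the Hiragana and Katakana blocks plus the basic CJK Unified Ideographs block and 々. The function name and the ranges are assumptions and may need widening for a real collection.

```python
import re

# Assumed definition of a Japanese character: kana, basic kanji, and 々.
JAPANESE_RUN = re.compile(r'[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF\u3005]+')


def first_japanese_block(field):
    # Return the first continuous run of Japanese characters, or None.
    match = JAPANESE_RUN.search(field)
    return match.group(0) if match else None


print(first_japanese_block('<b>足手まとい</b> (n.)'))
print(first_japanese_block('no Japanese here'))
```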

TODO: test on own collection and see if it leads to an improvement or not.

IllDepence / anki_add_pitch_plugin

japanese support compatibility #5