Open kaoala7577 opened 4 years ago
Hi, this actually is the best place for feedback. :)
Your suspicion is correct. The addon does not process reading fields in the form kanji[kana] as desired (I'll quickly explain the reasons for this below). As for making the addon compatible with this reading field format, I unfortunately wouldn't have the time to properly test any changes. But I'll keep this issue open as a reminder for myself and an invitation to anyone else who might be interested in contributing the necessary changes.
Details
Given an expression field like 汚れ, the addon can't know whether the user intended the card as けがれ or よごれ. Therefore, the addon also looks at the reading field, and if it finds either of the readings, it goes with that one.
The reason this fails for the kanji[kana] format is that when the reading field is processed, its contents first run through a cleaning function, which removes everything in brackets. The reason for removing everything in brackets is that I, personally, sometimes add extra stuff in brackets to the reading field. E.g. I'll have an expression field 足手まとい and a reading field あしてまとい (also: 足手纏い), just to add the extra bit of info for me to see after every recognition and recall of the card, w/o requiring me to be able to recognize 纏 for the actual review.
So for 歳[さい] the cleaning function returns just 歳, which doesn't help the addon disambiguate between possible readings.
As for supporting the kanji[kana] format, I'd imagine something like this: after selecting the reading field (user is asked "Which field contains the reading?"), the plugin could go through a sample of reading fields in the selected deck and, if all of them adhere to a <kanji or kana unicode range>[<kana unicode range>] type of pattern, the user could be asked if they're using this type of reading field and whether or not the reading field processing should be done accordingly (e.g. "It looks like your reading fields are in <expression>[<furigana>] format. Would you like the addon to use the <furigana> part to disambiguate readings?").
unfortunately i don't know how to code in python so my attempts at doing this myself haven't worked, but would it be possible to make an alternate clean function that japanese support users could add manually? i tried duplicating the function and just changing the s = re.sub(r'[\[\(\{][^\]\)\}]*[\]\)\}]', '', s) line with something i found on google but it didn't work so i don't know if there are more checks involved somewhere that i didn't notice.
this would probably break the addon for people not using this format so it would have to be a manual thing, but changing one function/adding one sounds like a lot less hassle on your part than sampling all of the reading fields to automate it. alternatively there could be a checkbox so someone could state that they're using that format without any checks, but i don't know how difficult reformatting the selection screen would be.
thank you for your reply, you've been really helpful! i hoped to simplify the problem a bit by making users add the code but i might have made it a whole lot harder for you lol, what do you think?
If you're fine with changing the addon's code on your end, here's what should work:
Replace the line
reading_field = clean(reading_field)
with
furigana_match = re.search(r'\[([^\]]*)\]', reading_field)
if furigana_match:
    reading_field = furigana_match.group(1)
Hope this helps. :)
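One caveat with the replacement above: when the reading field has no square brackets, it is no longer cleaned at all. A minimal sketch of a variant that falls back to cleaning is shown here. The names `clean_sketch` and `reading_from_field` are hypothetical, and `clean_sketch` only reproduces the bracket-stripping regex quoted earlier in this thread (the addon's real clean function may do more, e.g. strip HTML).

```python
import re

def clean_sketch(s):
    # Stand-in for the addon's clean(): only the bracket-stripping regex
    # quoted in this thread; HTML stripping is deliberately left out.
    s = re.sub(r'[\[\(\{][^\]\)\}]*[\]\)\}]', '', s)
    return s.strip()

def reading_from_field(reading_field):
    # Prefer the furigana inside kanji[kana]; otherwise clean as before.
    furigana_match = re.search(r'\[([^\]]*)\]', reading_field)
    if furigana_match:
        return furigana_match.group(1).strip()
    return clean_sketch(reading_field)

print(reading_from_field('歳[さい]'))
print(reading_from_field('あしてまとい (also: 足手纏い)'))
```

Note that a field with extra info in square brackets would still be misread as furigana, so this is only suitable if square brackets are reserved for the kanji[kana] format.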
hm, doesn't seem to be working on my end... not sure what the issue would be, any idea? i can try removing and adding japanese support to see if there's an issue with the config somehow
oh and if it helps, this is the code i was using previously when i was messing about myself.
under the def clean(s): bit:
def furigana(s):
    s = stripHTML(s)
    s = re.search(r"\[(.*?)\]", s).group(1)
    return s.strip()
and then i replaced reading_field = clean(reading_field) with reading_field = furigana(reading_field)
this also didn't work, but i hope it helps somewhat? thanks so much, sorry for all this trouble haha
Both your code and the one I suggested return b given an input a[b]. May I ask how you're testing whether it's working or not?
If it's by looking at the output for your 歳[さい] card, I fear code changes won't do anything. The pitch accent dictionary that comes with the addon only contains 歳 as an alternative writing of 年:
年␟△歳␟年␟とし␟歳␞とし␞とし␞2␞LHL
You could test the code changes by creating two cards with 行 as the expression, one with 行[ぎょう] as the reading and another with 行[こう], and then see if you get the expected output.
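The two extraction snippets from this thread can also be checked outside of Anki. The sketch below wraps them in hypothetical functions (`maintainer_variant`, `furigana_variant`) and leaves out the stripHTML call, which is an assumption made to keep it self-contained.

```python
import re

def maintainer_variant(reading_field):
    # The suggested replacement: use the bracket contents if there are any.
    furigana_match = re.search(r'\[([^\]]*)\]', reading_field)
    if furigana_match:
        reading_field = furigana_match.group(1)
    return reading_field

def furigana_variant(s):
    # The user's function, minus stripHTML (omitted in this sketch).
    s = re.search(r"\[(.*?)\]", s).group(1)
    return s.strip()

for field in ("a[b]", "行[ぎょう]", "行[こう]"):
    print(field, maintainer_variant(field), furigana_variant(field))
```

One difference worth knowing: for a reading field without square brackets, `furigana_variant` raises an AttributeError because `re.search` returns None, while `maintainer_variant` returns the field unchanged.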
sorry i didn't see the notification for this! i was trying it using 年 since i figured 歳 might be an outlier but neither seemed to work... since さい isn't a reading in the pitch accent dictionary, do you know how i would add it or is it a lost cause?
oh! i finally got around to deleting and reinstalling both japanese support and japanese pitch, changed the code, and it's suddenly working? the config on japanese support is the same and nothing about the code is different from before... that's so weird. i'm still interested in any advice on how to fix the さい problem but that's that issue fixed at least
Glad to hear it worked out.
If you know what the correct pitch accent pattern for 歳[さい] is, you can manually change it.
/edit: @kaoala7577 I just updated the plugin and it now supports editing cards' annotations individually. :)
e.g. I'll have an expression field 足手まとい and a reading field あしてまとい (also: 足手纏い), to just add the extra bit of info for me to see after every recognition and recall of the card, w/o requiring me to be able to recognize 纏 for the actual review.
Not saying you should use your own add-on differently, but a simpler and more structured way would be to keep alternative writings in a separate field. This would not only allow easier parsing, which is what causes the issue with the kana[furigana] format, but also enable a more flexible presentation. For example, for homophones, I use the kanji form as a hint (not shown initially on the front, but available on request) for cards with only the reading on the front, and for homographs I use the meaning as a hint for kanji-only cards. This is easy to do, since the note has meaning and reading as separate fields, and I don't need to make edits for any new card. In this case it would be a field with alternative ways of writing.
Sure. If I were to start over creating my Japanese vocab deck, I'd probably consider strict field contents for the sake of processability, e.g. having fields like phrase, phrase extra info, meaning, meaning extra info, reading, pitch accent, example sentence(s), image(s).
As it stands though, I have a deck that grew over the years. (:
/ninja edit:
In any case, having special handling of certain card formats of any kind be the default will probably never make everyone happy. Given that the Japanese Support add-on's format can be assumed to be somewhat common, I'd welcome contributions towards some heuristic detection and user prompt as outlined in my first comment.
One thing worth noting: Japanese Support isn't a plugin. It is built into Anki now. I'll look into the code more thoroughly and will come up with a pull request as soon as I have time.
Here are some thoughts on how it can be approached:
As for initial inspection of the whole deck, I think it is a good idea, just a few things to note:
So I'd suggest checking whether ANY card in the deck uses the Japanese Support format and then offering the user an option to proceed or cancel. Optionally, the user may be advised to create a filtered deck and do pitch accent processing there, to skip processing non-compatible cards.
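The "any card" check could be sketched roughly as follows. The function name, the pattern name, and the Unicode ranges (kana U+3040 to U+30FF, CJK ideographs U+4E00 to U+9FFF, plus 々) are assumptions for illustration, not taken from the addon, and the reading fields are passed in as plain strings rather than read from a deck.

```python
import re

# Assumed ranges: kana or kanji, followed by kana inside square brackets.
JS_FORMAT = re.compile(r'[\u3040-\u30ff\u4e00-\u9fff々]+\[[\u3040-\u30ff]+\]')

def any_japanese_support_format(reading_fields):
    # True as soon as one reading field contains a kanji[kana] segment.
    return any(JS_FORMAT.search(field) for field in reading_fields)

sample = ['あしてまとい (also: 足手纏い)', '歳[さい]']
if any_japanese_support_format(sample):
    print("Some reading fields look like kanji[kana]; ask the user how to proceed.")
```

Requiring kana inside the brackets keeps bracketed notes in other scripts from triggering the prompt, though a kana-only note in square brackets would still match.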
Sounds good.
One thing worth noting: Japanese Support isn't a plugin. It is built into Anki now.
Good to know. I wasn't aware of that.
Note to self:
While developing automated heuristic determination of a note’s format (#35), just matching for the first continuous block of Japanese characters in a field seemed like a reasonable way to identify an expression. This could also be an elegant alternative to the current heuristic cleaning of the expression field.
TODO: test on own collection and see if it leads to an improvement or not.
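A rough sketch of the "first continuous block of Japanese characters" idea, for experimenting outside the addon. The function name and the Unicode ranges (kana U+3040 to U+30FF, CJK ideographs U+4E00 to U+9FFF, plus 々) are assumptions and not what the addon currently uses.

```python
import re

# Assumed definition of "Japanese characters": kana, common kanji, and 々.
JAPANESE_BLOCK = re.compile(r'[\u3040-\u30ff\u4e00-\u9fff々]+')

def first_japanese_block(field):
    # Return the first unbroken run of Japanese characters, or None.
    match = JAPANESE_BLOCK.search(field)
    return match.group(0) if match else None

for field in ('歳[さい]', 'あしてまとい (also: 足手纏い)', '<b>行</b>'):
    print(field, '->', first_japanese_block(field))
```

Because the match stops at the first non-Japanese character, bracketed extras and surrounding HTML tags are skipped without a separate cleaning step, at least for fields where the expression comes first.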
hi! sorry if this is the wrong place to contact you, i couldn't find an email. so, im using the japanese support addon (https://ankiweb.net/shared/info/3918629684) and was wondering if you would ever think about making the two compatible? when japanese support adds furigana to an expression it shows it as kanji[kana] - however i'm not sure your pitch addon can read it, since when 歳[さい] is given as the reading input, it comes out with the pitch accent for とし and not さい. japanese support is a very helpful addon and many people use it so i'd be reluctant to use another one, but if you know of an addon that generates the furigana reading just in kana instead as a workaround, i'd be willing to switch to using that instead.
this isn't urgent or anything since it usually gets the right reading anyway, but wanted to ask the question just in case. thanks!