IllDepence / anki_add_pitch_plugin

Anki addon to automatically add pitch accent information to cards.
https://ankiweb.net/shared/info/148002038
MIT License

Add support for barely pronounced vowels #12

Open postkevone opened 3 years ago

postkevone commented 3 years ago

On Wadoku it is possible to see when a certain mora's vowel is not pronounced:

[Screenshot 2021-04-22 at 6 14 02 PM]

[Screenshot 2021-04-22 at 6 13 52 PM]

In both cases the "u" is not pronounced.

In the NHK dictionary those morae are shown as below:

[image]

It would be great if you could add this feature in your addon, making the pitch accent more accurate.

IllDepence commented 3 years ago

Hi kebifurai,

thanks for the input. I like the idea. For the moment I'll make some notes here on what would need to happen to implement the feature.

If you have any input for any of the above, don't hesitate to let me know. (:

postkevone commented 3 years ago

Thank you for your reply.

After exploring the XML dump a bit, I found out that those vowels are preceded by [Dev]:

<hatsuon>[Dev]しゅく'じつ</hatsuon>

<hatsuon>た[Dev]すけ</hatsuon>

<hatsuon>く'[Dev]ちく</hatsuon>

<hatsuon>[Dev]き・さま</hatsuon>
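
A minimal sketch of how these markers could be read. It assumes, based only on the four examples above, that `[Dev]` directly precedes the mora whose vowel is devoiced and that `'` and `・` are accent and boundary markers that can be skipped. The function name `parse_hatsuon` is hypothetical and not part of the add-on.

```python
# Small kana that combine with the preceding kana into one mora (e.g. しゅ).
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")
DEV_MARKER = "[Dev]"
# Assumption: these characters are accent/boundary markers, not kana.
SKIPPED = set("'・")


def parse_hatsuon(hatsuon):
    """Return (morae, devoiced_indices) for the content of a <hatsuon> element."""
    morae = []
    devoiced = []
    pending_dev = False  # True after a [Dev] marker until the next mora starts
    i = 0
    while i < len(hatsuon):
        if hatsuon.startswith(DEV_MARKER, i):
            pending_dev = True
            i += len(DEV_MARKER)
            continue
        ch = hatsuon[i]
        i += 1
        if ch in SKIPPED:
            continue
        if ch in SMALL_KANA and morae:
            morae[-1] += ch  # attach small kana to the previous mora
            continue
        morae.append(ch)
        if pending_dev:
            devoiced.append(len(morae) - 1)
            pending_dev = False
    return morae, devoiced
```

With the strings quoted above, `parse_hatsuon("た[Dev]すけ")` would give the morae た, す, け with index 1 marked as devoiced.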

TheScientist14 commented 3 years ago

I'd really appreciate the feature as well. To visually convey these vowels, I feel like circling the kana with either a solid or a dashed stroke would be appropriate. That's how it's done in Japanese: ㋜, ㋛, ㋡, ㋗, ㋠, ㋖, ㋪, ㋫, etc. (pronunciations are usually written in katakana).

btw, love your add-on

IllDepence commented 3 years ago

Thanks for the input!

One key problem I see with circling is that しゅ is a common candidate for having a barely pronounced vowel. So with e.g. 祝福

skfk

both しゅ and ふ would need a circle. "Circling" しゅ completely would result in some kind of oval, while circling only the し would be hard without crossing over the ゅ.

Looking at how Wadoku does it, they additionally show the pronunciation in rōmaji and grey out the vowel. I feel greying out is kind of intuitive for "barely pronounced", but for the add-on accent visualization + kana + rōmaji would be a bit noisy.

TheScientist14 commented 3 years ago

Yeah, true. Though why not grey out the kana?

TheScientist14 commented 3 years ago

If that's not possible, circling the circle on the pitch accent graph doesn't seem like a bad solution to me either.

IllDepence commented 3 years ago

> circling the circle

I feel that wouldn't be very intuitive.

> why not greying out kana

I played around with greying out the kana and circle. Example:

skfk

Does look okay I think. Not 100% clean because it's actually only the vowel part that's barely pronounced, but well ... an okay compromise I guess.

A more subtle option, hinting that only the vowel part is barely pronounced, might be to grey out just the right part of the circle.

skfk_half
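
A rough sketch of the two variants as SVG markup, one with the whole circle greyed out and one with only the right half grey. The function name, colours, and radius are assumptions for illustration and are not taken from the add-on's drawing code.

```python
def mora_circle_svg(x, y, r=5, devoiced=False, half=False):
    """Return SVG markup for one mora marker centred at (x, y).

    devoiced=False          -> plain black circle
    devoiced=True           -> whole circle grey
    devoiced=True, half=True -> black circle with a grey right half
    """
    black = f'<circle r="{r}" cx="{x}" cy="{y}" fill="#000"/>'
    if not devoiced:
        return black
    if not half:
        return f'<circle r="{r}" cx="{x}" cy="{y}" fill="#999"/>'
    # Semicircle from the top point clockwise (sweep flag 1) to the bottom
    # point, closed along the vertical diameter: this covers the right half.
    right_half = (
        f'<path d="M {x},{y - r} A {r},{r} 0 0,1 {x},{y + r} Z" fill="#999"/>'
    )
    return black + right_half
```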

Thoughts?


@kebifurai do you have a link to the NHK app that's using the dashed circles? Or even better, maybe some resource (website/book/...) discussing/explaining that kind of notation? If there is some sort of conventional way to denote barely pronounced vowels in Japanese, I'd prefer to take inspiration from that. (Side note: I'm considering switching to katakana, given that @TheScientist14 pointed out it's common and the 大辞林 I point to in the README does so.)

TheScientist14 commented 3 years ago

If you decide to use katakana, you could use the chars that I sent in my first comment. Here is the list of every katakana which exists with a circle as a char (src) :

㋕, ㋖, ㋗, ㋘, ㋙
㋚, ㋛, ㋜, ㋝, ㋞
㋟, ㋠, ㋡, ㋢, ㋣
㋤, ㋥, ㋦, ㋧, ㋨
㋩, ㋪, ㋫, ㋬, ㋭
㋮, ㋯, ㋰, ㋱, ㋲
㋳,   , ㋴,   , ㋵
㋶, ㋷, ㋸, ㋹, ㋺
㋻, ㋼,   , ㋽, ㋾

Side note: not every kana in this list is usable; only kana ending in 'u' or 'i' can be devoiced. Idk why the other ones exist... For any devoiceable kana that is not in this list, I suggest surrounding it with parentheses, like this: (シュ)、(フィ)、(プ)、(ピ)

Actually, greying out the kana and circle feels good to me.

N.B : It appears that only キ、ク、シ、シュ、ス、チ、ツ、ヒ、フ、フィ、ピ、プ can be devoiced. (src)

I believe the NHK notation @kebifurai quoted is from this app; not sure, though.
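
The list of devoiceable kana in the N.B. above could be turned into a simple lookup. This is only a sketch: the set is copied from that comment rather than from an authoritative source, and the helper names are hypothetical.

```python
# Kana listed in the comment above as candidates for devoicing (katakana form).
DEVOICEABLE = {
    "キ", "ク", "シ", "シュ", "ス", "チ", "ツ", "ヒ", "フ", "フィ", "ピ", "プ",
}


def to_katakana(text):
    """Shift hiragana (U+3041..U+3096) to katakana; leave other characters as-is."""
    return "".join(
        chr(ord(ch) + 0x60) if "ぁ" <= ch <= "ゖ" else ch
        for ch in text
    )


def can_be_devoiced(mora):
    """True if the mora (hiragana or katakana) is in the list above."""
    return to_katakana(mora) in DEVOICEABLE
```

A check like this could be used to reject a devoicing mark on a mora such as か, which is not in the list.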

postkevone commented 3 years ago

> @kebifurai do you have a link to the NHK app that's using the dashed circles? Or even better, maybe some resource (website/book/...) discussing/explaining that kind of notation? If there is some sort of conventional way to denote barely pronounced vowels in Japanese, I'd prefer to take inspiration from that. (Side note: I'm considering switching to katakana, given that @TheScientist14 pointed out it's common and the 大辞林 I point to in the README does so.)

Unfortunately the app is paid and only for iOS: https://www.monokakido.jp/ja/dictionaries/nhkaccent2/index.html

You can also take a look at this Anki add-on: https://ankiweb.net/shared/info/1225470483 Here you can see a configuration similar to the one used in the NHK dictionary: https://tatsumoto-ren.github.io/blog/useful-anki-add-ons-for-japanese.html#japitch

redpanda1234 commented 2 years ago

> I played around with greying out the kana and circle. Example:
>
> skfk
>
> Does look okay I think. Not 100% clean because it's actually only the vowel part that's barely pronounced, but well ... an okay compromise I guess.

Honestly I like this idea a lot. It's similar to what the people running Suzuki-kun do, so I would support this as a solution (maybe with a slightly lighter shade of gray for the circles). My only question is what the manual-entry syntax would look like. Do you think it would make sense to just do this with upper case vs. lower case letters? E.g.

"H" = high + voiced 
"h" = high + devoiced
"L" = low + voiced 
"l" = low + devoiced 
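
A minimal sketch of how such a case-based pattern string could be parsed, assuming one letter per mora. `parse_case_pattern` is a hypothetical name, not a function in the add-on.

```python
def parse_case_pattern(pattern):
    """Parse e.g. "lHHL" into a list of (pitch, devoiced) tuples.

    Upper case means voiced, lower case means devoiced.
    """
    result = []
    for ch in pattern:
        pitch = ch.upper()
        if pitch not in ("H", "L"):
            raise ValueError(f"unexpected character in pitch pattern: {ch!r}")
        result.append((pitch, ch.islower()))
    return result
```

For example, `parse_case_pattern("lHHL")` marks only the first mora as devoiced.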

TheScientist14 commented 2 years ago

> > I played around with greying out the kana and circle. Example: skfk
> >
> > Does look okay I think. Not 100% clean because it's actually only the vowel part that's barely pronounced, but well ... an okay compromise I guess.
>
> Honestly I like this idea a lot. It's similar to what the people running Suzuki-kun do, so I would support this as a solution (maybe with a slightly lighter shade of gray for the circles). My only question is what the manual-entry syntax would look like. Do you think it would make sense to just do this with upper case vs. lower case letters? E.g.
>
> "H" = high + voiced
> "h" = high + devoiced
> "L" = low + voiced
> "l" = low + devoiced

Imo, it doesn't need to be indicated in the manual entry. If you really want to, maybe you could surround it with parentheses? Like so: (H) L. But, to me, it is not related to the pitch.
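
A sketch of how the parenthesis variant could be parsed, assuming whitespace between letters is optional and that parentheses wrap exactly one letter. The name `parse_paren_pattern` is hypothetical.

```python
import re

# Either a parenthesised letter (devoiced) or a bare letter (voiced).
TOKEN = re.compile(r"\(([HL])\)|([HL])")


def parse_paren_pattern(pattern):
    """Parse e.g. "(H) L" into a list of (pitch, devoiced) tuples."""
    compact = "".join(pattern.split())  # drop all whitespace
    result = []
    pos = 0
    while pos < len(compact):
        match = TOKEN.match(compact, pos)
        if match is None:
            raise ValueError(f"cannot parse pitch pattern at position {pos}")
        if match.group(1):
            result.append((match.group(1), True))
        else:
            result.append((match.group(2), False))
        pos = match.end()
    return result
```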

redpanda1234 commented 2 years ago

Hi, sorry, I didn't see that you'd replied to this. Which is too bad since you were so prompt!! Apologies!!!

> But, to me, it is not related to the pitch

I guess it might not be directly related to pitch, but I feel like the point of this tool is to help people hone their pronunciation to be closer to that of a native speaker, and devoicing is an important part of that. So I think it makes sense to include as a feature.

In my mind the alternative is to have two separate tools with which to practice each. I can't think of a good reason to do that instead of practicing both at the same time.

> Imo, it doesn't need to be indicated in the manual-entry.

The textbook I'm using frequently has words or phrases that don't play well with the automation script. This happens maybe ~30-50% of the time. Also, there are a handful of words for which the automation script appears to get pitch information that doesn't match that of my textbook. In these cases I look up pitch accent + devoicing information manually and enter it. Since this happens so frequently I think it's a reasonable feature to add.

> If you really want to, maybe you could surround it with parentheses?

Sure, I'd be fine with that!

redpanda1234 commented 2 years ago

pitch-accent

alright how about something like this

redpanda1234 commented 2 years ago

I also modified the code to (a) ignore characters in the pitch pattern string past 1 + number of mora, and (b) write the kana / pitch pattern / pitch accent image to fields in the card, since I have to do a lot of manual pitch entries in my use case and was getting a bit annoyed that I had to re-enter the whole pitch pattern and reading from scratch whenever I'd make a small mistake in one spot.
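
Point (a) could look roughly like the following. This is a sketch only, not the modified add-on code; the function name is hypothetical, and the one extra position is assumed here to be for a following particle.

```python
def truncate_pattern(pattern, morae):
    """Drop pattern characters beyond 1 + number of morae.

    Assumption: the one extra position describes the pitch of a following particle.
    """
    return pattern[: len(morae) + 1]
```

A pattern that is already short enough is returned unchanged, so the function can be applied to every manual entry.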

Looks something like this: link

Haven't tested it with the batch processing mode