Open postkevone opened 3 years ago
Hi kebifurai,
thanks for the input. I like the idea. For the moment I'll make some notes here on what would need to happen to implement the feature.
wadoku_pitchdb.csv
, update user_pitchdb.csv
If you have any input for any of the above, don't hesitate to let me know. (:
Thank you for your reply.
After exploring the XML dump a bit I found out that those vowels are preceded by [Dev]:
<hatsuon>[Dev]しゅく'じつ</hatsuon>
<hatsuon>た[Dev]すけ</hatsuon>
<hatsuon>く'[Dev]ちく</hatsuon>
<hatsuon>[Dev]き・さま</hatsuon>
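For reference, a minimal sketch of how these markers could be read out of a hatsuon string. It only relies on what the four examples above show: `[Dev]` directly precedes the devoiced mora, `'` and `・` are other markup, and small kana such as ゅ belong to the preceding mora. The function name `parse_hatsuon` is hypothetical, and the rest of the Wadoku markup is not handled.

```python
import re

# Small kana that combine with the preceding kana into a single mora.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")
DEV_MARKER = "[Dev]"


def parse_hatsuon(hatsuon):
    """Return (morae, devoiced_indices) for a hatsuon string.

    Hypothetical helper: assumes [Dev] marks the mora that follows it.
    """
    morae = []
    devoiced = set()
    pending_dev = False
    # Tokenize into [Dev] markers and single characters.
    for token in re.findall(r"\[Dev\]|.", hatsuon):
        if token == DEV_MARKER:
            pending_dev = True
        elif token in SMALL_KANA and morae:
            # Attach small kana to the previous mora (し + ゅ -> しゅ).
            morae[-1] += token
        elif "\u3041" <= token <= "\u30ff" and token != "・":
            if pending_dev:
                devoiced.add(len(morae))
                pending_dev = False
            morae.append(token)
        # Everything else (accent mark ', separator ・) is skipped.
    return morae, devoiced


morae, devoiced = parse_hatsuon("[Dev]しゅく'じつ")
print(morae, devoiced)
```

The set of indices can then be passed to whatever draws the pitch graph, independently of how the devoiced mora ends up being rendered.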
I'd really appreciate the feature as well. To visually convey these vowels, I feel like circling the vowel with either a solid or a dashed stroke would be appropriate. That's how it's done in Japanese: ㋜, ㋛, ㋡, ㋗, ㋠, ㋖, ㋪, ㋫, etc. (pronunciations are usually written in katakana).
btw, love your add-on
Thanks for the input!
One key problem I see with circling is that しゅ is a common candidate for having a barely pronounced vowel. So with e.g. 祝福, both しゅ and ふ would need a circle. "Circling" しゅ completely would result in some kind of oval, while circling only the し would be hard without crossing over the ゅ.
Looking at how Wadoku does it, they additionally show the pronunciation in rōmaji and grey out the vowel. I feel greying out is kind of intuitive for "barely pronounced", but for the add-on accent visualization + kana + rōmaji would be a bit noisy.
Yeah, true. Though why not greying out kana ?
If not possible, circling the circle on the pitch accent graph doesn't seem a bad solution either to me.
circling the circle
I feel that wouldn't be very intuitive.
why not greying out kana
I played around with greying out the kana and circle. Example:
Does look okay I think. Not 100% clean because it's actually only the vowel part that's barely pronounced, but well ... an okay compromise I guess.
A bit more subtle and hinting at only the vowel part being barely pronounced would maybe be to grey out the right part of the circle.
Thoughts?
@kebifurai do you have a link to the NHK app that's using the dashed circles? Or even better, maybe some resource (website/book/...) discussing/explaining that kind of notation? If there is some sort of conventional way to denote barely pronounced vowels in Japanese I'd prefer to take inspiration from that. (Side note: I'm considering switching to katakana, given that @TheScientist14 pointed out it's common and the 大辞林 I point to in the README does so.)
If you decide to use katakana, you could use the chars that I sent in my first comment. Here is the list of every katakana which exists with a circle as a char (src) :
㋕, ㋖, ㋗, ㋘, ㋙
㋚, ㋛, ㋜, ㋝, ㋞
㋟, ㋠, ㋡, ㋢, ㋣
㋤, ㋥, ㋦, ㋧, ㋨
㋩, ㋪, ㋫, ㋬, ㋭
㋮, ㋯, ㋰, ㋱, ㋲
㋳, , ㋴, , ㋵
㋶, ㋷, ㋸, ㋹, ㋺
㋻, ㋼, , ㋽, ㋾
Side note: not every kana in this list is usable; only kana ending in 'u' or 'i' can be silenced. Idk why the other ones exist...
For any kana that is not in this list, I suggest surrounding it with parentheses, like this:
(シュ)、(フィ)、(プ)、(ピ)
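A rough sketch of that rule, in case it helps: use the circled character when Unicode has one, otherwise fall back to parentheses. It leans on the fact that the Unicode names follow the pattern "KATAKANA LETTER SU" / "CIRCLED KATAKANA SU"; the function name `mark_devoiced` is made up for illustration and it expects katakana input.

```python
import unicodedata

LETTER_PREFIX = "KATAKANA LETTER "
CIRCLED_PREFIX = "CIRCLED KATAKANA "


def mark_devoiced(mora):
    """Return the circled form of a katakana mora, or the mora in parentheses.

    Hypothetical helper: only single katakana with a circled counterpart
    in Unicode get a circle; everything else is parenthesized.
    """
    if len(mora) == 1:
        name = unicodedata.name(mora, "")
        if name.startswith(LETTER_PREFIX):
            try:
                return unicodedata.lookup(CIRCLED_PREFIX + name[len(LETTER_PREFIX):])
            except KeyError:
                # No circled variant exists (e.g. ピ, プ).
                pass
    return "(" + mora + ")"


print(" ".join(mark_devoiced(m) for m in ["ス", "シ", "シュ", "ピ"]))
```

Looking the characters up by name avoids hard-coding the table, and two-character morae such as シュ automatically take the parenthesis route.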
Actually, greying out the kana and circle feels good to me.
N.B : It appears that only キ、ク、シ、シュ、ス、チ、ツ、ヒ、フ、フィ、ピ、プ can be devoiced. (src)
I believe the NHK notation @kebifurai referred to is from this app, though I'm not sure.
@kebifurai do you have a link to the NHK app that's using the dashed circles? Or even better, maybe some resource (website/book/...) discussing/explaining that kind of notation? If there is some sort of conventional way to denote barely pronounced vowels in Japanese I'd prefer to take inspiration from that. (Side note: I'm considering switching to katakana, given that @TheScientist14 pointed out it's common and the 大辞林 I point to in the README does so.)
Unfortunately the app is paid and only for iOS: https://www.monokakido.jp/ja/dictionaries/nhkaccent2/index.html
You can also take a look at this Anki add-on: https://ankiweb.net/shared/info/1225470483
Here you can see a configuration similar to the one used in the NHK dictionary: https://tatsumoto-ren.github.io/blog/useful-anki-add-ons-for-japanese.html#japitch
I played around with greying out the kana and circle. Example:
Does look okay I think. Not 100% clean because it's actually only the vowel part that's barely pronounced, but well ... an okay compromise I guess.
Honestly I like this idea a lot. It's similar to what the people running suzuki kun do so I would support this as a solution (maybe with a slightly lighter shade of gray for the circles). My only question is what the manual-entry syntax would look like. Do you think it would make sense to just do this with upper case vs. lower case letters? E.g.
"H" = high + voiced
"h" = high + devoiced
"L" = low + voiced
"l" = low + devoiced
Imo, it doesn't need to be indicated in the manual-entry. If you really want to, maybe you could surround it with parentheses? Like so: (H) L. But, to me, it is not related to the pitch.
Hi, sorry, I didn't see that you'd replied to this. Which is too bad since you were so prompt!! Apologies!!!
But, to me, it is not related to the pitch
I guess it might not be directly related to pitch, but I feel like the point of this tool is to help people hone their pronunciation to be closer to that of a native speaker, and devoicing is an important part of that. So I think it makes sense to include as a feature.
In my mind the alternative is to have two separate tools with which to practice each. I can't think of a good reason to do that instead of practicing both at the same time.
Imo, it doesn't need to be indicated in the manual-entry.
The textbook I'm using frequently has words or phrases that don't play well with the automation script. This happens maybe ~30-50% of the time. Also, there are a handful of words for which the automation script appears to get pitch information that doesn't match that of my textbook. In these cases I look up pitch accent + devoicing information manually and enter it. Since this happens so frequently I think it's a reasonable feature to add.
If you really want to, maybe you could surround it with parentheses?
Sure, I'd be fine with that!
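To make the two proposed syntaxes concrete, here is a minimal sketch of a parser that accepts both: lower case letters ("h", "l") and parenthesized letters ("(H)", "(L)") each mark a devoiced mora. The function name `parse_pitch_pattern` and the choice to ignore spaces are assumptions for illustration, not how the add-on actually does it.

```python
import re

# One token is either a parenthesized letter, e.g. (H), or a bare letter.
TOKEN = re.compile(r"\((?P<paren>[HL])\)|(?P<plain>[HLhl])")


def parse_pitch_pattern(pattern):
    """Parse a manual pitch entry into (level, devoiced) pairs.

    Hypothetical helper: "(H)" and "h" both mean high + devoiced.
    """
    compact = pattern.replace(" ", "")
    parsed = []
    pos = 0
    while pos < len(compact):
        match = TOKEN.match(compact, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}: {compact[pos:]!r}")
        if match.group("paren"):
            parsed.append((match.group("paren"), True))
        else:
            letter = match.group("plain")
            parsed.append((letter.upper(), letter.islower()))
        pos = match.end()
    return parsed


print(parse_pitch_pattern("(H) L"))
print(parse_pitch_pattern("hLL"))
```

Either way the pitch level and the devoicing flag stay separate values, so the drawing code can treat devoicing as purely a styling decision.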
alright how about something like this
I also modified the code to (a) ignore characters in the pitch pattern string past 1 + number of mora, and (b) write the kana / pitch pattern / pitch accent image to fields in the card, since I have to do a lot of manual pitch entries in my use case and was getting a bit annoyed that I had to re-enter the whole pitch pattern and reading from scratch whenever I'd make a small mistake in one spot.
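Change (a) can be sketched roughly like this. It is a simplified illustration, not the modified add-on code: the helper names are made up, the reading is assumed to contain only kana, and small kana such as ゅ are assumed not to count as their own mora.

```python
# Small kana that do not form a mora of their own.
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def count_morae(reading):
    """Count morae in a kana-only reading (hypothetical helper)."""
    return sum(1 for ch in reading if ch not in SMALL_KANA)


def truncate_pattern(pattern, reading):
    """Drop pattern characters past 1 + number of morae."""
    return pattern[: count_morae(reading) + 1]


print(truncate_pattern("LHHHLLLL", "しゅくじつ"))
```

Since しゅくじつ has four morae, only the first five pattern characters are kept and the trailing ones are ignored.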
Looks something like this: link
Haven't tested it with the batch processing mode
On Wadoku it is possible to see when a certain mora's vowel is not pronounced.
In both cases the "u" is not pronounced.
On the NHK dictionary those moras are shown as below:
It would be great if you could add this feature in your addon, making the pitch accent more accurate.