Garethp / RikaiRebuilt

MIT License

Some jpod101 audio clips can't be found by checking reading-spelling pairs like the code currently does #6

Closed (wareya closed this 4 years ago)

wareya commented 5 years ago

Consider this a mini-PSA to people working on JMdict-based dictionaries. Yomichan's dev is MIA and rikaichamp doesn't support audio (I think?), so I'll post this here.

Somehow, a couple of the jpod101 audio files are stored under the wrong names on their servers, with metadata tags like (ik) or (oK) left in the name. I noticed this while bouncing all of the /EntL.....X/ words in edict2 against their asset server (I wanted a list up front instead of brute-forcing it whenever the user tries to play audio). I screwed up and forgot to strip the metadata, and some of the requests actually went through with the metadata still attached. So I tried every combination of with-and-without metadata for every /...X/ word. Here's a list:

ありがと(ik);ありがと(ik),ありがと
あんまし;按摩師(oK),あんまし;按摩師
あんまさん;按摩さん(oK),あんまさん;按摩さん
インドシナ;印度支那(ateji),インドシナ;印度支那
おわいや;汚穢屋(oK),おわいや;汚穢屋
カタル;加答児(ateji),カタル;加答児
とそつてん;兜率天(oK),とそつてん;兜率天
カンブリア;寒武利亜(ateji),カンブリア;寒武利亜
たへる(ok);堪へる,たへる;堪へる
もとずく(ik);基ずく,もとずく;基ずく
きゅうらい;救癩(oK),きゅうらい;救癩

Left of the comma is the kana= kanji= pair that will find the audio file. Right of the comma is the entry as it normally appears: either the reading;kanji pair, or just the reading if there's no kanji.
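For anyone wiring this into a lookup, here is a minimal Python sketch of how the list could be turned into an override table. It uses only a few rows from the list above. The names `parse_overrides` and `audio_params` are made up for illustration, and the fallback for kana-only words (sending the reading as both kana= and kanji=) is an assumption based on the ありがと row, not something confirmed by jpod101.

```python
# A few rows from the list above: "<kana>;<kanji> that finds the audio>,<normal entry>"
RAW_OVERRIDES = """\
ありがと(ik);ありがと(ik),ありがと
あんまし;按摩師(oK),あんまし;按摩師
カタル;加答児(ateji),カタル;加答児
たへる(ok);堪へる,たへる;堪へる
"""


def parse_overrides(raw):
    """Map (reading, kanji-or-None) to the (kana, kanji) pair that finds the audio."""
    table = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        left, right = line.split(",", 1)
        kana, kanji = left.split(";", 1)
        # The right side is "reading;kanji", or just "reading" for kana-only words.
        reading, _, spelling = right.partition(";")
        table[(reading, spelling or None)] = (kana, kanji)
    return table


OVERRIDES = parse_overrides(RAW_OVERRIDES)


def audio_params(reading, kanji=None):
    """Return the (kana, kanji) values to request for an entry.

    Hypothetical helper: entries in the override table get their odd
    server-side names; everything else falls back to the plain pair.
    Assumption: kana-only words send the reading in both fields.
    """
    default = (reading, kanji if kanji is not None else reading)
    return OVERRIDES.get((reading, kanji), default)
```

With this table, `audio_params("あんまし", "按摩師")` gives back the tagged spelling, while an entry that isn't listed passes through unchanged.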

Garethp commented 4 years ago

Thanks, I've fixed this. Have you found any more of them? Do you have a process for finding missing audio?

wareya commented 4 years ago

I haven't checked again since, and I strongly doubt that they've changed their filenames.
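As for the process, the with-and-without-metadata check from the first comment can be sketched roughly like this in Python. It only generates the candidate pairs to try; actually requesting them from the asset server is left out. The tag pattern and the function names are assumptions made for illustration.

```python
import re
from itertools import product

# Assumption: metadata tags look like "(ik)", "(oK)", "(ateji)",
# i.e. ASCII letters in parentheses.
TAG = re.compile(r"\([A-Za-z]+\)")


def variants(text):
    """Return the text as-is, plus a tag-stripped copy if it carries a tag."""
    stripped = TAG.sub("", text)
    return [text] if stripped == text else [text, stripped]


def candidate_pairs(reading, kanji):
    """Every with/without-metadata combination of a reading and a spelling."""
    return list(product(variants(reading), variants(kanji)))
```

An entry with a tag on only one side yields two candidates, and an entry tagged on both sides yields four.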