UtaUtaUtau / nnsvslabeling

Python scripts I made to make NNSVS labeling easier.
MIT License
23 stars, 9 forks

Japanese language detection is inaccurate #2

Closed by ghost 3 years ago

ghost commented 3 years ago

Japanese datasets are incorrectly detected as another language if they contain capitalized phonemes. In NNSVS Japanese datasets (and, to some extent, Japanese speech datasets in general), capitalized vowel phonemes represent unvoiced vowels.

May I suggest updating the detection?

I don't know Python, so I replaced your detection with this out of desperation.

In case someone comes along before this issue is resolved and needs a quick 'fix'.

```python
label_lang = input('Is dataset Japanese? [y/n] ')
if label_lang.lower() == 'y':
    jpn = True
else:
    jpn = False
```

May I also suggest adding the capitalized phonemes to the vowel list and changing line 129 to something like `not in ('pau', 'sil') else 'R'`, since pau and sil are used interchangeably?
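As a rough sketch of what these two suggestions could look like (every name here, such as `VOWELS`, `REST_PHONEMES`, `to_note_label`, and `fold_unvoiced`, is hypothetical and not taken from the actual script, and the five-vowel set is assumed):

```python
# Hypothetical names; the real script's variables and structure may differ.
VOICED_VOWELS = {'a', 'i', 'u', 'e', 'o'}  # assumed Japanese vowel set
# Capitalized vowels mark unvoiced vowels in NNSVS Japanese labels.
UNVOICED_VOWELS = {v.upper() for v in VOICED_VOWELS}
VOWELS = VOICED_VOWELS | UNVOICED_VOWELS

# 'pau' and 'sil' are treated as interchangeable.
REST_PHONEMES = ('pau', 'sil')


def is_vowel(phoneme):
    """Return True for voiced and unvoiced (capitalized) vowels."""
    return phoneme in VOWELS


def to_note_label(phoneme):
    """Map 'pau' and 'sil' to 'R'; pass every other phoneme through."""
    return phoneme if phoneme not in REST_PHONEMES else 'R'


def fold_unvoiced(phoneme):
    """Lowercase unvoiced vowels only, so other capitalized phonemes are kept."""
    return phoneme.lower() if phoneme in UNVOICED_VOWELS else phoneme


phonemes = ['sil', 'k', 'U', 's', 'a', 'pau']
print([to_note_label(p) for p in phonemes])  # ['R', 'k', 'U', 's', 'a', 'R']
print([fold_unvoiced(p) for p in phonemes])  # ['sil', 'k', 'u', 's', 'a', 'pau']
```

Folding the unvoiced vowels to lowercase before any language check is one possible way to keep capitalization alone from pushing a Japanese dataset into the non-Japanese branch.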

Thank you.

UtaUtaUtau commented 3 years ago

Sorry again for the crude fix! I added more to the README on why Japanese is separated.