bootphon / phonemizer

Simple text to phones converter for multiple languages
https://bootphon.github.io/phonemizer/
GNU General Public License v3.0
1.15k stars 163 forks

There is a difference between espeak-ng and phonemizer #122

Closed hakan-demirli closed 2 years ago

hakan-demirli commented 2 years ago

I have been using Phonemizer in a hobby project that I am currently trying to port to C++. I have read the code, and apart from input sanitization and basic pre/post-processing, it only contains bindings to the espeak-ng system library, which accesses the same files my C++ code uses. So I can't see a reason why the outputs are different.

Phonemizer: həloʊ wɜːld
Espeak-ng: həlˈəʊ wˈɜːld

I don't know much about phonetics, but the only differences I see are o <-> ˈə and ɜ <-> ˈɜ. Can I just map those differences and call it a day?

Phonemizer: Latest version
Espeak-ng: Latest version
OS: Pop!_OS 22.04 LTS

Python Code:

from phonemizer import phonemize

text = "hello world"
lang = 'en-us'
phonemes = phonemize(text,
                        language=lang,
                        backend='espeak',
                        strip=True,
                        preserve_punctuation=True,
                        with_stress=False,
                        njobs=4,
                        punctuation_marks=';:,.!?¡¿—…"«»“”()',
                        language_switch='remove-flags')
print(phonemes)

CPP code:

// gcc test-espeak.c -lespeak-ng -o test-espeak

#include <stdio.h>   // printf
#include <string.h>  // memset
#include <espeak-ng/speak_lib.h>

espeak_AUDIO_OUTPUT output = AUDIO_OUTPUT_SYNCHRONOUS;
char *path = NULL;
void* user_data;
unsigned int *identifier;

int main(int argc, char* argv[] ) {
  char text[] = {"hello world"};
  int buflength = 500, options = 0;
  unsigned int position = 0, position_type = 0, end_position = 0, flags = espeakCHARS_AUTO;
  espeak_Initialize(output, buflength, path, options );
  espeak_VOICE voice;
  memset(&voice, 0, sizeof(espeak_VOICE)); // Zero out the voice first
  const char *langNativeString = "en"; // Set voice by properties
  voice.languages = langNativeString;
  voice.name = "US";
  voice.variant = 2;
  voice.gender = 1;
  espeak_SetVoiceByProperties(&voice);
  espeak_SetPhonemeTrace(espeakPHONEMES_IPA,NULL);
  printf("Saying  '%s'...\n", text);
  espeak_Synth(text, buflength, position, position_type, end_position, flags, identifier, user_data);

  printf("Done\n");
  return 0;
}

mmmaat commented 2 years ago

Hi, a very quick answer. To keep the ˈ (the stress mark) in the phonemizer output, use the with_stress=True parameter.

For the o <-> ˈə stuff I really don't know... Are you sure you are using the same library on both sides?
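To see how much of the mismatch is explained by stress marks alone, the espeak-ng output can be compared with the phonemizer output after removing those marks. The following is a minimal sketch, not phonemizer's actual implementation: the helper name `strip_stress` is hypothetical, and it assumes that only the IPA primary (ˈ) and secondary (ˌ) stress marks need to be removed.

```python
# Assumption: only the IPA primary (U+02C8) and secondary (U+02CC) stress
# marks are removed. phonemizer's own with_stress=False handling may differ.
STRESS_MARKS = "\u02c8\u02cc"


def strip_stress(ipa: str) -> str:
    """Return the IPA string with primary and secondary stress marks removed."""
    return ipa.translate({ord(mark): None for mark in STRESS_MARKS})


# The two outputs reported in this issue.
espeak_output = "həlˈəʊ wˈɜːld"
phonemizer_output = "həloʊ wɜːld"

stripped = strip_stress(espeak_output)
print(stripped)                       # espeak-ng output without stress marks
print(stripped == phonemizer_output)  # False: the diphthong still differs
```

With the marks stripped, the ɜ <-> ˈɜ difference disappears, but the first word still differs in its diphthong (əʊ versus oʊ), so stress alone does not account for the whole mismatch.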

hakan-demirli commented 2 years ago

I am sure the libraries are the same. The C++ side uses a CMakeLists file with an absolute path to the *.so files. I followed phonemizer in a debugger up to here: https://github.com/bootphon/phonemizer/blob/8cbdb5ade3dfaf931c63ecb3c6abf38a5b335d2f/phonemizer/backend/espeak/api.py#L223 and verified the library dependency by removing all espeak libraries and running the script again.

Thank you for explaining the accentuation. Now the outputs look quite close, and my project is working as expected. The o <-> ə difference is not critical for me since this is a hobby project.
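For anyone who wants to confirm that the vowel is the only remaining mismatch, a character-level comparison with Python's standard `difflib` module is enough. This is only a sketch: the helper name `phoneme_diffs` is hypothetical, and the input strings are the two outputs from this issue with the stress marks already removed by hand.

```python
import difflib


def phoneme_diffs(a: str, b: str) -> list[tuple[str, str]]:
    """Return (a_segment, b_segment) pairs for every span where a and b differ."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Outputs from this issue, stress marks removed from the espeak-ng one.
espeak_no_stress = "hələʊ wɜːld"
phonemizer_output = "həloʊ wɜːld"

diffs = phoneme_diffs(espeak_no_stress, phonemizer_output)
print(diffs)  # a single pair: ə on the espeak-ng side, o on the phonemizer side
```

A comparison like this makes it easy to decide whether a fixed substitution table is viable for a given set of test sentences, although a table built from a few examples is unlikely to cover every word.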