crow-translate / QOnlineTranslator

A Qt5 library that provides free usage of the Google, Yandex and Bing translate APIs.
GNU General Public License v3.0

added BrE TTS support (received pronunciation) #41

Closed. Crissium closed this pull request 1 year ago.

Crissium commented 1 year ago

close #40

Now, QOnlineTranslator::English is American English, and QOnlineTranslator::BritishEnglish is American English with Received Pronunciation: it is spelt the Yanks' way yet pronounced with RP. For example, couleur -> color, which the Google TTS engine then pronounces [ˈkʌlə].

Example:

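#include <QMediaPlayer>
#include <QMediaPlaylist>
#include "qonlinetts.h"

// Must run inside a Qt application with an event loop (e.g. after constructing QApplication)
QMediaPlayer player;
QMediaPlaylist list;
QOnlineTts tts;
// Generate Google TTS URLs for "squirrel" using the British (RP) voice added by this PR
tts.generateUrls("squirrel", QOnlineTranslator::Google, QOnlineTranslator::BritishEnglish);
list.addMedia(tts.media());
player.setPlaylist(&list);
player.play(); // [ˈskwɪrəl]
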
Crissium commented 1 year ago

Sorry, I got something wrong. Please do not merge until I push another commit. I am really sorry about this.

Crissium commented 1 year ago

I had forgotten to update QOnlineTranslator::languageName and to run clang-format on the changed files, sorry. Now I think this PR can be safely merged.

Shatur commented 1 year ago

Also, will en-gb work for translating into British English?

Crissium commented 1 year ago

Nope. All translation engines supported by this project won't translate texts into British English, but can translate from BrE into other languages.

Shatur commented 1 year ago

All translation engines supported by this project won't translate texts into British English

So even Google won't recognize en-gb as a translation language? I was just thinking that we could create an exception, as we do for other languages.

Crissium commented 1 year ago

It does recognise it, but ignores the -gb part: Google still returns American English. For example, I asked Google to translate "Quelle est la couleur de ta nouvelle robe ?" from French into British English:

curl "https://translate.googleapis.com/translate_a/single?client=gtx&ie=UTF-8&oe=UTF-8&dt=bd&dt=ex&dt=ld&dt=md&dt=rw&dt=rm&dt=ss&dt=t&dt=at&dt=qc&sl=fr&tl=en-GB&hl=en-GB&q=Quelle%20est%20la%20couleur%20de%20ta%20nouvelle%20robe%20%3F"

and it replied:

[
    [
        [
            "What's the color of your new dress?",
            "Quelle est la couleur de ta nouvelle robe ?",
            null,
            null,
            3,
            null,
            null,
            [
                []
            ],
            [
                [
                    [
                        "4df5d4d9d819b397555d03cedf085f48",
                        "fr_en_2022q1.md"
                    ]
                ]
            ]
        ]
    ],
    null,
    "fr",
    null,
    null,
    [
        [
            "Quelle est la couleur de ta nouvelle robe ?",
            null,
            [
                [
                    "What's the color of your new dress?",
                    0,
                    true,
                    false,
                    [
                        3
                    ],
                    null,
                    [
                        [
                            3
                        ]
                    ]
                ],
                [
                    "what color is your new dress?",
                    0,
                    true,
                    false,
                    [
                        8
                    ]
                ]
            ],
            [
                [
                    0,
                    43
                ]
            ],
            "Quelle est la couleur de ta nouvelle robe ?",
            0,
            0
        ]
    ],
    1,
    [],
    [
        [
            "fr"
        ],
        null,
        [
            1
        ],
        [
            "fr"
        ]
    ]
]

(Note that la couleur was rendered as color instead of colour.)

Likewise, when BrE is the source language, en and en-gb behave exactly the same. In other words, Google Translate does not support British English as a distinct language, but its TTS engine can pronounce English words (British or American) with an RP accent.

Shatur commented 1 year ago

Thank you for the investigation! In that case I would make this language a Google-specific exception with the en-gb code, as we do for other languages. This makes it clearer that the language only has an effect with the Google engine, and users of UI apps such as Crow Translate will get an error message when they select it with other engines.

Crissium commented 1 year ago

The official Google Translate Android app allows the user to select language and respective audio output accent separately. Therefore, in my opinion we can do the same in Crow Translate. I know nothing about Arabic, but in the Google Translate app you can choose among a host of regional accents for it (from the UAE to Morocco). The same goes for Spanish, among others (and, of course, English!). Maybe we should do some research and find out how many accents of each language Google Translate TTS supports. This may call for contributions from native speakers.

Shatur commented 1 year ago

The official Google Translate Android app allows the user to select language and respective audio output accent separately. Therefore, in my opinion we can do the same in Crow Translate

Yes, it would be really great. I believe we even had a request about it. But until we rethink our approach I would ask to create a new exception map for Google as we do for other engines and put en-gb to it. Just as a short term solution and for consistency. Could you do it, please? And I will merge it right away and draft a new minor release of Crow Translate.
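A minimal sketch of the short-term idea discussed here: a Google-only exception map consulted before the generic codes. It assumes simplified stand-in enums (Engine, Language) and hypothetical names (kGenericCodes, kGoogleExceptions, languageApiCode). It is written in plain C++17 rather than Qt and is not QOnlineTranslator's actual code. It only shows how en-gb can resolve for Google while other engines report the language as unsupported.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified stand-ins for QOnlineTranslator's enums.
enum class Engine { Google, Yandex, Bing };
enum class Language { English, French, BritishEnglish };

// Region-neutral codes shared by every engine.
const std::map<Language, std::string> kGenericCodes = {
    {Language::English, "en"},
    {Language::French, "fr"},
};

// Google-only exceptions: BritishEnglish only matters for Google (its TTS accent).
const std::map<Language, std::string> kGoogleExceptions = {
    {Language::BritishEnglish, "en-gb"},
};

// Returns the API code, or std::nullopt when the engine does not support the
// language (a UI such as Crow Translate could then show an error message).
std::optional<std::string> languageApiCode(Engine engine, Language lang)
{
    if (engine == Engine::Google) {
        if (auto it = kGoogleExceptions.find(lang); it != kGoogleExceptions.end())
            return it->second;
    }
    if (auto it = kGenericCodes.find(lang); it != kGenericCodes.end())
        return it->second;
    return std::nullopt;
}
```

Keeping the exceptions in a separate map leaves the generic table untouched, so removing the workaround later is a one-line change.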

Crissium commented 1 year ago

Just did a small test. Turns out that Google Translate TTS supports both québécois and français!

~/Downloads> curl "https://translate.googleapis.com/translate_tts?ie=UTF-8&client=gtx&tl=fr-fr&q=lundi" -o france.mp3
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3456    0  3456    0     0  15900      0 --:--:-- --:--:-- --:--:-- 15926
~/Downloads> curl "https://translate.googleapis.com/translate_tts?ie=UTF-8&client=gtx&tl=fr-ca&q=lundi" -o quebec.mp3
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3552    0  3552    0     0  14101      0 --:--:-- --:--:-- --:--:-- 14151
~/Downloads> cmp france.mp3 quebec.mp3 
france.mp3 quebec.mp3 differ: byte 6, line 1
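The two requests above differ only in the tl parameter. The plain C++17 sketch below builds a URL of the same shape. The helper names percentEncode and googleTtsUrl are hypothetical and not part of QOnlineTts. The endpoint and query parameters are copied from the curl commands above.

```cpp
#include <string>

// Percent-encode a UTF-8 string for use as a URL query value.
// Unreserved characters (RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~") are
// kept as-is; every other byte is written as %XX with uppercase hex digits.
std::string percentEncode(const std::string &utf8)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : utf8) {
        const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                             || (c >= '0' && c <= '9') || c == '-' || c == '.'
                             || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];   // high nibble
            out += hex[c & 0x0F]; // low nibble
        }
    }
    return out;
}

// Build a Google TTS request URL like the ones used with curl in this thread.
// regionCode is passed verbatim as tl (e.g. "fr-fr", "fr-ca", "en-gb").
std::string googleTtsUrl(const std::string &text, const std::string &regionCode)
{
    return "https://translate.googleapis.com/translate_tts?ie=UTF-8&client=gtx&tl="
           + regionCode + "&q=" + percentEncode(text);
}
```

For example, googleTtsUrl("lundi", "fr-ca") yields the URL used for quebec.mp3. Percent-encoding the UTF-8 bytes of the Arabic word reproduces the %D9%85%D8%B1... query used in the Arabic test below.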

But Arabic is not so lucky:

~/Downloads> curl "https://translate.googleapis.com/translate_tts?ie=UTF-8&client=gtx&tl=ar-ma&q=%D9%85%D8%B1%D8%AD%D8%A8%D9%8B%D8%A7" -o morocco.mp3
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4704    0  4704    0     0  21341      0 --:--:-- --:--:-- --:--:-- 21285
~/Downloads> curl "https://translate.googleapis.com/translate_tts?ie=UTF-8&client=gtx&tl=ar-ae&q=%D9%85%D8%B1%D8%AD%D8%A8%D9%8B%D8%A7" -o uae.mp3
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4704    0  4704    0     0  20732      0 --:--:-- --:--:-- --:--:-- 20814
~/Downloads> cmp uae.mp3 morocco.mp3

The two Arabic files are identical: cmp printed nothing. The Android app is quite explicit about this limitation, too. Under Settings > Voice > Region, for Arabic only 'Default region' supports both speech input and output, while the other regions only support speech input. For French, both Canadian French and France French support speech input and output!

Just as a short term solution and for consistency. Could you do it, please? And I will merge it right away and draft a new minor release of Crow Translate.

Unfortunately I've got four upcoming exams on 29 & 30 Aug (I'm a first-year university student). I only made this PR today to take a break from endless calculus and university physics. But I think I can do it after the exams, since I will then have a two-week holiday. With any luck, I can get it done before September.

Shatur commented 1 year ago

Just did a small test. Turns out that Google Translate TTS supports both québécois and français!

That's interesting! Yes, we should definitely allow setting this separately.

Unfortunately I've got four upcoming exams on 29 & 30 Aug

Okay, not a problem, I will update your PR later. Thank you for the contribution!

Shatur commented 1 year ago

But until we rethink our approach I would ask to create a new exception map for Google as we do for other engines and put en-gb to it.

I played with this myself, and that solution looks hacky too... I think we just need a better separation between TTS and translation. I'll have to put this on hold until we figure out how to rework it.

Shatur commented 1 year ago

Maybe we could provide a TTS-only option for selecting a different pronunciation? Crow Translate would then offer these settings in the TTS tab for the languages that support it.

Crissium commented 1 year ago

I would suggest we add a 'voice region' option for Google in the 'Speech synthesis' settings tab, as shown below.

[Screenshot: Options]

And for the sake of maintainability, all languages supported by Google should be added to the list, even those for which only the 'Default region' is supported for now. Hopefully Google will add more regional accents later.

As for the separation of TTS and translation, I think we can treat regional accents as separate languages in QOnlineTranslator's Language enum, as I have done with British English. We would still map them to their region-neutral counterparts (the original en, fr, etc.) in s_genericLanguageCodes, and list all supported regional voices in QOnlineTts::languageApiCode:

QString QOnlineTts::languageApiCode(QOnlineTranslator::Engine engine, QOnlineTranslator::Language lang)
{
    switch (engine) {
    case QOnlineTranslator::Google:
    case QOnlineTranslator::Lingva: // Lingva is a frontend to Google Translate
        if (lang == QOnlineTranslator::BritishEnglish)
            return QStringLiteral("en-GB"); // Google Translate won't translate into British English, but its TTS engine supports British pronunciation
        else if (lang == QOnlineTranslator::AmericanEnglish)
            return QStringLiteral("en-US");
        else if (lang != QOnlineTranslator::Auto)
            return QOnlineTranslator::languageApiCode(engine, lang); // Google uses the same codes for TTS (except 'auto')
        break;
// ...

This way, translation won't break because of unrecognised language codes. But adding a lot of 'new' languages (e.g. English (British pronunciation), French (Quebecois pronunciation)) to the language selection dialog is confusing and messy.
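Here is a plain C++17 sketch of the first approach. It assumes a hypothetical Language enum in which regional accents are separate entries. Translation always collapses to the region-neutral code, which is the role s_genericLanguageCodes plays in the proposal, while Google TTS gets the regional code. The function names translationCode and googleTtsCode are illustrative only.

```cpp
#include <string>

// Hypothetical Language enum where regional accents are separate entries.
enum class Language { English, BritishEnglish, French, QuebecFrench };

// Translation always uses the region-neutral code, so no engine ever
// receives a regional code it might reject.
std::string translationCode(Language lang)
{
    switch (lang) {
    case Language::English:
    case Language::BritishEnglish:
        return "en";
    case Language::French:
    case Language::QuebecFrench:
        return "fr";
    }
    return {};
}

// Google TTS receives the regional code where one exists, otherwise the
// same code as translation.
std::string googleTtsCode(Language lang)
{
    switch (lang) {
    case Language::BritishEnglish:
        return "en-GB";
    case Language::QuebecFrench:
        return "fr-CA";
    default:
        return translationCode(lang);
    }
}
```

The downside is the one noted above: every regional entry also appears in the language selection dialog.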

Or in QOnlineTts a separate RegionalVoice enum can be kept, and when Google is the TTS provider, a user should be able to specify a regional voice in the args of QOnlineTts::generateUrls.
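Here is a sketch of the RegionalVoice alternative, again in plain C++17 with hypothetical names (RegionalVoice, voiceInfo, googleTtsLanguage). QOnlineTts has no such enum. In this sketch the selected voice only overrides the language code when it belongs to the same base language, so passing a British voice for French text falls back to plain fr.

```cpp
#include <string>
#include <utility>

// Hypothetical regional voices that a caller could pass to TTS generation.
enum class RegionalVoice { Default, BritishEnglish, AmericanEnglish, FranceFrench, CanadianFrench };

// Base language and Google TTS code for each regional voice.
std::pair<std::string, std::string> voiceInfo(RegionalVoice voice)
{
    switch (voice) {
    case RegionalVoice::BritishEnglish:  return {"en", "en-GB"};
    case RegionalVoice::AmericanEnglish: return {"en", "en-US"};
    case RegionalVoice::FranceFrench:    return {"fr", "fr-FR"};
    case RegionalVoice::CanadianFrench:  return {"fr", "fr-CA"};
    case RegionalVoice::Default:         break;
    }
    return {"", ""};
}

// Value for Google's tl parameter: the regional code if the voice matches
// the text's language, otherwise the language code unchanged.
std::string googleTtsLanguage(const std::string &languageCode, RegionalVoice voice)
{
    const auto [base, code] = voiceInfo(voice);
    return base == languageCode ? code : languageCode;
}
```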

Shatur commented 1 year ago

I would suggest we add a 'voice region' option for Google in the 'Speech synthesis' settings tab, as shown below.

Yes, this is exactly what I was suggesting!

Or in QOnlineTts a separate RegionalVoice enum can be kept, and when Google is the TTS provider, a user should be able to specify a regional voice in the args of QOnlineTts::generateUrls.

I would go this way, because of this:

But adding a lot of 'new' languages (e.g. English (British pronunciation), French (Quebecois pronunciation)) to the language selection dialog is confusing and messy.

But instead of a separate argument, I would pass a map of languages to their selected regions through another function that stores these settings inside QOnlineTts. That will be easier to map to the Crow Translate TTS settings later. I'm currently busy, so a PR would be very welcome.
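Below is a minimal sketch of this variant. It assumes a hypothetical RegionalTts class (not QOnlineTts) that stores a language-to-region map set once, for example from Crow Translate's TTS settings, and consults it whenever it builds a request. Plain std::map and std::string stand in for the Qt types.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical TTS helper that keeps per-language region preferences,
// instead of taking a region argument on every call.
class RegionalTts
{
public:
    // Replace all stored preferences, e.g. {{"en", "en-gb"}, {"fr", "fr-ca"}}.
    void setRegions(std::map<std::string, std::string> regions)
    {
        m_regions = std::move(regions);
    }

    // Code to use as tl: the stored region for this language, or the
    // language code itself when no region has been selected.
    std::string ttsLanguageCode(const std::string &language) const
    {
        const auto it = m_regions.find(language);
        return it != m_regions.end() ? it->second : language;
    }

private:
    std::map<std::string, std::string> m_regions;
};
```

With {{"en", "en-gb"}, {"fr", "fr-ca"}} stored, ttsLanguageCode("fr") gives fr-ca, while a language without an entry, such as ar, keeps its plain code.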