SEPIA-Framework / sepia-docs

Documentation and Wiki for SEPIA. Please post your questions and bug-reports here in the issues section! Thank you :-)
https://sepia-framework.github.io/

SEPIA in Spanish? #128

Open sylarsystems opened 2 years ago

sylarsystems commented 2 years ago

I would like to configure SEPIA to work in Spanish (STT, TTS).

Is this possible at this point in SEPIA's development?

Which changes to the config files would be necessary?

I would appreciate your help.

Thanks

fquirin commented 2 years ago

Hi,

I've added experimental languages to the selector inside the new SEPIA client (v0.24.0 from SEPIA-Home v2.6.0) to find out how far we can get when trying to implement them :slightly_smiling_face: .

STT and TTS should work right away if you switch to ES (settings page 2) and use "native" engines. If you use "sepia" engines there are options to get TTS running (e.g. Mary-TTS server or Larynx) and there is a Spanish model available for the STT server as well (Vosk models).

The complicated part is the NLU (natural-language understanding), because parameter extraction (aka NER) is heavily rule-based and partially hard-coded in Java for complex services. I'm planning to make this more flexible, but lately I've been too busy with other features :-|. There are some things, though, that will work right away by just translating some texts :smiley: . Check out the files inside SEPIA\sepia-assist-server\Xtensions\Assistant. There are files for Spanish (answers_es.txt, chats_es.txt, teachit_es.txt) but they are empty. Someone has to copy and translate the German or English files to fill the Spanish SEPIA with life :grin: .

Most parts of the Teach-UI (custom commands) should work out-of-the-box as well, especially with a translated 'answers_es.txt' :-).

I'd definitely be interested to get some feedback if you're playing around with new languages :-)

sylarsystems commented 2 years ago

Hello Florian, thanks for your quick answer and the info!

I already did the SEPIA Install in an Ubuntu Server 20.x Virtual Machine and it is running perfectly. Nice Work!

Right now I am experimenting with switching the language to ES (Spanish, Mexico) and populating answers_es.txt with a translation of the English file so I can start testing.

So far the SEPIA system seems to start without problems; however, when I interact with the web app I get this response: “S.O.S. a lost answer!”

I will double-check in case I misspelled something somewhere in the modified files.

Andy Martinez

[EDIT: reformatted for better readability]

fquirin commented 2 years ago

> I already did the SEPIA Install in an Ubuntu Server 20.x Virtual Machine and it is running perfectly. Nice Work!

Awesome :-)

> S.O.S. a lost answer!

This happens whenever the server cannot find a specific tag inside answers_es.txt. Can you try adding the remaining English lines as-is until everything is translated, to see if you get a "Spenglish" answer instead ^^?

sylarsystems commented 2 years ago

OK, I have now translated all the lines in the answers_es.txt file, but I am still getting the “S.O.S. a lost answer!” response, even with the original English content copied entirely from answers_en.txt to answers_es.txt.

I tested a specific phrase in Spanish (“no hay problema”) with the Assistant API Testing Tool and got the following:

{
    "result": "success",
    "hasAction": false,
    "cardInfo": [],
    "answer": "S.O.S. a lost answer!",
    "more": {
        "certainty_lvl": 1,
        "cmd_summary": "chat;;reply=Ok||Bien||Fino;;",
        "context": "chat;;default",
        "language": "es",
        "user": "uid1003"
    },
    "hasCard": false,
    "actionInfo": [],
    "answer_clean": "S.O.S. a lost answer!",
    "htmlInfo": "",
    "hasInfo": false,
    "resultInfo": {
        "cmd": "chat"
    },
    "processTime": {
        "nlu": 23,
        "service": 1
    }
}

It seems to find the line with the related info, as you can see in the first part of the response; however, it still sends "S.O.S. a lost answer!".
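To make that observation easier to check mechanically, here is a minimal sketch that reads a response like the one above and reports whether the NLU matched a command while the answer lookup failed. The helper names `parseCmdSummary` and `isLostAnswer` are hypothetical, and the `cmd_summary` layout (fields separated by `;;`, alternatives by `||`) is only inferred from this single example, not from SEPIA documentation.

```javascript
// Sketch only: the "cmd_summary" layout is inferred from one sample response.
const LOST_ANSWER = "S.O.S. a lost answer!";

// Split "chat;;reply=Ok||Bien||Fino;;" into a command and its parameters.
function parseCmdSummary(summary) {
  const parts = summary.split(";;").filter((p) => p.length > 0);
  const [cmd, ...rest] = parts;
  const params = {};
  for (const part of rest) {
    const idx = part.indexOf("=");
    if (idx < 0) continue; // skip anything that is not key=value
    params[part.slice(0, idx)] = part.slice(idx + 1).split("||");
  }
  return { cmd, params };
}

// True when the request succeeded but only the fallback answer came back.
function isLostAnswer(response) {
  return response.result === "success" && response.answer_clean === LOST_ANSWER;
}

// Trimmed copy of the response shown above.
const response = {
  result: "success",
  answer_clean: "S.O.S. a lost answer!",
  more: { cmd_summary: "chat;;reply=Ok||Bien||Fino;;", language: "es" },
};

console.log(isLostAnswer(response));
console.log(parseCmdSummary(response.more.cmd_summary));
```

If `isLostAnswer` is true while the parsed summary still contains a command and reply variants, the input was probably understood and the problem is more likely in the answers file than in the NLU.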

In the attachment you can find my translated answers_es.txt file.

[EDIT: reformatted for better readability | removed attachment (see answer below for "fixed" attachment)]

fquirin commented 2 years ago

Hi,

First of all, thanks a lot for the translated file! :star_struck:

I've checked the file and fixed some issues. Some tags (the initial part of each line) were (accidentally?) translated. These have to be the same for every language, because the tag is the "ID" of the answer. I've also removed one 'char' property, 'genial', because the SEPIA server only knows 'neutral', 'rude', 'polite' and 'cool' at the moment and will throw an error for unknown ones.

I've imported the resulting txt file to my test-system and it works :grinning: : answers_es.txt

I noticed that you were probably using the answers file from SEPIA v2.5.1, because some lines I've added in v2.6.0 were missing:

error_media_player_0a;;     rep=0|mood=5;;      Sorry, but there was a media playback error. I'm not sure why.      ;;char=neutral
error_media_player_0b;;     rep=0|mood=5;;      Sorry, but I couldn't find anything to play.        ;;char=neutral
alarms_0c;; rep=0|mood=5;;      I can't find anything right now, please try again in a few moments. ;;char=neutral

I copied the English lines over into the new answers_es.txt, maybe you could translate them as well? :slightly_smiling_face:
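The two problems described above (translated tags and unknown 'char' values) plus the missing v2.6.0 lines can be caught before restarting the server. The following is a minimal sketch, not a SEPIA tool: it assumes the line layout `tag;; rep=..|mood=..;; text ;;char=..` seen in the three example lines, and the names `parseAnswers` and `checkTranslation` as well as the broken Spanish sample are made up for illustration.

```javascript
// Sketch only: line layout inferred from the example lines in this thread.
const KNOWN_CHARS = new Set(["neutral", "rude", "polite", "cool"]);

// Turn the file content into a list of { tag, text, char } entries.
function parseAnswers(fileText) {
  const entries = [];
  for (const line of fileText.split(/\r?\n/)) {
    const fields = line.split(";;").map((f) => f.trim());
    if (fields.length < 4 || fields[0] === "") continue; // not an answer line
    const last = fields[fields.length - 1];
    entries.push({
      tag: fields[0],
      text: fields[2],
      char: last.startsWith("char=") ? last.slice("char=".length) : null,
    });
  }
  return entries;
}

// Compare a translated file against the reference (e.g. the English file).
function checkTranslation(referenceText, translatedText) {
  const translated = parseAnswers(translatedText);
  const referenceTags = new Set(parseAnswers(referenceText).map((e) => e.tag));
  const translatedTags = new Set(translated.map((e) => e.tag));
  return {
    missingTags: [...referenceTags].filter((t) => !translatedTags.has(t)),
    unknownTags: [...translatedTags].filter((t) => !referenceTags.has(t)),
    unknownChars: translated
      .filter((e) => e.char !== null && !KNOWN_CHARS.has(e.char))
      .map((e) => `${e.tag}: ${e.char}`),
  };
}

const english = [
  "error_media_player_0a;; rep=0|mood=5;; Sorry, but there was a media playback error. I'm not sure why. ;;char=neutral",
  "alarms_0c;; rep=0|mood=5;; I can't find anything right now, please try again in a few moments. ;;char=neutral",
].join("\n");

// Hypothetical broken translation: one translated tag, one unknown 'char'.
const spanish = [
  "error_reproductor_0a;; rep=0|mood=5;; Lo siento, hubo un error de reproducción. ;;char=genial",
  "alarms_0c;; rep=0|mood=5;; No encuentro nada ahora mismo, inténtalo de nuevo en unos momentos. ;;char=neutral",
].join("\n");

console.log(checkTranslation(english, spanish));
```

For this sample the report lists `error_media_player_0a` as missing, `error_reproductor_0a` as an unknown tag, and `genial` as an unknown 'char' value. To check real files, the two strings would be read from answers_en.txt and answers_es.txt instead.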

Looking at your test result it seems the input was ok :thinking: . Did you restart the server after you made your changes? This is required because the server loads the files on start-up into memory to get fast access. Another thing you can check is the sepia-assist-server/log.out file. At the very top you should see a message about imported answers and if something went wrong there should be an error.

sylarsystems commented 2 years ago

Hello

OK, understood. Indeed, it is possible that these missing tags were translated accidentally (I used a translator and then checked the file for errors and mistranslated words).

I replaced the file with the one you sent, restarted, and that did the magic! Thanks, I will keep tag integrity in mind in the future :)

About the version of the file: indeed, I started translating it back on v2.5.1. (I previously had v2.5.1 installed for testing in English, and did a fresh install of v2.6.0 on another server in order to test and use it in Spanish.) I will translate the new lines from v2.6.0, and now that it is working again I will go on to translate the other files you pointed me to and check the semantics of some sentences.

I will send you the files when it is done!

Is there a way in the config to set the language to Spanish (ES_MX) permanently? It would be useful.

Note: I am from Venezuela (our international code would be ES_VE), but when it comes to language classification, the most neutral and closest match we find (in movies, software and so on) is Mexican Spanish.

Thanks Florian!

fquirin commented 2 years ago

> I replaced the file with the one you sent, restarted, and that did the magic!

Great :-)

> I will send you the files when it is done!

:smiley: :+1:

I found another issue. It seems some tags were translated accidentally as well:

<nombre_usuario> and <nombre_de_usuario> -> <user_name> (name of the user)
<nombre> -> <name> (name of assistant)

Here is a new version: answers_es.txt
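Translated placeholders like these are easy to miss by eye. As a rough sketch, one could compare the placeholders in each translated sentence with those in the English original. It assumes placeholders always look like `<lower_case_name>`; the function names and the two sample sentences are invented for illustration and are not taken from the real answers files.

```javascript
// Sketch only: assumes placeholders are written as <lower_case_name>.
const PLACEHOLDER = /<[a-z_]+>/g;

// Collect the distinct placeholders used in a sentence.
function placeholdersIn(text) {
  return new Set(text.match(PLACEHOLDER) || []);
}

// Placeholders that appear in the translation but not in the reference.
function unexpectedPlaceholders(referenceText, translatedText) {
  const allowed = placeholdersIn(referenceText);
  return [...placeholdersIn(translatedText)].filter((p) => !allowed.has(p));
}

// Hypothetical sample sentences.
const english = "Hello <user_name>, my name is <name>.";
const spanish = "Hola <nombre_usuario>, mi nombre es <name>.";

console.log(unexpectedPlaceholders(english, spanish)); // ["<nombre_usuario>"]
```

Any placeholder reported here was most likely changed by the translation tool and should be restored to the English form.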

> Is there a way in the config to set the language to Spanish (ES_MX) permanently?

Currently "es" will unfortunately default to "es-ES" on every restart. I just double-checked where this is defined and found that the list is still pretty "raw". You can find it in the client at [client-folder]/scripts/sepiaFW.local.js at the bottom:

var defaultShortLangCodeToLongMap = {
  "de": "de-DE",
  "en": "en-US",
  "es": "es-ES",
  "fr": "fr-FR",
  "nl": "nl-BE",
  "pt": "pt-PT"
}

If you are using the DIY client you could edit this file (the client folder is ~/clexi/www/sepia/) until I can improve the options. I'm not sure if Venezuelan Spanish will work with all native engines (Google, Apple, Microsoft) but you can give it a try ;-)
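For illustration, the effect of such an edit can be sketched as a lookup with an optional override. The map below is copied from the snippet above; `toLongLangCode` is a hypothetical helper and not part of the SEPIA client, which may resolve the code differently.

```javascript
// Copy of the map quoted above.
const defaultShortLangCodeToLongMap = {
  de: "de-DE",
  en: "en-US",
  es: "es-ES",
  fr: "fr-FR",
  nl: "nl-BE",
  pt: "pt-PT",
};

// Hypothetical helper: resolve a short code, letting overrides win.
function toLongLangCode(shortCode, overrides = {}) {
  const map = { ...defaultShortLangCodeToLongMap, ...overrides };
  return Object.prototype.hasOwnProperty.call(map, shortCode) ? map[shortCode] : null;
}

console.log(toLongLangCode("es"));                  // "es-ES"
console.log(toLongLangCode("es", { es: "es-MX" })); // "es-MX"
```

Changing the `"es"` entry in the file itself has the same effect as the override here, but the change would presumably need to be reapplied after a client update.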

sylarsystems commented 2 years ago

> I found another issue. It seems some tags were translated accidentally as well:

Indeed, I found them too and corrected them :-)

> Currently "es" will unfortunately default to "es-ES" on every restart. I just double-checked where this is defined and found that the list is still pretty "raw". You can find it in the client at [client-folder]/scripts/sepiaFW.local.js at the bottom:

OK, I'll see if I can make the necessary changes. Venezuelan Spanish (ES_VE) is rarely included in programming libraries or frameworks, so ES_MX is considered the standard for neutral Spanish in the Americas (I mean that, in general, most people in Spanish-speaking countries understand it). So I think that, in terms of compatibility, the Spanish language options should be limited to ES_ES (for Spanish in Europe) and ES_MX (for Spanish in America), I would suggest…

Once the new file was in place and the system restarted, I tried commands like “What time is it?” (“Que hora es?”) or “What day is today” (“Que dia es hoy”) in Spanish, but they don't work. I suppose they need to be translated or even rewritten. If you tell me where I can find them, I will try to adapt them. The same probably applies to all the other English commands.

Thanks

fquirin commented 2 years ago

> so I think that, in terms of compatibility, the Spanish language options should be limited to ES_ES (for Spanish in Europe) and ES_MX (for Spanish in America), I would suggest…

I've just updated the 'dev' branch of the client to support user-selectable region settings (via URL parameter, settings.js and UI) that will currently be stored per client (not inside the account yet). By default the client will also try to read the region from system settings IF it matches the general language (that means if your account is set to 'es' the region should default to 'es-VE' and could manually be set to 'es-MX' if you like). If nothing unexpected happens these changes will be included in the next update :slightly_smiling_face:
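The selection order described here can be sketched roughly as follows. This is not the client's actual code: `pickRegion` and its parameters are hypothetical, and letting an explicit user choice win over the system locale is an assumption based on the "could manually be set" remark.

```javascript
// Hypothetical sketch of the region selection described above.
function pickRegion(language, systemLocale, userChoice, defaults) {
  // A region code only counts if it belongs to the account language.
  const matches = (code) =>
    typeof code === "string" &&
    code.toLowerCase().startsWith(language.toLowerCase() + "-");
  if (matches(userChoice)) return userChoice;     // explicit setting wins (assumed)
  if (matches(systemLocale)) return systemLocale; // system region, same language
  return defaults[language] || null;              // fall back to the static map
}

const defaults = { es: "es-ES", en: "en-US" };

console.log(pickRegion("es", "es-VE", null, defaults));    // "es-VE"
console.log(pickRegion("es", "es-VE", "es-MX", defaults)); // "es-MX"
console.log(pickRegion("es", "en-US", null, defaults));    // "es-ES"
```

In a browser the `systemLocale` argument could come from `navigator.language`; it is passed in as a parameter here so the sketch runs anywhere.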

> I tried commands like “What time is it?” (“Que hora es?”) or “What day is today” (“Que dia es hoy”) in Spanish, but they don't work. I suppose they need to be translated or even rewritten

In this case we can actually simply translate them. These sentences are located in teachit_en.txt. The file is very similar to the answers file but connects input to output for static sentences :-). More complex commands will unfortunately require a change to the Java code, and you will most likely see some strange behavior from time to time. The log at sepia-assist-server/log.out should keep track of most errors that happen due to missing parameter extraction rules. If we collect those I can try to extend them step-by-step :slightly_smiling_face:

[EDIT] There is another file with a lot of small-talk commands: chats_en.txt. It works the same way as teachit_en.txt.