SEPIA-Framework / sepia-assist-server

Core server of the SEPIA Framework responsible for NLU, conversation, smart-service integration, user-accounts and more.
https://sepia-framework.github.io/
MIT License

Missing `char=neutral` in English language file. #36

Closed (barrynl closed this issue 1 year ago)

barrynl commented 1 year ago

Hi,

I want to create a Dutch language file for the SEPIA assist server and to help me translate the English sentences quickly, I am working on a simple Java tool that helps me extract the sentences of an existing language file (answers_en.txt) and print them into chunks of maximum 5000 characters each (the character limit of Google Translate). After translating these English sentences into Dutch, I want to merge them back into the original file.
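The chunking step described above could be sketched roughly as follows. This is a minimal illustration and not the actual tool from this issue: the class name `SentenceChunker` and the method `chunk` are hypothetical, and the sketch assumes the sentences have already been extracted from the language file and that each one fits within the limit on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of SEPIA): packs sentences into
// newline-separated chunks that each stay within a character limit.
class SentenceChunker {

    static List<String> chunk(List<String> sentences, int maxChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean hasContent = false;
        for (String sentence : sentences) {
            if (sentence.length() > maxChars) {
                throw new IllegalArgumentException("Sentence exceeds limit: " + sentence);
            }
            // The +1 accounts for the newline separating sentences in a chunk.
            int needed = hasContent
                    ? current.length() + 1 + sentence.length()
                    : sentence.length();
            if (hasContent && needed > maxChars) {
                // Adding this sentence would overflow, so close the current chunk.
                chunks.add(current.toString());
                current.setLength(0);
                hasContent = false;
            }
            if (hasContent) {
                current.append('\n');
            }
            current.append(sentence);
            hasContent = true;
        }
        if (hasContent) {
            chunks.add(current.toString());
        }
        return chunks;
    }
}
```

Calling `SentenceChunker.chunk(sentences, 5000)` would then yield chunks that can be pasted into the translator one at a time. Keeping one sentence per line inside a chunk is a design choice that makes it easier to merge the translations back by line position.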

When 'parsing' the language file I noticed a single line that misses the ;;char=neutral part at the end. Is this an error, or should my tool support that part missing?

For now I assumed it is an error.

Regards, Barry

fquirin commented 1 year ago

Hi Barry,

> I want to create a Dutch language file for the SEPIA assist server and to help me translate the English sentences quickly, I am working on a simple Java tool that ...

Nice 😎. Just a short comment: Translating SEPIA into a new language is unfortunately a very complex task, due to the dynamic NLU module, which consists of numerous hand-written rules to parse English and German, but you should at least be able to make static sentences (teachIt_xy files) and the ones created via Teach-UI work. I've started to work on a new NLU module that will improve the situation (and hopefully introduce compatibility with Rhasspy's sentences.ini, if you know that one ^^), but got a little bit distracted lately by all the new AI stuff popping up =), so it's still in a very early phase.

> When 'parsing' the language file I noticed a single line that misses the ;;char=neutral part at the end. Is this an error, or should my tool support that part missing?

The answer line parser ignores the missing 'char' info and will set it to the default, neutral, if I remember correctly.
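For a translation tool, that behaviour could be mirrored with something like the sketch below. It is not SEPIA's actual parser: the class `AnswerLine` is hypothetical, the `;;`-separated layout is taken from the example line `test_0a;; rep=0|mood=5;; Funziona! ;;char=neutral`, and the fallback to `neutral` follows the recollection above rather than a check of the source code.

```java
// Hypothetical sketch, NOT SEPIA's real answer parser.
// Assumed layout: key;; parameters;; answer text [;;char=<value>]
class AnswerLine {
    final String key;
    final String params;
    final String text;
    final String character;

    AnswerLine(String key, String params, String text, String character) {
        this.key = key;
        this.params = params;
        this.text = text;
        this.character = character;
    }

    static AnswerLine parse(String line) {
        String[] parts = line.split(";;");
        if (parts.length < 3) {
            throw new IllegalArgumentException("Expected at least 3 fields: " + line);
        }
        // Assumed default when the optional char field is missing.
        String character = "neutral";
        for (int i = 3; i < parts.length; i++) {
            String part = parts[i].trim();
            if (part.startsWith("char=")) {
                character = part.substring("char=".length()).trim();
            }
        }
        return new AnswerLine(parts[0].trim(), parts[1].trim(), parts[2].trim(), character);
    }
}
```

With this, a line that lacks the trailing `;;char=neutral` parses to the same result as one that has it, so the tool does not need to treat the single odd line as an error.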

Regards, Florian

barrynl commented 1 year ago

Hi, thanks for the info.

Not a problem if my translations are not that useful, I was just looking for something SEPIA related to get started with.

I did see the dynamic NLU modules and, as you mention, they look quite complex and hard-coded :). I do not know the Rhasspy library, but that should indeed make the language system more flexible. I am also interested in the topic of translating natural language sentences into instructions for the assistant, where these instructions are operations (adding/removing facts) on a virtual knowledge graph. But first I want to get to know the SEPIA framework a bit better.

You mention the teachIt_xy files and I've also translated the English version to Dutch. Where can I test whether it works?

fquirin commented 1 year ago

> I am also interested in the topic of translating natural language sentences into instructions for the assistant, where these instructions are operations (adding/removing facts) on a virtual knowledge graph

Sounds interesting :-). There are a lot of ways to realize that, I think: either as one or more smart-services that explicitly handle the sentences ("add XY to my contacts list" ... etc.), or one could modify the NLU pipeline with a module that can identify these requests in a more general way ("Amsterdam is a city", "The Eiffel Tower is xy m high" etc.). The latter is more complicated, but with modern systems (LLMs or classification models etc.) it should be doable.

> You mention the teachIt_xy files and I've also translated the English version to Dutch. Where can I test whether it works?

If you start the client and set the language to NL (experimental), you should be able to ask these sentences and get some kind of reaction :-). Let me know what happens ^^.

barrynl commented 1 year ago

Hi, if I try it with the SEPIA-Home bundle (i.e. replacing the teachIt_nl.txt file in the sepia-assist-server/Xtensions/Assistant/commands/ folder) there is no response in chat. I do the following steps:

Some questions:

Thanks for your responsiveness.

Regards, Barry

fquirin commented 1 year ago

So first things first ^^: Does it work with English language? :-)

> log in as assistant@sepia.localhost.

Oh, this could actually be the problem because that is the account of the assistant itself. Better first add a "real" user and log in with that one.

Some answers:

> where is the translation from the audio of my voice to the text happening?

It depends on your settings of the client, but the default is "native", which means it uses the APIs given by the platform you're running on. This can be your browser (Edge -> Windows API, Chrome -> Google API, FF -> not available, Samsung Browser -> Samsung API, Safari -> Apple API) or the Android app (depends on Android system settings). To switch to the SEPIA STT Server, you first have to start it, then select "sepia" as ASR engine in the settings and make sure it's pointing to the right URL. NOTE: To use Dutch as language with the SEPIA STT Server you have to switch to the "dev" version at the moment and use the new Whisper integration (pre-release), or you have to find a Dutch ASR model for Vosk/Kaldi/Coqui.

> If I run the sepia-assist-server from my Eclipse I do not get any chat working (also English). What do I need to start as well?

You need to start the websocket chat-server and teach-server as well.

> there are existing libraries that you could have used (for example slf4j instead of the net.b07z.sepia.server.core.tools.Debugger class, or Apache Commons CLI). Was this a deliberate choice or did the project just develop into this?

In this specific case it just happened somehow, because when I started with the project (many moons ago) I was in a phase where I really hated all these loggers; they never did what I wanted 😅🙈. I've added slf4j-simple support at some point, but never felt it was really necessary to make a full transition.

barrynl commented 1 year ago

Hi, yes, it works with the English language. Both a more dynamic sentence like "How long does it take to travel from Amersfoort to Utrecht?" and teachIt sentences like "This is a test." work. When I switch to Dutch, both these sentences (only translated) give back a S.O.S. a lost answer!. When I manually create a teachIt for the Dutch sentence "dit is een test" to open a website, for example, it still gives the same SOS response, but it does open the website (or at least provides the link).

Thanks for the answers, they were mostly out of curiosity. Good to know that it currently probably still sends my voice audio to some cloud service from Microsoft or Google and that I can prevent that by running my own sepia-stt-server 👍.

Best regards,

Barry

fquirin commented 1 year ago

Hi Barry,

I just double-checked the support for experimental languages via the Italian translation and it's working ('Questo e' un test' -> 'Sta andando!'). If you have some Dutch test-files you want to upload I could have a look and see if I can find the issue.

> both these sentences (only translated) give back a S.O.S. a lost answer!

This error happens when the answer-key wasn't found in the answers file. For example, if you have this sentence in any of the command files: `Test;; command=chat;; reply=<test_0a>`

Then it looks for the key `test_0a` in the answer files, e.g.: `test_0a;; rep=0|mood=5;; Funziona! ;;char=neutral`
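One way to hunt for this kind of mismatch in translated files is to cross-check the two formats shown above. The following is only a rough sketch under assumptions: the class `MissingAnswerKeys` is hypothetical, it assumes reply keys always appear as `reply=<key>` in command lines and that the answer key is the text before the first `;;` of an answer line, and it does not reproduce how the server itself resolves answers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker (not part of SEPIA): lists reply keys used in
// command lines that have no matching entry in the answer lines.
class MissingAnswerKeys {

    // Assumed form of a key reference in a command line: reply=<some_key>
    private static final Pattern REPLY_KEY = Pattern.compile("reply=<([^>]+)>");

    static Set<String> answerKeys(List<String> answerLines) {
        Set<String> keys = new HashSet<>();
        for (String line : answerLines) {
            int sep = line.indexOf(";;");
            if (sep > 0) {
                // Assumed: the key is everything before the first ";;".
                keys.add(line.substring(0, sep).trim());
            }
        }
        return keys;
    }

    static List<String> missingKeys(List<String> commandLines, List<String> answerLines) {
        Set<String> known = answerKeys(answerLines);
        List<String> missing = new ArrayList<>();
        for (String line : commandLines) {
            Matcher m = REPLY_KEY.matcher(line);
            while (m.find()) {
                String key = m.group(1).trim();
                if (!known.contains(key) && !missing.contains(key)) {
                    missing.add(key);
                }
            }
        }
        return missing;
    }
}
```

Feeding it the lines of a command file and the matching answers file would return the keys that are referenced but never defined, which are the candidates for the "lost answer" response.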

barrynl commented 1 year ago

Hi,

> If you have some Dutch test-files you want to upload I could have a look and see if I can find the issue.

These are the two translated files with which I try to get the Dutch chat working:

I wonder if it doesn't work, because I did not translate the chats_nl.txt yet.

I'll try to get the English version working while running my own sepia-stt-server 👍

fquirin commented 1 year ago

Hi @barrynl, sorry, I didn't find time to test this yet, but will hopefully do it very soon. The chats_nl.txt is basically a special file dedicated to small-talk commands, but works identically to teachIt_nl. All the answers are taken from answers_nl or your custom commands and reply values. So leaving chats_nl out for now should not influence the test command 🤔