Stypox / dicio-android

Dicio assistant app for Android
GNU General Public License v3.0

Launching Apps #22

Open moriel5 opened 2 years ago

moriel5 commented 2 years ago

To test this feature (and joke around a bit), I tried telling Dicio to open itself (by saying "Open Dicio").

It took three tries, which I would attribute to my accent. However, this issue is about the hilarious results of the first two tries (the last try brought up a toast about opening Dicio, though nothing else happened; good job on that!).

First try: Dicio interpreted my command as "Open CEO", and for some odd reason launched ProtonMail.

Second try: Dicio interpreted my command as "Open D C O if", and oddly enough launched Aegis.

I can understand my accent not being recognized, but not Dicio launching apps that are completely unrelated to the text it recognized.

Shura0 commented 2 years ago

In Russian it is impossible to open Dicio at all, because there is no way to say the name of the program in English. The recognizer only tries to recognize Russian words.

moriel5 commented 2 years ago

Hmm... I only tried English, as my native language (Hebrew) isn't even supported.

However, as I come from a Russian-speaking family, I find it odd that you cannot pronounce "Dicio" in Russian. Shouldn't it be "Дизио" (I am not a Russian speaker, just following an online guide and what I know from my parents, aunt and grandmother)?

Edit: I just asked my father, and it seems it should actually be "Дизсо".

nshmyrev commented 2 years ago

You can send me the list and I'll add the required words to the model's vocabulary.

Shura0 commented 2 years ago

@moriel5 I can pronounce "Дисио", but the program does not recognize it, since there is no such word in the Vosk vocabulary.

@nshmyrev list of what?

moriel5 commented 2 years ago

@Shura0 Ah sorry, I was under the impression that it takes the names of the apps from the system's list of installed apps.

nshmyrev commented 2 years ago

> @nshmyrev list of what?

List of words you want to recognize.

Shura0 commented 2 years ago

'dicio'

But in an ideal world I would like to have a vocabulary that contains Russian and English words at the same time.

Shura0 commented 2 years ago

Let me explain my message in detail. I apologize for the text below: I am neither a programmer nor a voice recognition expert, just a mobile phone user with some technical background.

The current issue cannot be solved by adding program names to the Russian vocabulary or to any other one; there should be another way. There should be context vocabularies that are built when the user runs Dicio, and Dicio should work with these context vocabularies rather than with whole-language vocabularies. First of all there should be a main context vocabulary that contains only the words from the *.dslf files. Then, depending on the skill, the program should choose other vocabularies. This would decrease the recognition error rate.

Take the 'open' skill as an example. The user says 'open dicio'. The word 'open' is in the main vocabulary, so once it is recognized the program passes the rest of the phrase to the 'open' context vocabulary, which is built from the list of installed programs. The word (sound) 'dicio' is therefore matched against the 'open' context vocabulary. Dicio is installed, so that vocabulary has this word, independent of the language.

A more obvious example is the calculator. The user says 'calculate four plus two'. 'Calculate' is in the main vocabulary, so the rest of the phrase ('four plus two') is passed to the calculator context vocabulary, which contains only digits and math signs, with no words like 'to' or 'for' (case #21). So the recognition result will be exactly 4+2.

The next example is navigation. We could build a special context vocabulary that contains all the geographic names from the map, which would make good (I hope) geographic voice search possible.
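The two-stage lookup described in this comment could be sketched roughly as follows. This is only an illustration of the idea, not how Dicio or Vosk work: the names `MAIN_VOCABULARY`, `CONTEXT_VOCABULARIES` and `vocabulary_for` are hypothetical, the word lists are made up, and the sketch says nothing about how a speech recognizer would actually be restricted to the chosen word set.

```python
# Hypothetical data. In a real app the main vocabulary would come from the
# *.dslf files and the context vocabularies would be built when the app starts.
MAIN_VOCABULARY = {"open", "calculate"}

CONTEXT_VOCABULARIES = {
    # e.g. built from the list of installed programs
    "open": {"dicio", "newpipe", "aegis"},
    # only digits and math signs, so no "to" or "for"
    "calculate": {"zero", "one", "two", "three", "four", "plus", "minus"},
}


def vocabulary_for(first_word):
    """Return the word set to use for the rest of the phrase.

    Returns None when the first word is not a known skill keyword.
    """
    word = first_word.lower()
    if word not in MAIN_VOCABULARY:
        return None
    return CONTEXT_VOCABULARIES.get(word, set())
```

With these made-up lists, `vocabulary_for("open")` contains "dicio", while `vocabulary_for("calculate")` does not contain "for", which is the point of the calculator example.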

moriel5 commented 2 years ago

@Shura0 That would be a great idea, although it could become complicated later on if multiple context vocabularies need to be referenced simultaneously. That could be needed to handle local names that are shortened or are not the official names in their countries (like "Rishon" being a shortened nickname for the city "Rishon LeTzion" in Israel, or numerous cities in the US having downtown areas that are simply nicknamed "Downtown"). It could also be needed when someone does not fully remember a term, so that they could say something like "what is the symbol for not equals?" and get the result directly, without an online search.

Stypox commented 2 years ago

Currently Dicio interprets commands such as "Open NewPipe" thanks to capturing groups. In that case "NewPipe" would go into the "what" capturing group (see this file: .what. is the capturing group). The Open skill then picks up the content of the capturing group, computes the Levenshtein distance between it and the name of each installed app (code is here), and chooses the app with the smallest distance. If no app has a distance smaller than or equal to 5, nothing is opened.
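As a rough illustration of the matching step described above (not the actual Dicio code), here is a minimal sketch. The function names, the lowercasing, and the example app lists are assumptions; only the use of Levenshtein distance and the threshold of 5 come from the comment.

```python
def levenshtein_distance(a, b):
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


MAX_DISTANCE = 5  # threshold mentioned in the comment above


def closest_app(captured, app_names):
    """Return the app name closest to the captured text, or None if all are too far."""
    best_name, best_distance = None, MAX_DISTANCE + 1
    for name in app_names:
        # Lowercasing both sides is an assumption of this sketch.
        distance = levenshtein_distance(captured.lower(), name.lower())
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name
```

Under these assumptions, `closest_app("maps", ["Google Maps", "Move"])` returns "Move": "move" is 3 edits away from "maps", while "google maps" is 7 edits away, because every extra character in a longer name counts as an edit. That is the kind of surprising match reported in this issue.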

There are two ways this can be improved:

The solution to the first point is pretty straightforward and just requires changing a couple of lines of code across a couple of files. A better algorithm I propose could be based on a modified Levenshtein distance. Given:

A = the first string, with length |A|

B = the second string, with length |B| (let's assume |B| > |A|)

lev_dist = levenshtein_distance(A, B), as calculated before

start_idx / end_idx = the index of the first / last character in B that matches a character in A in the calculated Levenshtein distance matrix

then:

modified_levenshtein_distance(A, B) = lev_dist - |B| + |A| + min(1, start_idx) + min(1, |B| - end_idx - 1)

So for example modified_levenshtein_distance("Notes", "Omni Notes FOSS") = 2 and modified_levenshtein_distance("Notes", "Notify") = 3.
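The formula can be tried out with a small sketch like the one below. It is not the code that ended up in Dicio. The helper names, the case-sensitive comparison, the particular traceback order used to find start_idx and end_idx (optimal alignments are not always unique), and the fallback when no character matches are all assumptions of this sketch.

```python
def levenshtein_matrix(a, b):
    """Full dynamic-programming matrix; d[i][j] is the distance between a[:i] and b[:j]."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d


def matched_indices_in_b(a, b, d):
    """Walk one optimal path back through the matrix, collecting indices of b that match a."""
    i, j = len(a), len(b)
    matched = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            # Equal characters can always be aligned on an optimal path.
            matched.append(j - 1)
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j - 1] + 1:
            i, j = i - 1, j - 1  # substitution
        elif d[i][j] == d[i][j - 1] + 1:
            j -= 1               # character of b inserted
        else:
            i -= 1               # character of a deleted
    return matched[::-1]


def modified_levenshtein_distance(a, b):
    if len(a) > len(b):
        a, b = b, a  # make b the longer string, as the formula assumes
    d = levenshtein_matrix(a, b)
    lev_dist = d[len(a)][len(b)]
    matched = matched_indices_in_b(a, b, d)
    if not matched:
        # Assumption: with no matching character, fall back to the plain distance.
        return lev_dist
    start_idx, end_idx = matched[0], matched[-1]
    return (lev_dist - len(b) + len(a)
            + min(1, start_idx) + min(1, len(b) - end_idx - 1))
```

For "Notes" and "Omni Notes FOSS" this gives 10 - 15 + 5 + 1 + 1 = 2, and for "Notes" and "Notify" it gives 3 - 6 + 5 + 0 + 1 = 3, matching the two examples above.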

The solution to the second point is more difficult to implement and would require some structural changes. I will try to sort that out when I have time, since it would be a great improvement not just for this skill, but for Dicio in general.

Thank you everyone for all the feedback :-)

Stypox commented 2 years ago

I tried to improve the algorithm that computes the distance between two strings. Could you tell me if this APK (app-debug.zip) improves your experience a little? Now if I say e.g. "Open Notes", "Omni Notes FOSS" is opened even though its name has many more characters. And if I say "Open Maps", "Google Maps" is opened instead of "Move".

moriel5 commented 2 years ago

@Stypox I just tested the debug APK. While the recognized words remain the same (like "the CEO" instead of "Dicio" or "the explorer" instead of "MiXplorer"), the actions taken are now sensible and actually close to the words recognized, so overall I have had a more consistent experience.

Stypox commented 2 years ago

OK, thank you. I merged the code that improves the string distance method, see #41.

igoralmeida commented 2 years ago

Hi, thanks for the app!

What would be required for triggering other* intents in external apps? I'm thinking hands-free usage, for example "play" to resume playing whatever media is currently on.

* assuming you trigger MainActivity when opening an app

Stypox commented 2 years ago

@igoralmeida We can already create a skill that sends intents to external apps without issues. Hands-free usage, however, still requires #48 to be solved.

Shura0 commented 1 year ago

This still does not work with the Russian language model. It recognizes program names in Russian and then cannot find them in the application list (because the names there are in English). Going by the shortest distance, it ALWAYS launches the DMD2 application, for any request.

Maybe you should transliterate the request to Latin letters before the app search?
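A minimal sketch of what such a transliteration step might look like is shown below. The mapping table is a simplified, hypothetical one (loosely based on common romanization habits, not any particular standard), and `transliterate` is not an existing Dicio function.

```python
# Simplified Russian-to-Latin table; a real implementation might prefer an
# established romanization standard or a library instead.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text):
    """Lowercase the text and replace Cyrillic letters; other characters pass through."""
    return "".join(CYRILLIC_TO_LATIN.get(char, char) for char in text.lower())
```

With this table, "дисио" becomes "disio", which is a single substitution away from "dicio", so a distance-based app search would have a much better chance of finding the right app.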