Stypox / dicio-sentences-compiler

Sentences-compiler for Dicio assistant
GNU General Public License v3.0

Sentences compiler for Dicio assistant

This tool provides a simple way to generate sentences to be matched for the Dicio assistant. It compiles files written in the Dicio-sentences-language to Java code that can be easily imported into projects using the interpreter of the Dicio assistant. It lets you pack similar sentences together while preserving readability.

This repository is part of the Dicio project. Also check out dicio-android, dicio-skill and dicio-numbers. Open to contributions :-D

Dicio sentences language

Every file contains many sections, each starting with section information and followed by a list of sentences. The section information is formatted like SECTION_ID:SPECIFICITY, where SPECIFICITY can be low, medium or high and represents how specific the set of sentences is. For example, a section that matches queries about phone calls is very specific, while one that matches every question about famous people has a lower specificity. The specificity is needed to resolve conflicts between two sections that both match with a high score: the more specific one is preferred. Then sentences follow: every sentence is made of an optional sentence id (formatted like [SENTENCE_ID] and used for sentence identification purposes) and a list of constructs followed by a ;. Constructs can be:

Punctuation marks and special characters

Note that punctuation marks should not be inserted. Words are only made of letters, and other special characters are part of the language's grammar, so the characters []"|?().<> will be interpreted with their special meaning explained above, and other punctuation marks will generate errors. But this does not mean that Dicio is not able to handle sentences with punctuation marks! Before being processed, the input from the user is split into lowercase letter-only words, so "It's" becomes "it" and "s" (the relevant code is at dicio-skill). Therefore, when writing a dicio-sentences-language sentence which could contain e.g. apostrophes, just replace them with a space to obtain the same result. The case of letters (and their diacritics, for diacritics-insensitive words) is ignored, too.
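The splitting step described above can be illustrated with a short sketch. This is not the code from dicio-skill: the class name WordSplitSketch, the helper splitIntoWords and the choice of the \p{L}+ pattern are assumptions made only to show the idea of lowercasing the input and keeping runs of letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordSplitSketch {
    // One or more Unicode letters; every other character acts as a separator.
    private static final Pattern LETTERS = Pattern.compile("\\p{L}+");

    // Hypothetical helper (an assumption, not the dicio-skill implementation):
    // lowercases the input and collects every run of letters as a word.
    static List<String> splitIntoWords(String input) {
        List<String> words = new ArrayList<>();
        Matcher matcher = LETTERS.matcher(input.toLowerCase(Locale.ROOT));
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(splitIntoWords("It's"));                     // prints [it, s]
        System.out.println(splitIntoWords("What's the weather like?")); // prints [what, s, the, weather, like]
    }
}
```

This is why a sentence written as "what s the weather like" matches the input "What's the weather like?": both end up as the same sequence of words.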

Sentence example and explanation

weather: high
(what s|is)|whats "the" weather like? (<i|o>n .where.)?;

The example above declares a section named "weather" with a high specificity (high, since... what else could weather mean, if not atmospheric conditions?). Then a sentence follows:

So all of the following inputs from the user would match the above sentence perfectly:

Compilation process

When a compilation is started, dicio-sentences-compiler first parses the provided file and builds a syntax tree. Every sentence is then analyzed and converted into a format that allows running an O(number of words) depth-first search on it with as little runtime overhead as possible. Every word in the sentence is assigned a unique index, a list of the indices of all words that could come next, and the minimum number of words to skip to get to the end of the sentence. The index is used (you guessed it!) just for indexing. The list of next word indices makes it possible to instantly determine the possible next words during a depth-first search. The number of words to get to the end allows lowering the score accordingly during the search, without having to recalculate it at runtime. When a section is put together, besides the list of compiled and analyzed sentences, it has the specificity value and (if applicable) the list of all capturing group names. These names are compiled to variables in the target language, for convenience's sake and to prevent typos, much like with Android's R class.
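To make the index and next-indices idea concrete, here is a minimal sketch of a depth-first search over such a word list. It is heavily simplified and makes several assumptions: the Word class and the matches functions are hypothetical, only exact matches are accepted (no scoring, so the minimum-words-to-end value is left out), and an index equal to the number of words is taken to mean "end of sentence". The word list encodes how (are you doing?)|(is it going), with the word variation replaced by a plain word.

```java
public class WordGraphSketch {
    // Hypothetical simplified word: its value and the indices of the words that
    // may follow it. An index equal to the sentence length means "end of sentence".
    static final class Word {
        final String value;
        final int[] next;

        Word(String value, int... next) {
            this.value = value;
            this.next = next;
        }
    }

    // how (are you doing?)|(is it going)
    static final Word[] MOOD = {
        new Word("how", 1, 4),  // 0: followed by "are" or "is"
        new Word("are", 2),     // 1
        new Word("you", 3, 7),  // 2: "doing" is optional, so the end (7) is allowed
        new Word("doing", 7),   // 3
        new Word("is", 5),      // 4
        new Word("it", 6),      // 5
        new Word("going", 7),   // 6
    };
    static final int[] MOOD_START = {0};

    // Depth-first search: true if input[pos..] can be consumed exactly,
    // starting from the word at wordIndex.
    static boolean matchesFrom(Word[] words, int wordIndex, String[] input, int pos) {
        if (wordIndex == words.length) {
            return pos == input.length; // end of sentence: all input must be consumed
        }
        if (pos >= input.length || !words[wordIndex].value.equals(input[pos])) {
            return false;
        }
        for (int next : words[wordIndex].next) {
            if (matchesFrom(words, next, input, pos + 1)) {
                return true;
            }
        }
        return false;
    }

    static boolean matches(Word[] words, int[] startIndices, String[] input) {
        for (int start : startIndices) {
            if (matchesFrom(words, start, input, 0)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches(MOOD, MOOD_START, "how are you".split(" ")));      // prints true
        System.out.println(matches(MOOD, MOOD_START, "how is it going".split(" ")));  // prints true
        System.out.println(matches(MOOD, MOOD_START, "how are it going".split(" "))); // prints false
    }
}
```

Because the possible successors of every word are precomputed, the search never has to look at the sentence's syntax tree while matching.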

Java

The compilation to Java relies on the dicio-skill library, so sections will be compiled in this format:

StandardRecognizerData SECTION_NAME = new StandardRecognizerData(
        InputRecognizer.Specificity.SPECIFICITY,
        new Sentence(SENTENCE_ID, LIST_OF_STARTING_WORD_INDICES,
                new DiacriticsSensitiveWord(VALUE, MINIMUM_SKIPPED_WORDS_TO_END, NEXT_WORD_INDICES...),
                new DiacriticsInsensitiveWord(NORMALIZED_VALUE, MINIMUM_SKIPPED_WORDS_TO_END, NEXT_WORD_INDICES...),
                new DiacriticsSensitiveRegexWord(REGEX, MINIMUM_SKIPPED_WORDS_TO_END, NEXT_WORD_INDICES...),
                new DiacriticsInsensitiveRegexWord(REGEX, MINIMUM_SKIPPED_WORDS_TO_END, NEXT_WORD_INDICES...),
                new CapturingGroup(NAME, MINIMUM_SKIPPED_WORDS_TO_END, NEXT_WORD_INDICES...),
                new ...(...), ...),
        new Sentence(...), ...);

If a section collected the capturing group names, they will be compiled to variables accessible as a field of the section, by extending StandardRecognizerData, that is:

class SectionClass_SECTION_NAME extends StandardRecognizerData {
        SectionClass_SECTION_NAME() { super(... INITIALIZED AS ABOVE ...); }
        public String CAPTURING_GROUP_1 = "CAPTURING_GROUP_1", CAPTURING_GROUP_2 = "CAPTURING_GROUP_2", ...;
}
SectionClass_SECTION_NAME SECTION_NAME = new SectionClass_SECTION_NAME();

If a section map name is provided via the --create-section-map parameter, a Map<String, StandardRecognizerData> will be created containing a mapping between section ids and their corresponding StandardRecognizerData instance. This can be useful for autogeneration code (like that found in dicio-android's build.gradle) in combination with the --sections-file parameter.

Build and run

To build the project, open it in Android Studio (IntelliJ IDEA probably works, too) and create an Application configuration in the "Run/Debug Configurations" menu. Set "Main class" to org.dicio.sentences_compiler.main.SentencesCompiler, "Use classpath of module" to sentences_compiler and "Program arguments" to the arguments for the compiler. Then run the newly created configuration with the "Run" button. Set --help as "Program arguments" to get a help screen explaining the options.

This project can also be used as a library. In that case, add 'com.github.Stypox:dicio-sentences-compiler:VERSION' to your Gradle dependencies, replacing VERSION with the latest release or commit. Then use the org.dicio.sentences_compiler.main.SentencesCompiler#compile() function to compile using input files and output streams (take a look at the javadoc documentation provided there).

Example

The file below is example.dslf. "dslf" means "Dicio-Sentences-Language File".

mood: high       # comments are supported :-D
how (are you doing?)|(is it go<ing|ne>);
[has_place] how is it going over <t?>here;
[french] comment "êtes" voùs;  # quotes make sure êtes is matched diacritics-sensitively,
                               # while voùs will be matched the same way as vous

GPS_navigation: medium
[question]  take|bring me to .place. (by .vehicle.)? please?;
[question]  give me directions to .place. please?;
[question]  how do|can i get to .place.;
[statement] i want to go to .place. (by .vehicle.)?;
[statement] .place. is the place i want to go to;

The above Dicio-sentences-language file is compiled to Java code by running the sentences-compiler as explained above, and setting the line below as "Program arguments".

--input "example.dslf" --output "ClassName.java" --sections-file "stdout" java --variable-prefix "section_" --package "com.pkg.name" --class "ClassName" --create-section-map "sections"

After clicking on the "Run" button, mood GPS_navigation should be printed to the standard output, and the Java code shown below should be inside a file called ClassName.java in the root directory of the repository. Indentation and spacing were added manually in order to improve readability.

/*
 * FILE AUTO-GENERATED BY dicio-sentences-compiler. DO NOT MODIFY.
 */

package com.pkg.name;

import java.util.Map;
import java.util.HashMap;
import org.dicio.skill.chain.InputRecognizer.Specificity;
import org.dicio.skill.standard.Sentence;
import org.dicio.skill.standard.StandardRecognizerData;
import org.dicio.skill.standard.word.DiacriticsInsensitiveWord;
import org.dicio.skill.standard.word.DiacriticsInsensitiveRegexWord;
import org.dicio.skill.standard.word.DiacriticsSensitiveWord;
import org.dicio.skill.standard.word.CapturingGroup;

public class ClassName {
    public static final StandardRecognizerData section_mood = new StandardRecognizerData(Specificity.high,
        new Sentence("", new int[]{0},
            new DiacriticsInsensitiveWord("how", 4, 1, 4),
            new DiacriticsInsensitiveWord("are", 3, 2),
            new DiacriticsInsensitiveWord("you", 2, 3, 7),
            new DiacriticsInsensitiveWord("doing", 1, 7),
            new DiacriticsInsensitiveWord("is", 3, 5),
            new DiacriticsInsensitiveWord("it", 2, 6),
            new DiacriticsInsensitiveRegexWord("go(?:ing|ne)", 1, 7)),
        new Sentence("has_place", new int[]{0},
            new DiacriticsInsensitiveWord("how", 6, 1),
            new DiacriticsInsensitiveWord("is", 5, 2),
            new DiacriticsInsensitiveWord("it", 4, 3),
            new DiacriticsInsensitiveWord("going", 3, 4),
            new DiacriticsInsensitiveWord("over", 2, 5),
            new DiacriticsInsensitiveRegexWord("(?:t|)here", 1, 6)),
        new Sentence("french", new int[]{0},
            new DiacriticsInsensitiveWord("comment", 3, 1),
            new DiacriticsSensitiveWord("êtes", 2, 2),
            new DiacriticsInsensitiveWord("vous", 1, 3)));

    public static final class SectionClass_section_GPS_navigation extends StandardRecognizerData{
        SectionClass_section_GPS_navigation(){
            super(Specificity.medium,
                new Sentence("question", new int[]{0, 1},
                    new DiacriticsInsensitiveWord("take", 9, 2),
                    new DiacriticsInsensitiveWord("bring", 11, 2),
                    new DiacriticsInsensitiveWord("me", 10, 3),
                    new DiacriticsInsensitiveWord("to", 9, 4),
                    new CapturingGroup("place", 8, 5, 7, 8),
                    new DiacriticsInsensitiveWord("by", 6, 6),
                    new CapturingGroup("vehicle", 5, 7, 8),
                    new DiacriticsInsensitiveWord("please", 4, 8)),
                new Sentence("question", new int[]{0},
                    new DiacriticsInsensitiveWord("give", 7, 1),
                    new DiacriticsInsensitiveWord("me", 6, 2),
                    new DiacriticsInsensitiveWord("directions", 5, 3),
                    new DiacriticsInsensitiveWord("to", 4, 4),
                    new CapturingGroup("place", 3, 5, 6),
                    new DiacriticsInsensitiveWord("please", 1, 6)),
                new Sentence("question", new int[]{0},
                    new DiacriticsInsensitiveWord("how", 9, 1, 2),
                    new DiacriticsInsensitiveWord("do", 6, 3),
                    new DiacriticsInsensitiveWord("can", 8, 3),
                    new DiacriticsInsensitiveWord("i", 7, 4),
                    new DiacriticsInsensitiveWord("get", 6, 5),
                    new DiacriticsInsensitiveWord("to", 5, 6),
                    new CapturingGroup("place", 4, 7)),
                new Sentence("statement", new int[]{0},
                    new DiacriticsInsensitiveWord("i", 10, 1),
                    new DiacriticsInsensitiveWord("want", 9, 2),
                    new DiacriticsInsensitiveWord("to", 8, 3),
                    new DiacriticsInsensitiveWord("go", 7, 4),
                    new DiacriticsInsensitiveWord("to", 6, 5),
                    new CapturingGroup("place", 5, 6, 8),
                    new DiacriticsInsensitiveWord("by", 3, 7),
                    new CapturingGroup("vehicle", 2, 8)),
                new Sentence("statement", new int[]{0},
                    new CapturingGroup("place", 10, 1),
                    new DiacriticsInsensitiveWord("is", 8, 2),
                    new DiacriticsInsensitiveWord("the", 7, 3),
                    new DiacriticsInsensitiveWord("place", 6, 4),
                    new DiacriticsInsensitiveWord("i", 5, 5),
                    new DiacriticsInsensitiveWord("want", 4, 6),
                    new DiacriticsInsensitiveWord("to", 3, 7),
                    new DiacriticsInsensitiveWord("go", 2, 8),
                    new DiacriticsInsensitiveWord("to", 1, 9)));
        }
        public final String place = "place", vehicle = "vehicle";
    }
    public static final SectionClass_section_GPS_navigation section_GPS_navigation = new SectionClass_section_GPS_navigation();

    public static final Map<String, StandardRecognizerData> sections = new HashMap<String, StandardRecognizerData>() {{
        put("mood", section_mood);
        put("GPS_navigation", section_GPS_navigation);
    }};
}
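
Two details of the generated code above can be checked in isolation: the regular expressions produced for words with variations (go(?:ing|ne) and (?:t|)here), and the fact that voùs ends up as vous in a diacritics-insensitive word. The sketch below uses only the Java standard library. The stripDiacritics helper is hypothetical, and NFD decomposition followed by removal of combining marks is an assumed way to normalize, not necessarily what the compiler or dicio-skill does.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class WordMatchingSketch {
    // Regexes copied from the generated code above.
    static final Pattern GOING_OR_GONE = Pattern.compile("go(?:ing|ne)");
    static final Pattern HERE_OR_THERE = Pattern.compile("(?:t|)here");

    // Hypothetical helper (an assumption): decompose accented letters (NFD)
    // and drop the combining marks, leaving only the base letters.
    static String stripDiacritics(String word) {
        return Normalizer.normalize(word, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(GOING_OR_GONE.matcher("going").matches()); // prints true
        System.out.println(GOING_OR_GONE.matcher("gone").matches());  // prints true
        System.out.println(GOING_OR_GONE.matcher("go").matches());    // prints false
        System.out.println(HERE_OR_THERE.matcher("here").matches());  // prints true
        System.out.println(HERE_OR_THERE.matcher("there").matches()); // prints true
        System.out.println(stripDiacritics("voùs"));                  // prints vous
    }
}
```

Under this assumption a diacritics-insensitive comparison amounts to normalizing both the sentence word and the input word before comparing them, while a diacritics-sensitive word such as "êtes" is compared as written.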