aigents / aigents-java

Aigents Java Core Platform
MIT License

Question-Answering Engine based on Natural Language Generation #33

Open akolonin opened 4 years ago

akolonin commented 4 years ago

Overall task and design: Based on #22, we need to provide an extended version of the Question Answering to replace or extend the current placeholder: https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/peer/Answerer.java The code may go to org.aigents.nlp.qa or to the respective package of the Aigents Platform Core. There are a few things to be done, written in the following pseudo-code, which is to be refined during the implementation phase:

interface Indexer {
    void clear();//clears the current index
    void index(String text);//indexes the text in the internal model; the model may be of any kind
    Linker retrieve(String query);//retrieves the ranked list of relevant words for a single query applied to the scope of all texts indexed by date, see https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/data/Linker.java
}

//Candidate implementation of the Indexer relying on the existing code
class GraphIndexer implements Indexer {
    Graph graph;//see https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/data/Graph.java 
    int Mskip = 2;//width of skipping window to build word pairs
    // will be used to index any number of input texts in a graph object
    @Override
    public void index(String text){
        // tokenize text with Parser.parse https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/data/Miner.java#L580
        // build word-word links from per-sentence pairs of words co-occurring within a distance of Mskip, using link types "pred" and "succ", and store them in the graph with link weight W = Mskip / distance (so closer words get larger weights: the closest word is weighted Mskip and the most distant word is weighted 1)
    }
    @Override
    public Linker retrieve(String query){
        // tokenize query with Parser.parse https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/data/Miner.java#L580
        // compute the ranks of nodes in the graph using the algorithm GraphOrder.directed https://github.com/aigents/aigents-java/blob/master/html/ui/aigents-graph.js#L537 (this function needs to be added to the Graph class), initialized with the word nodes found in the query, each with an initial weight of 1 divided by the word frequency from https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/data/LangPack.java#L85
        // retrieve the computed ranks of words from the Graph and return them in a Linker implementation such as https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/data/Counter.java or https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/data/Summator.java
    }
}
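
The link weighting described in GraphIndexer.index can be illustrated in isolation. The following is a minimal sketch only, assuming the sentence is already tokenized; the class SkipWindowPairs, its Link type and the pairs method are hypothetical names that do not exist in the Aigents code base, and the result is a plain list rather than the Aigents Graph.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Aigents code base: it only illustrates
// the W = Mskip / distance weighting on one already tokenized sentence.
class SkipWindowPairs {

    // One weighted link from an earlier word to a later word ("succ" direction);
    // the "pred" link would be the same pair reversed with the same weight.
    static final class Link {
        final String from;
        final String to;
        final double weight;

        Link(String from, String to, double weight) {
            this.from = from;
            this.to = to;
            this.weight = weight;
        }
    }

    // Pairs every token with the tokens up to mskip positions after it.
    static List<Link> pairs(List<String> tokens, int mskip) {
        List<Link> links = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            for (int d = 1; d <= mskip && i + d < tokens.size(); d++) {
                // distance 1 gives weight mskip, distance mskip gives weight 1
                links.add(new Link(tokens.get(i), tokens.get(i + d), (double) mskip / d));
            }
        }
        return links;
    }
}
```

With Mskip = 2 and the tokens "a", "b", "c", this yields three links: a-b with weight 2, a-c with weight 1, and b-c with weight 2.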

class AnswerGenerator extends Answerer { //to be re-used in https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/peer/Answerer.java 
    Indexer indexer; //see above
    Generator generator; //see #22
    int maxWords;//configured hard cap on the number of words used to build the reply
    String answer(String query){
        Linker words = indexer.retrieve(query);
        if (words == null || words.size() == 0)
            return "No.";
        Collection<String> top = getTopWordsFromLinker(words);
        String response = generator.generate(top); //see #22 
        return response;
    }
}
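
The getTopWordsFromLinker step is not specified above. One possible reading is sketched here, assuming the ranks are available as a plain Map from word to rank; this stands in for the Aigents Linker, whose API is deliberately not used. The class TopWords and its method are hypothetical names, and breaking ties alphabetically is an assumption made only to keep the output deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for getTopWordsFromLinker: the ranks are taken from a
// plain Map instead of the Aigents Linker, whose API is not assumed here.
class TopWords {

    // Returns at most maxWords words, highest rank first.
    static List<String> top(Map<String, Double> ranks, int maxWords) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(ranks.entrySet());
        entries.sort((a, b) -> {
            int byRank = Double.compare(b.getValue(), a.getValue());
            // assumption: equal ranks are ordered alphabetically
            return byRank != 0 ? byRank : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < maxWords; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

Applying the hard cap on the number of words at this point keeps the input to the generator bounded no matter how many words the index returns.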

Task outline:

  1. Complete #22
  2. Implement the above
  3. Find the baseline/train/test set for Question Answering from Kaggle or papers online
  4. Fine-tune the design, implementation, and parameters to provide results reasonable according to item 3 above
  5. Integrate with Aigents chat-script functionality
    5.1. Extend, replace or override the existing Aigents Answerer https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/peer/Answerer.java using the Intenter plugin replacement design https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/agent/Demo.java#L82
      5.1.1. Solve the simplest summarization problem: given a single text as input and a few words as a seed, create a brief summary of the larger text body, as done by the public static String summarize(java.util.Set words, String text) function in https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/peer/Answerer.java#L163
      5.1.2. Solve the more complex answering problem: given multiple texts, extract the relevant summary answering the question from the combination of the multiple text bodies, as done by the Collection searchSTMwords(Session session, final SearchContext sc) function in https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/peer/Answerer.java#L82
    5.2. Extend a unit test such as https://github.com/aigents/aigents-java/blob/master/php/agent/agent_chat.php
    5.3. Test in the Telegram chat-bot
    5.4. Consider whether some code should be moved to the Aigents Core Platform from org.aigents.nlp.qa
    5.5. TBD
  6. TBD
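
For the simplest seeded summarization case in the outline (one text, a few seed words), a naive baseline could simply keep the sentences that mention at least one seed word. The sketch below is an illustration only and is not the existing Answerer.summarize implementation; the class SeedSummarizer, the regex-based sentence splitting and the assumption that seed words are supplied in lower case are all hypothetical choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical baseline, not the existing Answerer.summarize: keeps every
// sentence that contains at least one seed word.
class SeedSummarizer {

    // Assumption: seed words are given in lower case.
    static String summarize(Set<String> seeds, String text) {
        List<String> kept = new ArrayList<>();
        // naive sentence split on whitespace that follows '.', '!' or '?'
        for (String sentence : text.trim().split("(?<=[.!?])\\s+")) {
            // naive tokenization on anything that is not a letter or a digit
            for (String token : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                if (seeds.contains(token)) {
                    kept.add(sentence);
                    break;
                }
            }
        }
        return String.join(" ", kept);
    }
}
```

Such a baseline gives a reference point against which a graph-based ranking can be compared on the chosen test set.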

References: https://blog.singularitynet.io/an-understandable-language-processing-3848f7560271