Lexpedite / blawx

A user-friendly web-based tool for Rules as Code.

Generate Automatic User Interviews from Workspace and Query #146

Closed Gauntlet173 closed 2 years ago

Gauntlet173 commented 2 years ago

It should be possible to generate a simple interview-style web interface based on a workspace and a specific query, in the manner of L4-Docassemble or DataLex. Initially, it might be done by generating a data structure from the category and attribute declarations, using a naive order determined by the order in which the elements appear in the code. The reasoner's abduction features should let us get relevance for a given query, and we can pose one question at a time to the user until the query can be answered.

Initially, it would make sense to limit the interview to generating one model for the query, and returning it. But we could make it possible to customize that, later.

This would likely be a view on the workspace or query object with JavaScript code that repeatedly queries the API to get questions.

This may also require adding an ability to query the data structure through the API.

Gauntlet173 commented 2 years ago

The client needs to be able to query the ontology of the rule, so we need an API endpoint for that, and we probably need to change the code generation on the ontology blocks so that information about the ontological structure can be accessed by querying the reasoner instead of by analysing the workspaces. The simplest thing is a dictionary of categories with nested attributes and their types. Once we have that, we would be able to generate the queries to find out what objects in each of those categories are known to exist, and what attributes those objects have.
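As a purely illustrative sketch in Python (none of these category or attribute names come from a real workspace, and this is not what the codebase actually emits), the dictionary and the queries generated from it might look like:

# Illustrative sketch only: a nested dictionary of categories, their
# attributes, and attribute types. The names are made up for the example.
ontology = {
    "person": {
        "attributes": {
            "favourite_number": {"type": "number"},
            "employer": {"type": "object", "category": "firm"},
        }
    },
    "firm": {
        "attributes": {
            "ceo": {"type": "object", "category": "person"},
        }
    },
}

# Given that structure, the client can generate one query per category
# ("what objects of this category exist?") and one per attribute
# ("what values of this attribute are known?").
for category, details in ontology.items():
    print(f"query: {category}(X)?")
    for attribute in details["attributes"]:
        print(f"query: {attribute}(X, Y)?")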

Now we want to start a loop: the client asks the user for information, that information is added to the knowledge base, and the client determines whether or not the question is answered. If it is answered, the client displays the answer; if it is not, the client gets a list of still-relevant questions to pose to the user. What constitutes "the question being answered" can be defined as "there is a stable model with no presumptions", or "there are no more possible stable models without presumptions", or "there are no more possible stable models for this or the opposite query", if we would also like to have all the possible explanations of the negation.
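A rough Python sketch of that loop, using the first of those definitions of "answered"; the functions run_query, relevant_questions, and ask_user are hypothetical stand-ins for the client-side logic and API calls:

# Sketch of the proposed interview loop. The helper functions are
# placeholders, not real API calls.
def interview(query, facts, ask_user, run_query, relevant_questions):
    while True:
        result = run_query(query, facts)
        # "Answered" here means at least one stable model relies on
        # no presumptions (one of the definitions discussed above).
        if any(not model["presumptions"] for model in result["models"]):
            return result
        questions = relevant_questions(result)
        if not questions:
            # Nothing left to ask; report whatever models remain.
            return result
        answer = ask_user(questions[0])
        facts.append(answer)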

This can be done by maintaining a list of presumptions that the client generated, and then determining if any of those presumptions appear in the explanation for any stable model. So we don't really need more than the ability to add presumptions to the query to make that work.
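As a sketch, and assuming the response exposes each model's explanation as a flat list of term strings (which is an assumption about the output format), that check might look like:

# Sketch: a stable model is "free of presumptions" if none of the
# presumptions the client added appear in its explanation.
def model_is_presumption_free(model, client_presumptions):
    explanation_terms = set(model.get("explanation", []))
    return explanation_terms.isdisjoint(client_presumptions)

def query_is_answered(models, client_presumptions):
    return any(model_is_presumption_free(m, client_presumptions) for m in models)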

If we are collecting more than one of something, we need to track on the client side whether the user has said "that's all", and as long as the list is potentially open, leave the contents of that predicate abducible. Actually, we might be able to create an interview "collected" predicate that uses the names of predicates in the code, and add a rule that if a leaf input is not listed as "collected" in a query, then it is listed as #abducible in the code. This would eliminate the need for the client to keep track of what it is abducing, and instead allow it to say when a given input has been exhausted.
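A sketch of what the generated code could look like, with illustrative predicate names and arities; the client only has to report which inputs it has marked as collected:

# Sketch: emit an #abducible declaration for every leaf input predicate
# that the interview has not yet closed off. Predicate names and arities
# here are made up for the example.
LEAF_INPUTS = [("favourite_number", 2), ("employee", 2)]

def abducible_declarations(collected):
    lines = []
    for name, arity in LEAF_INPUTS:
        if name in collected:
            continue  # the user has said "that's all" for this input
        variables = ", ".join(f"X{i}" for i in range(arity))
        lines.append(f"#abducible {name}({variables}).")
    return "\n".join(lines)

print(abducible_declarations(collected={"favourite_number"}))
# prints: #abducible employee(X0, X1).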

If there are no stable models without user-facing presumptions, we need the list of all of the models returned, so that we can get all of the inputs that are relevant. This is going to require a complicated decoding of the contents of each model. It's probably also going to require us to add the model to the output.
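A rough sketch of that decoding, again assuming each model is exposed in the output as a list of term strings:

# Sketch: when no model is presumption-free, collect the inputs the
# reasoner had to presume across all returned models; those are the
# still-relevant questions. The model format is an assumption.
def relevant_inputs(models, client_presumptions):
    relevant = set()
    for model in models:
        for term in model.get("model", []):
            if term in client_presumptions:
                relevant.add(term)
    return sorted(relevant)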

Gauntlet173 commented 2 years ago

I don't think information can be generated about why a specific question is being asked in advance of the facts being collected, because the same piece of information might be relevant in different circumstances for different reasons. That's going to have to be derived from the models returned from the relevance query.

Gauntlet173 commented 2 years ago

I think that we need to start with something that returns the ontology alone, without information about which of the elements of the ontology can be derived from code. And a LExSIS file would require us to collect additional information from the user about the structure of the data, which might be circular in the Blawx code. So the next step is a test/onto-type endpoint that returns the results of a query for categories, a query for attributes, and a query that returns only the objects and attribute values that are stated in or can be derived from the information in the codebase.

The output should look something like this.

{
  "categories": [ "person" ],
  "attributes": [
    {
      "category": "person",
      "attribute": "favourite_number",
      "type": "number"
    }
  ],
  "objects": [
    { "name": "jason", "category": "person" }
  ],
  "values": [
    {
      "object": "jason",
      "attribute": "favourite_number",
      "value": 42
    }
  ]
}

Gauntlet173 commented 2 years ago

OK, I have a version in the ontology branch with an onto endpoint that returns categories, attributes, the NLG for both, and any defined objects and values, in a format much like the above.
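A hypothetical client call might look like the following sketch; the URL is a placeholder, since the actual routing of the endpoint isn't shown here:

# Sketch: fetch the ontology from the new onto endpoint and list what
# it contains. The URL path is a placeholder, not the real route.
import json
import urllib.request

ONTO_URL = "http://localhost:8000/example_ruledoc/test/example_test/onto/"  # hypothetical

with urllib.request.urlopen(ONTO_URL) as response:
    ontology = json.load(response)

for category in ontology["categories"]:
    print("category:", category)
for attribute in ontology["attributes"]:
    print("attribute:", attribute["category"], attribute["attribute"], attribute["type"])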

I've decided that trying to get the interface working at the same time I try to get relevance working is unnecessary. We can generate a version of the tool that returns a list of relevant inputs, and for the time being just includes everything, as long as the format is sufficient to be able to specify which factors are relevant later.

So what we need now is an API endpoint that accepts JSON facts and returns any answers, along with a useful data structure describing the relevant inputs.

That structure should be able to deal with both unground predicates and partially-ground predicates.

So the format might be

{
  "Answers": ...,
  "Transcript": ...,
  "Relevant Categories": [
    "person", "firm"
  ],
  "Relevant Attributes": [
    {"Attribute": "employee"},
    {"Attribute": "ceo", "Object": "megaCorp"}
  ]
}

The idea here is that membership in a category either is still relevant or it isn't; it can't be partially grounded. For attributes, we can say that all predicates of that type are relevant by naming only the attribute, or we can say that a partially grounded version is relevant by including one or the other of "Object" and "Value", but not both. "Attribute" is mandatory. A partially grounded version should only be reported if the ungrounded version is not relevant.

So in the above example, who the ceo of megaCorp is is relevant, and who the employees of all firms are is relevant, but the ceo of any firm other than megaCorp is not. The existence of additional people and firms is also relevant.
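A sketch of how a client might expand that structure into concrete questions, with the matching logic reflecting the semantics just described; the tuple format and the known_objects parameter are illustrative, and a real implementation would also filter objects by category:

# Sketch: turn the relevance structure into question specs. A bare
# {"Attribute": ...} entry means the attribute is relevant for every
# known object; including "Object" or "Value" narrows it to the
# partially ground form.
def questions_from_relevance(relevance, known_objects):
    questions = []
    for category in relevance.get("Relevant Categories", []):
        questions.append(("new_object", category))
    for spec in relevance.get("Relevant Attributes", []):
        attribute = spec["Attribute"]
        if "Object" in spec:
            questions.append(("attribute_value", attribute, spec["Object"]))
        elif "Value" in spec:
            questions.append(("attribute_object", attribute, spec["Value"]))
        else:
            for obj in known_objects:
                questions.append(("attribute_value", attribute, obj))
    return questions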

At that point we will have everything we need to generate a "dumb" interface that can repeatedly hit that endpoint until an answer exists. The dumb version should just list all categories and attributes as relevant on every hit, and the interface should stop asking when the user advises there is nothing else to provide.
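A sketch of that dumb client, with the endpoint URL and payload shape as placeholders:

# Sketch: post accumulated facts to the interview endpoint, show any
# answers, and otherwise keep asking until the user has nothing more to
# provide. The URL and payload format are assumptions.
import json
import urllib.request

INTERVIEW_URL = "http://localhost:8000/example_ruledoc/test/example_test/interview/"  # hypothetical

def post_facts(facts):
    request = urllib.request.Request(
        INTERVIEW_URL,
        data=json.dumps(facts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

facts = []
while True:
    result = post_facts(facts)
    if result.get("Answers"):
        print("Answers:", result["Answers"])
        break
    entry = input("Anything else to add? (blank to stop) ")
    if not entry:
        break
    facts.append(json.loads(entry))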

Gauntlet173 commented 2 years ago

The /onto/ and /interview/ endpoints have been created in the ontology branch. Next is generating a chat bot. Having looked at a few of the platforms out there for this, the structure used by RASA seems most amenable to what we need.

The rest of these features are probably new issues, enhancements.

Gauntlet173 commented 2 years ago

After struggling with RASA's NLP features for a couple of days, I have decided to take a different strategy. I'm writing a command-line Python expert system that will collect information and display answers, and then I'm going to convert that to a JavaScript implementation that does the same thing in a chatbot-like interface. This has eliminated a bunch of problems that I don't really need to solve right now.

Here's the revised list for what I want to accomplish:

The later capabilities are still:

But in addition to those things, it is now clear that we need to have an "interview" object in the app that is separate from tests, and which allows the user to specify additional information that we don't currently have in the system, like cardinality, and which categories and attributes are inputs and which aren't. For example, in the Rock Paper Scissors example, we should be able to pre-populate a game and two players, specify that exactly one throw should be collected for each player, and have the interview know how to proceed. But that's a bigger problem, because we don't even have an interface for specifying cardinality, or for closing predicates.
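As a very rough sketch, and with every field name here hypothetical (nothing like this exists in the app yet), such an interview object might record something like:

# Hypothetical sketch of the information an "interview" object might
# carry, using the Rock Paper Scissors example. All field names are
# made up for illustration.
interview_spec = {
    "query": "winner",
    "prepopulated_objects": [
        {"name": "game1", "category": "game"},
        {"name": "player1", "category": "player"},
        {"name": "player2", "category": "player"},
    ],
    "inputs": [
        # exactly one throw must be collected per player
        {"attribute": "throw", "object": "player1", "cardinality": 1},
        {"attribute": "throw", "object": "player2", "cardinality": 1},
    ],
    "closed_predicates": ["player", "game"],
}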

Gauntlet173 commented 2 years ago

I have created #238, #237, #236, #235, and #234 to deal with the later issues.

Gauntlet173 commented 2 years ago

The ontology branch now has a working version of the Bot added to the test interface. I've got a demo on YouTube already, and we might push a release if it works OK, but there's more to do before we close this issue.

Gauntlet173 commented 2 years ago

Automatic interviews are being generated. The remaining work is all interface stuff that is nice, but doesn't demonstrate anything new.

I'm marking this closed and moving everything remaining into new issues to keep it organized.