Lexpedite / blawx

A user-friendly web-based tool for Rules as Code.

Generate Automatic User Interviews from Workspace and Query #146

Closed Gauntlet173 closed 2 years ago

Gauntlet173 commented 2 years ago

It should be possible to generate a simple interview-style web interface based on a workspace and a specific query, in the manner of L4-Docassemble or DataLex. Initially, it might be done by generating a data structure from the category and attribute declarations, using a naive order determined by the order in which the elements appear in the code. The reasoner's abduction features should let us get relevance for a given query, and we can pose one question at a time to the user until the query can be answered.

Initially, it would make sense to limit the interview to generating one model for the query, and returning it. But we could make it possible to customize that, later.

This would likely be a view on the workspace or query object with JavaScript code that repeatedly queries the API to get questions.

This may also require adding an ability to query the data structure through the API.

Gauntlet173 commented 2 years ago

The client needs to be able to query the ontology of the rule, so we need an API endpoint for that, and we probably need to change the code generation on the ontology blocks so that information about the ontological structure can be accessed by querying the reasoner instead of by analysing the workspaces. The simplest thing is a dictionary of categories with nested attributes and their types. Once we have that, we would be able to generate the queries to find out what objects in each of those categories are known to exist, and what attributes those objects have.
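As a purely illustrative sketch in Python (none of these category or attribute names come from a real workspace, and this is not what the codebase actually emits), the dictionary and the queries generated from it might look like:

# Illustrative sketch only: a nested dictionary of categories, their
# attributes, and attribute types. The names are made up for the example.
ontology = {
    "person": {
        "attributes": {
            "favourite_number": {"type": "number"},
            "employer": {"type": "object", "category": "firm"},
        }
    },
    "firm": {
        "attributes": {
            "ceo": {"type": "object", "category": "person"},
        }
    },
}

# Given that structure, the client can generate one query per category
# ("what objects of this category exist?") and one per attribute
# ("what values of this attribute are known?").
for category, details in ontology.items():
    print(f"query: {category}(X)?")
    for attribute in details["attributes"]:
        print(f"query: {attribute}(X, Y)?")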

Now we want to start a loop: the client asks the user for information, that information is added to the knowledge base, and the client determines whether or not the question is answered. If it is answered, the client displays the answer; if it is not, the client gets a list of still-relevant questions to pose to the user. What constitutes "the question being answered" can be defined as "there is a stable model with no presumptions", or "there are no more possible stable models without presumptions", or "there are no more possible stable models for this or the opposite query", if we would also like to have all the possible explanations of the negation.
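A rough Python sketch of that loop, using the first of those definitions of "answered"; the functions run_query, relevant_questions, and ask_user are hypothetical stand-ins for the client-side logic and API calls:

# Sketch of the proposed interview loop. The helper functions are
# placeholders, not real API calls.
def interview(query, facts, ask_user, run_query, relevant_questions):
    while True:
        result = run_query(query, facts)
        # "Answered" here means at least one stable model relies on
        # no presumptions (one of the definitions discussed above).
        if any(not model["presumptions"] for model in result["models"]):
            return result
        questions = relevant_questions(result)
        if not questions:
            # Nothing left to ask; report whatever models remain.
            return result
        answer = ask_user(questions[0])
        facts.append(answer)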

This can be done by maintaining a list of presumptions that the client generated, and then determining if any of those presumptions appear in the explanation for any stable model. So we don't really need more than the ability to add presumptions to the query to make that work.
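As a sketch, and assuming the response exposes each model's explanation as a flat list of term strings (which is an assumption about the output format), that check might look like:

# Sketch: a stable model is "free of presumptions" if none of the
# presumptions the client added appear in its explanation.
def model_is_presumption_free(model, client_presumptions):
    explanation_terms = set(model.get("explanation", []))
    return explanation_terms.isdisjoint(client_presumptions)

def query_is_answered(models, client_presumptions):
    return any(model_is_presumption_free(m, client_presumptions) for m in models)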

If we are collecting more than one of something, we need to track on the client side whether the user has said "that's all", and as long as the list is potentially open, leave the contents of that predicate abducible. Actually, we might be able to create an interview "collected" predicate that uses the names of predicates in the code, and add a rule that if a leaf input is not listed as "collected" in a query, then it is listed as #abducible in the code. This would eliminate the need for the client to keep track of what it is abducing, and instead allow it to say when a given input has been exhausted.
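A sketch of what the generated code could look like, with illustrative predicate names and arities; the client only has to report which inputs it has marked as collected:

# Sketch: emit an #abducible declaration for every leaf input predicate
# that the interview has not yet closed off. Predicate names and arities
# here are made up for the example.
LEAF_INPUTS = [("favourite_number", 2), ("employee", 2)]

def abducible_declarations(collected):
    lines = []
    for name, arity in LEAF_INPUTS:
        if name in collected:
            continue  # the user has said "that's all" for this input
        variables = ", ".join(f"X{i}" for i in range(arity))
        lines.append(f"#abducible {name}({variables}).")
    return "\n".join(lines)

print(abducible_declarations(collected={"favourite_number"}))
# prints: #abducible employee(X0, X1).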

If there are no stable models without user-facing presumptions, we need the list of all of the models returned, so that we can get all of the inputs that are relevant. This is going to require a complicated decoding of the contents of each model. It's probably also going to require us to add the model to the output.
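A rough sketch of that decoding, again assuming each model is exposed in the output as a list of term strings:

# Sketch: when no model is presumption-free, collect the inputs the
# reasoner had to presume across all returned models; those are the
# still-relevant questions. The model format is an assumption.
def relevant_inputs(models, client_presumptions):
    relevant = set()
    for model in models:
        for term in model.get("model", []):
            if term in client_presumptions:
                relevant.add(term)
    return sorted(relevant)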

Gauntlet173 commented 2 years ago

I don't think information can be generated about why a specific question is being asked in advance of the facts being collected, because the same piece of information might be relevant in different circumstances for different reasons. That's going to have to be derived from the models returned from the relevance query.

Gauntlet173 commented 2 years ago

I think that we need to start with something that returns the ontology alone, without information about which of the elements of the ontology can be derived from code. And a LExSIS file would require us to collect additional information from the user about the structure of the data, which might be circular in the Blawx code. So the next step is a test/onto-type endpoint that returns the results of a query for categories, a query for attributes, and a query that returns only the objects and attribute values that are stated in or can be derived from the information in the codebase.

The output should look something like this.

{
  "categories": [ "person" ],
  "attributes": [
    {
      "category": "person",
      "attribute": "favourite_number",
      "type": "number"
    }
  ],
  "objects": [
    { "name": "jason", "category": "person" }
  ],
  "values": [
    {
      "object": "jason",
      "attribute": "favourite_number",
      "value": 42
    }
  ]
}

Gauntlet173 commented 2 years ago

OK, I have a version in the ontology branch with an onto endpoint that returns categories, attributes, the NLG for both, and any defined objects and values, in a format much like the above.
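A hypothetical client call might look like the following sketch; the URL is a placeholder, since the actual routing of the endpoint isn't shown here:

# Sketch: fetch the ontology from the new onto endpoint and list what
# it contains. The URL path is a placeholder, not the real route.
import json
import urllib.request

ONTO_URL = "http://localhost:8000/example_ruledoc/test/example_test/onto/"  # hypothetical

with urllib.request.urlopen(ONTO_URL) as response:
    ontology = json.load(response)

for category in ontology["categories"]:
    print("category:", category)
for attribute in ontology["attributes"]:
    print("attribute:", attribute["category"], attribute["attribute"], attribute["type"])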

I've decided that trying to get the interface working at the same time I try to get relevance working is unnecessary. We can generate a version of the tool that returns a list of relevant inputs, and for the time being just includes everything, as long as the format is sufficient to be able to specify which factors are relevant later.

So what we need now is an API endpoint that accepts JSON facts and returns any answers, along with a useful data structure describing the relevant inputs.

That structure should be able to deal with both unground predicates and partially-ground predicates.

So the format might be

{
  "Answers": ...,
  "Transcript": ...,
  "Relevant Categories": [
    "person", "firm"
  ],
  "Relevant Attributes": [
    {"Attribute": "employee"},
    {"Attribute": "ceo", "Object": "megaCorp"}
  ]
}

The idea here is that membership in a category either is still relevant or it isn't; it can't be partially grounded. For attributes, we can say that all predicates of that type are relevant by naming only the attribute, or we can say that a partially grounded version is relevant by including one or the other of "Object" and "Value", but not both. "Attribute" is mandatory. A partially grounded version should only be reported if the ungrounded version is not relevant.

So in the above example, who the ceo of megaCorp is is relevant, and who the employees of all firms are is relevant, but the ceo of any firm other than megaCorp is not. The existence of additional people and firms is also relevant.
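A sketch of how a client might expand that structure into concrete questions, with the matching logic reflecting the semantics just described; the tuple format and the known_objects parameter are illustrative, and a real implementation would also filter objects by category:

# Sketch: turn the relevance structure into question specs. A bare
# {"Attribute": ...} entry means the attribute is relevant for every
# known object; including "Object" or "Value" narrows it to the
# partially ground form.
def questions_from_relevance(relevance, known_objects):
    questions = []
    for category in relevance.get("Relevant Categories", []):
        questions.append(("new_object", category))
    for spec in relevance.get("Relevant Attributes", []):
        attribute = spec["Attribute"]
        if "Object" in spec:
            questions.append(("attribute_value", attribute, spec["Object"]))
        elif "Value" in spec:
            questions.append(("attribute_object", attribute, spec["Value"]))
        else:
            for obj in known_objects:
                questions.append(("attribute_value", attribute, obj))
    return questions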

At that point we will have everything we need to generate a "dumb" interface that can repeatedly hit that endpoint until an answer exists. The dumb version should just list all categories and attributes as relevant on every hit, and the interface should stop asking when the user advises there is nothing else to provide.
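A sketch of that dumb client, with the endpoint URL and payload shape as placeholders:

# Sketch: post accumulated facts to the interview endpoint, show any
# answers, and otherwise keep asking until the user has nothing more to
# provide. The URL and payload format are assumptions.
import json
import urllib.request

INTERVIEW_URL = "http://localhost:8000/example_ruledoc/test/example_test/interview/"  # hypothetical

def post_facts(facts):
    request = urllib.request.Request(
        INTERVIEW_URL,
        data=json.dumps(facts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

facts = []
while True:
    result = post_facts(facts)
    if result.get("Answers"):
        print("Answers:", result["Answers"])
        break
    entry = input("Anything else to add? (blank to stop) ")
    if not entry:
        break
    facts.append(json.loads(entry))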

Gauntlet173 commented 2 years ago

The /onto/ and /interview/ endpoints have been created in the ontology branch. Next is generating a chat bot. Having looked at a few of the platforms out there for this, the structure used by RASA seems most amenable to what we need.

The rest of these features are probably new issues, enhancements.

Gauntlet173 commented 2 years ago

After struggling with RASA's NLP features for a couple of days, I have decided to take a different strategy. I'm writing a command-line Python expert system that will collect information and display answers, and then I'm going to convert that to a JavaScript implementation that does the same thing in a chatbot-like interface. This has eliminated a bunch of problems that I don't really need to solve right now.

Here's the revised list for what I want to accomplish:

The later capabilities are still:

But in addition to those things, it is now clear that we need to have an "interview" object in the app that is separate from tests, and which allows the user to specify additional information that we don't currently have in the system, like cardinality, and which categories and attributes are inputs and which aren't. For example, in the Rock Paper Scissors example, we should be able to pre-populate a game and two players, specify that exactly one throw should be collected for each player, and have the interview know how to proceed. But that's a bigger problem, because we don't even have an interface for specifying cardinality, or for closing predicates.
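As a very rough sketch, and with every field name here hypothetical (nothing like this exists in the app yet), such an interview object might record something like:

# Hypothetical sketch of the information an "interview" object might
# carry, using the Rock Paper Scissors example. All field names are
# made up for illustration.
interview_spec = {
    "query": "winner",
    "prepopulated_objects": [
        {"name": "game1", "category": "game"},
        {"name": "player1", "category": "player"},
        {"name": "player2", "category": "player"},
    ],
    "inputs": [
        # exactly one throw must be collected per player
        {"attribute": "throw", "object": "player1", "cardinality": 1},
        {"attribute": "throw", "object": "player2", "cardinality": 1},
    ],
    "closed_predicates": ["player", "game"],
}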

Gauntlet173 commented 2 years ago

I have created #238, #237, #236, #235, and #234 to deal with the later issues.

Gauntlet173 commented 2 years ago

The ontology branch now has a working version of the Bot added to the test interface. I've got a demo on YouTube already, and we might push a release if it works OK, but there's more to do before we close this issue.

Gauntlet173 commented 2 years ago

Automatic interviews are being generated. The remaining work is all interface stuff that is nice, but doesn't demonstrate anything new.

I'm marking this closed and moving everything remaining into new issues to keep it organized.