GEGlobalResearch / DARPA-ASKE-TA1

ANSWER Project to demonstrate knowledge-driven extraction of scientific models from code and texts
Other
7 stars 5 forks source link

SGH-784: Query expression processing, Dialog version of SADL issue 874 #137

Closed crapo closed 2 years ago

crapo commented 2 years ago

PropOfSubject expressions can be nested. The fix to SADL issue 874 handled the return from JBSMP.processExpression(PropOfSubject) when the return value is an Object[] with the 2nd element being the "rest" of the processing results. JBDMP needs to handle this case as well.

crapo commented 2 years ago

@agabaldon , pushed branch SGH-874 with some post processing after what was line 888 of JBDMP. This generates a correct Junction, I believe, although the processing after doesn't seem right as it generates two pseudo rules with what should be one. I'm a little fuzzy on what is supposed to happen. Perhaps you can help see where that goes wrong?

agabaldon commented 2 years ago

On lines 1024-1025, if the subject is a variable, the variable is replaced by the type. For example, the triple
rdf(v0, propchains:partID, "123") is replaced by rdf(propchains:Part, propchains:partID, "123")

Instead I think the triple rdf(v0, rdf:Type, propchains:Part) should be added to thenObjs along with the original rdf(v0, propchains:partID, "123").

But then we get three rules so that's also not right. Since JBDIP interprets each rule as a separate query, I suppose only one rule should be generated form this query.

crapo commented 2 years ago

That makes sense. It would be nice to use more of the SADL code and less custom code so that it only needs to be made to work correctly in one place. Maybe I'll take a look at that possibility.

crapo commented 2 years ago

@agabaldon , I took a look at what JBDMP is doing versus what JBSMP is doing. JBDMP should be using the returned Object[2] the same way as JBSMP and then should use the IntermediateFormTranslator to expand the proxy nodes like JBSMP does. I don't want to change something you're working on, but if you want me to try to bring JBDMP more inline with what JBSMP does, using common code, let me know.

agabaldon commented 2 years ago

Please go ahead. In the meantime, I'm using a different version of the query to continue working on the query processing part.

crapo commented 2 years ago

@agabaldon , I have made a couple of changes to SADL, pushed to branch GH-874b. I have pushed changes to Dialog to use the JBSMP.processAsk, pushed to branch SGH-874. I believe the processing of the thenExpr (~line 906) is working as expected. However, in this case ("what is the temperature of processing of (a Part partID "123")?"), whenLst is null. I made a change in the caller so that it didn't create a whenLst with a null list element. I'm not sure what's supposed to happen in this case for rule construction, but maybe you do?

agabaldon commented 2 years ago

Right. I noticed that. I was even going to change line 867 of JBDMP to add the when to the whenLst only if when is not null. But then I noticed, as you say, that if whenLst is empty then a rule is not created on lines 1152-1166. I think a rule should be created there whether the when is null or not. There's another issue I think if we allow compare queries without a when part. For example if we say "compare thrust of a CF6 and thrust of a F100". In this case two rules should be created for the two "thens". But whens are null in both cases, and so line 1022 will map null to both 'thens' and lose the first then. We probably assumed that 'when' would never be empty.

crapo commented 2 years ago

@agabaldon , I may have not pushed to SADL branch GH-874b yesterday like I thought I did. Now I have and created a PR to merge into development.

agabaldon commented 2 years ago

I checked out SGH-874 and can't build it. I get this error:

[ERROR] Object thenObj = processAsk(thenExpr); // processExpression(thenExpr); // do this first for article checking conformance [ERROR] ^^^^^^^^^^ [ERROR] The method processAsk(Expression) in the type JenaBasedSadlModelProcessor is not applicable for the arguments (EObject)

Doesn't make sense since processAsk is clearly declared as taking an EObject as argument. Any suggestions?

agabaldon commented 2 years ago

Tried mvn clean then mvn package instead of mvn clean package and it worked.

crapo commented 2 years ago

Good. There was a change of type that must not have propagated properly.

agabaldon commented 2 years ago

Ok, I'm trying to modify JBDMP to handle the output from processAsk.

What I'm looking at is trying to distinguish between thenObj (post processing) for query:

what is the temperature of processing of part1?

which produces thenObj and(rdf(propchains:part1, propchains:processing, v6), rdf(v6, propchains:temperature, v7))

and for query:

compare thrust of a CF6 and a F100.

which produces thenObj: and(rdf(v0, hypersonicsV2:thrust, v35), v1).

They are both Junctions. I'm thinking of making thenObj a list for the compare, and keep the Junction for the what-is. Any suggestions for better doing this? The value returned by pocessAsk is a Query with "select ..." for what-is and a Query w/o "select ..." for compare.

crapo commented 2 years ago

@agabaldon , this brings back memories of why there was special processing in the JBDMP. The compare thenObj has v0 and v1, which represent "a CF6" and "a F100". I think they must ultimately be recognized as implying two queries: 1) rdf(v0, hypersonicsV2:thrust, v35) 2) rdf(v1, hypersonicsV2:thrust, v36)

I'd forgotten that we had this scenario to handle when consolidating under the processAsk. I'm thinking that we perhaps can separate the two parts of the compare in the beginning, when we know in the JBDMP that there are two things to compare. Maybe find the conjunction, and realize that that gives rise to the two parts, regardless of where in the chain it occurs? In other words, get the same thing if the compare had been "compare the trust of a CF6 and the thrust of an F100." Does that make sense?

crapo commented 2 years ago

@agabaldon , the result of the unabbreviated form ("compare the trust of a CF6 and the thrust of an F100.") is what we would expect: "and(rdf(v0, hypersonicsV2:thrust, v35), rdf(v1, hypersonicsV2:thrust, v36))", We could recognize what is being compared before processing, identifying the two thenObjs to be processed. I think it might work something like this.

We know we must be comparing two or more things of the same type. Find the conjunction and the LHS and RHS If there is implication, then LHS should be the more complete property chain (you wouldn't say "compare a CF6 and the thrust of an F100") If necessary, add properties to the property chain of the RHS to make it compatible with the LHS. The process the two or more parts of the conjunction(s) to obtain the parts of the comparison.

agabaldon commented 2 years ago

Yes, the expansion of the abbreviated form for compare is there and can still be used. We have to distinguish between a Junction that represents a what-is with property chains or a compare. Before, we didn't deal with property chains, so we would get a TripleElement for what-is and a Junction for compare, so it was easy to distinguish. So, are you saying that to determine if it is a compare, look at the triples in the Junction and check if the triples (or in general, the last triple in all the property chains) have the same property?

agabaldon commented 2 years ago

"In other words, get the same thing if the compare had been "compare the trust of a CF6 and the thrust of an F100." Does that make sense?"

Get the same thing from processAsk? Not sure what you mean.

crapo commented 2 years ago

@agabaldon , I'm suggesting that we need to differentiate in the JBDMP when we know, explicitly, that this is a compare statement. Divide it into its parts, 2 or more, and process them one at a time, filling in any missing property chain elements on the LHS, which can probably be most easily done after they are processed by processAsk. Method processAsk is in JBSMP and is not the place to determine that something is a Dialog compare, which doesn't exist in SADL. Of course it could be overridden in JBDMP, but better to use the fact that it is a compare statement upstream.

agabaldon commented 2 years ago

Right, we're on the same page. The question is what's the best way to distinguish between what-is with prop chains and compare. Both queries go through processAsk, even though compare doesn't exist in SADL. When a compare is unabbreviated, processAsk returns a standard query "select v0 v1... where and(....". And when the compare is abbreviated it returns a Query with only the where clause "and(...)".

Initially I thought a way to distinguish between what-is and compare could be by checking if the query returned by processAsk has the "select" part or not. But then I tried an unabbreviated example of compare and saw that in this case the query from processAsk does have the "select" part so that way to distinguish it from what-is doesn't work. So it seems that the only way to distinguish between them is by checking whether the pattern in the where clause has multiple prop chains that end in the same property (i.e. compare after expanding if abbreviated) or not. But I was wondering if processAsk could return something different for compare instead of the same structure as for standard queries. If not, then I will implement this where-clause pattern analysis.

crapo commented 2 years ago

@agabaldon , let me try to be more explicit and perhaps you can explain if my thinking is wrong.

  1. The only thing special about a compare statement is that it requires 2 or more queries to retrieve the data for the 2 or more scenarios that are to be compared. That is why compare composes multiple pseudo rules to contain the necessary triple patterns for the multiple queries. (In retrospect, perhaps these should have been passed as Queries, not pseudo Rules.)
  2. The best place to provide special handling of compare statements is in the JBDMP, which "knows" about compare and can implement the special handling. Then there is no need for the JBSMP's processAsk to do anything that it doesn't need to do for other queries. This special handling is to call processAsk for each of the relevant grammar patterns required to generate the triple patterns necessary to do comparison.
  3. In the case of abbreviated patterns in a compare statement, it should not be up to processAsk to do special handling to "understand" some conflation of the 2 or more queries. Rather, the JBDMP should separate the scenarios and call processAsk for each one. If there are missing property chain elements, these can actually be filled in (I think) either before or after the call to processAsk. I think that it is probably easier to fill them in after because it is much easier to construct triple patterns from the SADL graph pattern classes, e.g., NamedNodes, VariableNodes, TripleElement, than from the EObjects from the Xtext parse tree, although the latter is certainly possible. Take, for example, the statement "compare the trust of a CF6 and an F100." A naive separation would result in two calls to processAsk. i) "thrust of a CF6" returning something like "rdf(v0, thrust, v1)", where v0 is a VariableNode of type CF6, and ii) "an F100", which returns something like "v2" where v2 is a VariableNode of type F100. Since the JBDMP knows this is a compare, and that the things to be compared must be comparable, and using the structure of the compare statement where the "and" is between CF6 and F100, it should be pretty easy to expand this return into "rdf(v2, thrust, v3)". This becomes exactly the two returned sets of triple patterns that would have been produced by calls to processAsk for the unabbreviated compare where the "and" is between complete triple patterns statements.

I don't know off the top of my head why the abbreviated version return from processAsk does not have the keyword "select", and this might be considered a bug to be corrected, and I don't know for sure that processAsk(an F100) would return "v2", a VariableNode of type F100, but if not this could also be considered a bug to be corrected. Or maybe the JBDMP should just call processExpression(Declaration) if the second part is just a Declaration, as opposed to a PropOfSubject. These details can be worked out and do not, I think, negate the benefits of the approach detailed above.

agabaldon commented 2 years ago

All of 1-3 make sense. But my question is not how to handle compare. It is how to tell if the EObject thenExpr is a compare or a what-is. I was asking if it made sense for processAsk(thenExpr) to return something that indicates it is a compare when thenExpr is a compare. You are saying that no, processAsk will return a "select" or an "ask" because compare is a dialog construct, which makes sense. So either we do something with thenExpr before calling processAsk, or after, which is fine. As I understand, to "look" at thenExpr we first need to translate it into an rdf pattern, expand the nodes, etc., so there's not much we can do before calling processAsk (or alternatively processExpr + postProcessTranslateResult + expandProxyNodes, which is really the same as processAsk). So, the conclusion is that to determine if the query is a compare, I need to call processAsk and then do the where clause pattern analysis I mentioned earlier.

I think there is indeed a bug in processAsk that makes it fail to add the "select" part. It's not really a bug but it doesn't deal with abbreviated patterns from compare. I don't remember if that can happen with other statements. But when the pattern comes from abbreviated compare, the call to getSelectVariables(pattern) in processAsk returns no nodes and then it makes the query an "ask" instead of a "select". If it only happens with compare, then probably don't need a fix.

crapo commented 2 years ago

Oh, because the processing is in JBDMP.processStatement(CompareStatement element), which ends up calling whenAndThenToCookedRules. Since this is only in JBDMP, you can just add a flag if needed to tell it whether it was called from a CompareStatement or from a WhatIsStatement. Make it explicit--no need to guess based on content.

agabaldon commented 2 years ago

I knew there was a simple way. :-)

crapo commented 2 years ago

Actually, you already have what is needed to tell. The EObject passed in to the method whenAndThenToCookedRules (or any other method) as the context used for adding markers can be used to identify what statement one is in using EcoreUtil2.getContainerOfType(theEObj, CompareStatement.class). This will return true if theEObj is contained by a CompareStatement.

agabaldon commented 2 years ago

Thanks, yes, that's better. EObjects are still a dark region for me.