Wolfgang-Schuetzelhofer / jcypher

Java access to Neo4J graph databases at multiple levels of abstraction
Apache License 2.0

problem with the result table structure #24

Closed mpetris closed 8 years ago

mpetris commented 8 years ago

Hi Wolfgang,

I have a problem with the getRelations method of ResultHandler (see my commit). In the original version, relations don't get added more than once. This breaks the table structure that I access through the JcQueryResult.resultOf methods: the indices become wrong if the Lists returned by resultOf don't all have the same length.

In my use case I have an optional match like

    MATCH (n:Text) OPTIONAL MATCH (n)-[r:hasAudio]->(a:Audio) return n, r, a;

Some "Text" labelled nodes have optional relations of type "hasAudio" to "Audio" labelled nodes, and I want NULL values for r and a if the relationship doesn't exist. Let's suppose I have Text nodes n1, n2, n3 and no relationship yet. The result without my commit will be:

in Neo4J (in Rows view):

    n1, null, null
    n2, null, null
    n3, null, null

in JCypher with JcQueryResult.resultOf:

    List for Text:     n1, n2, n3
    List for hasAudio: null
    List for Audio:    null, null, null

Note that the list for Audio nodes does contain the expected three null entries.
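To make the index problem concrete, here is a small sketch in plain Java. It does not use JCypher at all: the lists simply stand in for what resultOf returns in the example above, and the class and method names (ColumnAlignmentSketch, toRows) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnAlignmentSketch {

    // Rebuilds rows by reading the same index from every column.
    // This only works if all columns have the same length.
    public static List<List<String>> toRows(List<List<String>> columns) {
        int rowCount = columns.get(0).size();
        for (List<String> column : columns) {
            if (column.size() != rowCount) {
                throw new IllegalStateException(
                        "columns differ in length: " + rowCount + " vs " + column.size());
            }
        }
        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            List<String> row = new ArrayList<>();
            for (List<String> column : columns) {
                row.add(column.get(i));
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("n1", "n2", "n3");
        List<String> audios = Arrays.asList(null, null, null);
        // One null per row, as in the Neo4J rows view.
        List<String> relationsKept = Arrays.asList(null, null, null);
        // The three nulls collapsed into a single entry.
        List<String> relationsCollapsed = Arrays.asList((String) null);

        // prints [[n1, null, null], [n2, null, null], [n3, null, null]]
        System.out.println(toRows(Arrays.asList(texts, relationsKept, audios)));

        try {
            toRows(Arrays.asList(texts, relationsCollapsed, audios));
        } catch (IllegalStateException e) {
            // prints cannot rebuild rows: columns differ in length: 3 vs 1
            System.out.println("cannot rebuild rows: " + e.getMessage());
        }
    }
}
```

With the collapsed relation list there is no way to tell which of the three Text nodes the single null belongs to, which is the point of the report.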

I probably don't have the overview of all possible use cases but as far as I can see duplicate relations should be allowed at this level.

Thanks for looking into this!

Best,

Marco

Wolfgang-Schuetzelhofer commented 8 years ago

Hi Marco,

After looking into the problem: The table structure should be of no concern at this point. With JcQueryResult.resultOf you retrieve elements of the graph. A null value simply represents a nonexistent element. Of course it would be semantically clearer if null values were removed from the resulting lists of graph elements, so that in your example the lists for hasAudio and Audio would be empty. That would clearly state that no matching elements exist in the graph.

I will refactor the code, so that this behaviour will be available with the next release of JCypher.

Best regards, Wolfgang

mpetris commented 8 years ago

Hi Wolfgang,

just to be clear: I think it would actually be clearer and probably more correct if null values were NOT removed. I would like the JcQueryResult to be as close as possible to the Neo4J result. In JCypher right now, duplicates get removed for relation columns but not for node columns. I argue for keeping the duplicates (in my case null values) for relation columns, too, just as you have them in the Neo4J result. You say "A null value simply represents a nonexistent element.". Right, and if you remove the duplicates from a relation column you actually remove one or more representations of nonexistent elements. I then simply cannot tell anymore which element in a column was present.

So please do not remove duplicates from the relation columns :-)

Thanks and best regards,

Marco

Wolfgang-Schuetzelhofer commented 8 years ago

Hi Marco,

in the next Release of JCypher there will be a boolean switch: ResultSettings.includeNullValuesAndDuplicates

By default it is switched to false, but you can turn it to true to get the behaviour you are looking for.
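The difference between the two settings can be modelled on a single result list in plain Java. This is only an illustration of the behaviour described in this thread, not JCypher's implementation: the class ResultListModes and the method buildColumn are hypothetical, and the boolean parameter merely borrows the name of the switch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class ResultListModes {

    // rawValues holds one entry per result row, as the database returns them.
    // false: drop nulls and keep each element once (the described default).
    // true:  keep one entry per row, including nulls and duplicates.
    public static <T> List<T> buildColumn(List<T> rawValues,
                                          boolean includeNullValuesAndDuplicates) {
        if (includeNullValuesAndDuplicates) {
            return new ArrayList<>(rawValues);
        }
        // LinkedHashSet removes duplicates but preserves first-seen order.
        LinkedHashSet<T> distinct = new LinkedHashSet<>();
        for (T value : rawValues) {
            if (value != null) {
                distinct.add(value);
            }
        }
        return new ArrayList<>(distinct);
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("r1", null, "r1", null);
        System.out.println(buildColumn(raw, false)); // prints [r1]
        System.out.println(buildColumn(raw, true));  // prints [r1, null, r1, null]
    }
}
```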

Nevertheless, my opinion is that null values and duplicates should be removed from the result (and that will be the default behaviour). JCypher queries provide results containing elements (existing elements) of the graph. You are not provided with a table structure (or row view) as a query result. If (as may be the case in your example) you want to know which start node, relation, and end node belong to a distinct result (row), you would start with the relation and call getStartNode() and getEndNode() on the relation instead of relying on a row-like relationship between the distinct result lists.

So to summarize: JCypher does not provide a row view as a query result. But with the switch mentioned above you can tell JCypher to simulate such a row-like relationship between result lists.
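A minimal sketch of that "start from the relation" approach, written against stand-in classes rather than the real JCypher types so that it runs on its own. Node, Relation, pairs and withoutRelation are hypothetical names; only getStartNode() and getEndNode() mirror the methods mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RelationFirstSketch {

    /** Stand-in for a graph node; NOT the JCypher node class. */
    public static final class Node {
        private final String name;
        public Node(String name) { this.name = name; }
        public String getName() { return name; }
    }

    /** Stand-in for a relation that knows both of its end points. */
    public static final class Relation {
        private final Node startNode;
        private final Node endNode;
        public Relation(Node startNode, Node endNode) {
            this.startNode = startNode;
            this.endNode = endNode;
        }
        public Node getStartNode() { return startNode; }
        public Node getEndNode() { return endNode; }
    }

    // One "start -> end" entry per existing relation; no index lookup in other lists.
    public static List<String> pairs(List<Relation> relations) {
        List<String> result = new ArrayList<>();
        for (Relation r : relations) {
            result.add(r.getStartNode().getName() + " -> " + r.getEndNode().getName());
        }
        return result;
    }

    // Nodes that never occur as a start node, i.e. the rows that would hold nulls.
    public static List<String> withoutRelation(List<Node> nodes, List<Relation> relations) {
        Set<Node> starts = new HashSet<>();
        for (Relation r : relations) {
            starts.add(r.getStartNode());
        }
        List<String> result = new ArrayList<>();
        for (Node n : nodes) {
            if (!starts.contains(n)) {
                result.add(n.getName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node n1 = new Node("n1"), n2 = new Node("n2"), n3 = new Node("n3");
        Node a1 = new Node("a1");
        List<Relation> hasAudio = Arrays.asList(new Relation(n1, a1));
        System.out.println(pairs(hasAudio));                                    // prints [n1 -> a1]
        System.out.println(withoutRelation(Arrays.asList(n1, n2, n3), hasAudio)); // prints [n2, n3]
    }
}
```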

I hope that helps. Best regards, Wolfgang

mpetris commented 8 years ago

Hi Wolfgang,

thanks, the switch is excellent.

But following your explanation, how do I hop over a chain of relationships within a path? I was expecting something like

    GrNode endNode = getEndNode();
    GrRelationship relationship = endNode.getRelationship(...);
    if (relationship != null) { }

and so on. But GrNode doesn't let me navigate further. I'm obviously misunderstanding something here.

Thanks for any hints on this!

Best, Marco

mpetris commented 8 years ago

Oh and by the way I was hoping to use JcPath and GrPath for that, but I didn't manage to include a MATCH and an OPTIONAL MATCH within a single path. So in that case I still have the problem that I need to weave several paths together then. Am I supposed to do this by matching start and end nodes of the paths myself? Thanks for the help! Best, Marco

Wolfgang-Schuetzelhofer commented 8 years ago

Hi Marco,

with the path at hand (a GrPath object), you call getRelations() on the path and get an ordered list of GrRelation objects. The relations in the list are exactly in the order in which they appear in the path. (That is: If you take one relation and call getEndNode() on it, that will return the same node as when calling getStartNode() on the next relation in the list).
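The ordering guarantee described above can be sketched as follows. The Relation class here is a stand-in with plain string node names, not the JCypher GrRelation class, and nodesAlong is a hypothetical helper that walks the ordered relation list and checks that each relation starts where the previous one ended.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PathWalkSketch {

    /** Stand-in for one relation of a path; node names replace real node objects. */
    public static final class Relation {
        private final String startNode;
        private final String type;
        private final String endNode;
        public Relation(String startNode, String type, String endNode) {
            this.startNode = startNode;
            this.type = type;
            this.endNode = endNode;
        }
        public String getStartNode() { return startNode; }
        public String getType() { return type; }
        public String getEndNode() { return endNode; }
    }

    // Collects the nodes of a path from its ordered relation list.
    // Fails if a relation does not start at the end node of its predecessor.
    public static List<String> nodesAlong(List<Relation> relations) {
        List<String> nodes = new ArrayList<>();
        for (Relation r : relations) {
            if (nodes.isEmpty()) {
                nodes.add(r.getStartNode());
            } else if (!nodes.get(nodes.size() - 1).equals(r.getStartNode())) {
                throw new IllegalArgumentException(
                        "relation " + r.getType() + " does not continue the path");
            }
            nodes.add(r.getEndNode());
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<Relation> path = Arrays.asList(
                new Relation("Chapter 1", "hasTexts", "Text 1"),
                new Relation("Text 1", "hasAudio", "Audio 1"));
        System.out.println(nodesAlong(path)); // prints [Chapter 1, Text 1, Audio 1]
    }
}
```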

For the second question, maybe you can give me an example or try to explain what you want to achieve with the query.

Best regards, Wolfgang

mpetris commented 8 years ago

Hi Wolfgang, thanks for the detailed explanation for my problem. That helps a lot.

Here is an example for my second question, concerning the two path objects.

    JcPath p1 = new JcPath("p1");
    JcPath p2 = new JcPath("p2");
    JcNode textQueryNode = new JcNode("t");
    IClause[] clauses = new IClause[] {
        MATCH
            .path(p1)
            .node()
            .label("Chapter")
            .relation()
            .type("hasTexts").out()
            .node(textQueryNode)
            .label("Text"),
        OPTIONAL_MATCH
            .path(p2)
            .node(textQueryNode)
            .relation()
            .type("hasAudio").out()
            .node()
            .label("Audio"),
        RETURN.value(p1),
        RETURN.value(p2)
    };

    JcQuery query = new JcQuery();
    query.setClauses(clauses);
    try (RestorableIDBAccess idbAccess = getIDBAccess()) {
        JcQueryResult result = idbAccess.execute(query);
        List<GrPath> gp1 = result.resultOf(p1);
        List<GrPath> gp2 = result.resultOf(p2);
    }

I want to get all "Chapter" nodes with their "Texts" and optionally the "Audio" of a "Text" where present. When I execute this code I get a ClassCastException when trying to acquire gp2 (see further below). That is probably a minor problem; my question is rather on the conceptual level. Assuming for a moment that this call to resultOf succeeded without exceptions, I would still have to bring gp1 and gp2 together somehow. Do I need to manually match each end node of p1 with the start nodes of p2? Assuming null values are present for nonexistent entries in gp2, I could match by index, seeing the two lists as columns of a table. But as I understand you, that's not what it was intended for. So I need some explanation on how to use it.
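For reference, the manual matching I have in mind would look roughly like the sketch below. It uses a stand-in class instead of GrPath so that it runs without JCypher; PathStub, its fields and the join method are hypothetical, and it assumes that each Text node has at most one Audio path.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PathJoinSketch {

    /** Stand-in for a path result; only the ids of its first and last node matter here. */
    public static final class PathStub {
        final String description;
        final long startNodeId;
        final long endNodeId;
        public PathStub(String description, long startNodeId, long endNodeId) {
            this.description = description;
            this.startNodeId = startNodeId;
            this.endNodeId = endNodeId;
        }
    }

    // For every p1 path, look up the p2 path whose start node is the p1 end node.
    // A missing p2 path is recorded as null, like an OPTIONAL MATCH row without a match.
    public static Map<String, String> join(List<PathStub> gp1, List<PathStub> gp2) {
        Map<Long, PathStub> byStartNode = new HashMap<>();
        for (PathStub p : gp2) {
            byStartNode.put(p.startNodeId, p);
        }
        Map<String, String> joined = new LinkedHashMap<>();
        for (PathStub p : gp1) {
            PathStub match = byStartNode.get(p.endNodeId);
            joined.put(p.description, match == null ? null : match.description);
        }
        return joined;
    }

    public static void main(String[] args) {
        // Node ids are invented: 1 and 3 are chapters, 11 and 13 texts, 21 an audio.
        List<PathStub> gp1 = Arrays.asList(
                new PathStub("Chapter 1 -> Text 1", 1L, 11L),
                new PathStub("Chapter 3 -> Text 3", 3L, 13L));
        List<PathStub> gp2 = Arrays.asList(
                new PathStub("Text 1 -> Audio 1", 11L, 21L));
        System.out.println(join(gp1, gp2));
    }
}
```

This joins on node identity rather than on list position, so it does not depend on the two lists having the same length.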

Thanks a lot for helping me out!

The exception is probably a minor problem:

    java.lang.ClassCastException: javax.json.JsonValue$1 cannot be cast to javax.json.JsonObject
        at iot.jcypher.query.result.util.ResultHandler.getRestObject(ResultHandler.java:631)
        at iot.jcypher.query.result.util.ResultHandler.getPathObject(ResultHandler.java:676)
        at iot.jcypher.query.result.util.ResultHandler.getPaths(ResultHandler.java:271)
        at iot.jcypher.query.JcQueryResult.resultOf(JcQueryResult.java:85)

I'm working with Neo4J 3.0.1 and the latest JCypher master branch. I'm aware of the fact that Neo4J 3.0.1 isn't officially supported yet.

Best,

Marco

Wolfgang-Schuetzelhofer commented 8 years ago

Hi Marco,

find below a code snippet that hopefully helps you solve your problem. I use two queries which are performed within the same request. As a result you get two lists of paths: one contains the paths with attached 'Audio' nodes, the other one contains only paths with no attached 'Audio' nodes. (You could of course concatenate both lists into one.)

The code snippet first inserts some sample data into the db. If you have instantiated an IDBAccess object, you can execute the code directly.

public void test_14() {
        IClause[] clauses;
        JcQuery q;
        JcQueryResult result;
        String cypher;
        dbAccess.clearDatabase();

        // add some sample data
        clauses = new IClause[]{
                CREATE.node().label("Chapter").property("chapter").value("Chapter 1")
                    .relation().out().type("hasTexts")
                    .node().label("Text").property("Text").value("Text 1")
                    .relation().out().type("hasAudio")
                    .node().label("Audio").property("Audio").value("Audio 1"),
                CREATE.node().label("Chapter").property("chapter").value("Chapter 2")
                    .relation().out().type("hasTexts")
                    .node().label("Text").property("Text").value("Text 2")
                    .relation().out().type("hasAudio")
                    .node().label("Audio").property("Audio").value("Audio 2"),
                CREATE.node().label("Chapter").property("chapter").value("Chapter 3")
                    .relation().out().type("hasTexts")
                    .node().label("Text").property("Text").value("Text 3"),
        };
        q = new JcQuery();
        q.setClauses(clauses);
        cypher = print(clauses, Format.PRETTY_1);
        result = dbAccess.execute(q);

        // two queries in a single request
        JcPath p1 = new JcPath("p1");
        JcPath p2 = new JcPath("p2");
        JcNode n1 = new JcNode("n1");
        q = new JcQuery();
        q.setClauses(new IClause[]{
                MATCH.path(p1).node().label("Chapter")
                    .relation().type("hasTexts").out()
                    .node().label("Text")
                    .relation().type("hasAudio").out()
                    .node().label("Audio"),
                RETURN.value(p1)
            });
        JcQuery q2 = new JcQuery();
        q2.setClauses(new IClause[]{
                MATCH.path(p2).node().label("Chapter")
                    .relation().type("hasTexts").out()
                    .node(n1).label("Text"),
                // make sure only paths with no attached Audio nodes are returned
                WHERE.NOT().existsPattern(X.node(n1).relation().type("hasAudio").out().node().label("Audio")),
                RETURN.value(p2)
            });
        List<JcQuery> queries = new ArrayList<JcQuery>();
        queries.add(q);
        queries.add(q2);
        // execute in a single request
        List<JcQueryResult> results = dbAccess.execute(queries);

        List<GrPath> p1res = results.get(0).resultOf(p1);
        int idx = 0;
        System.out.println("P1---------------------------");
        for (GrPath p : p1res) {
            System.out.println("Length: " + p.getLength());
            System.out.println("Startnode: " + p.getStartNode().getProperty("chapter").getValue().toString());
            idx++;
        }
        List<GrPath> p2res = results.get(1).resultOf(p2);
        idx = 0;
        System.out.println("P2---------------------------");
        for (GrPath p : p2res) {
            System.out.println("Length: " + p.getLength());
            System.out.println("Startnode: " + p.getStartNode().getProperty("chapter").getValue().toString());
            idx++;
        }

        return;
    }

Best regards, Wolfgang

mpetris commented 8 years ago

Hi Wolfgang,

thanks for the workaround code. This works just fine. From a performance point of view I wanted to stick to the OPTIONAL_MATCH, though. I managed to fix the class cast problem and will open a separate pull request for it. Together with the new ResultSettings flag everything runs perfectly now. Thanks again for your help!

Best,

Marco