fujaba / org.fujaba.graphengine

This project's aim is to build a graph engine, that is able to build and compare graphs - and to match patterns and apply actions on the graph, to effectively use it as a graph-transformation framework.
BSD 3-Clause "New" or "Revised" License
1 stars 0 forks source link

problem with checks for isomorphism #22

Closed hoechp closed 7 years ago

hoechp commented 7 years ago

I've found some graphs, where the check for isomorphism isn't that effective. The graph I first used to compare performance to SDMLib went pretty good, and this project was about 20x faster than SDMLib. After using an external expression library for attribute comparison, it is at least about 10x faster. But now I have a graph, which I didn't test in SDMLib, but it takes unreasonably long to do the check for isomorphism.

        Graph carGraph = new Graph();

        // build car 'A' with 4 black wheels:
        Node carA = new Node().setAttribute("type", "Car");
        Node wheelA1 = new Node().setAttribute("type", "Wheel").setAttribute("color", "black");
        Node wheelA2 = new Node().setAttribute("type", "Wheel").setAttribute("color", "black");
        Node wheelA3 = new Node().setAttribute("type", "Wheel").setAttribute("color", "black");
        Node wheelA4 = new Node().setAttribute("type", "Wheel").setAttribute("color", "black");
        carGraph.addNode(carA).addNode(wheelA1).addNode(wheelA2).addNode(wheelA3).addNode(wheelA4);
        carA.addEdge("has", wheelA1).addEdge("has", wheelA2).addEdge("has", wheelA3).addEdge("has", wheelA4);

        // build a car 'B' with 3 black wheels and 1 blue wheel:
        Node carB = new Node().setAttribute("type", "Car");
        Node wheelB1 = new Node().setAttribute("type", "Wheel").setAttribute("color", "blue");
        Node wheelB2 = new Node().setAttribute("type", "Wheel").setAttribute("color", "black");
        Node wheelB3 = new Node().setAttribute("type", "Wheel").setAttribute("color", "black");
        Node wheelB4 = new Node().setAttribute("type", "Wheel").setAttribute("color", "black");
        carGraph.addNode(carB).addNode(wheelB1).addNode(wheelB2).addNode(wheelB3).addNode(wheelB4);
        carB.addEdge("has", wheelB1).addEdge("has", wheelB2).addEdge("has", wheelB3).addEdge("has", wheelB4);
        carA.addEdge("next", carB);

        // build a car 'C' with 4 blue wheels:
        Node carC = new Node().setAttribute("type", "Car");
        Node wheelC1 = new Node().setAttribute("type", "Wheel").setAttribute("color", "blue");
        Node wheelC2 = new Node().setAttribute("type", "Wheel").setAttribute("color", "blue");
        Node wheelC3 = new Node().setAttribute("type", "Wheel").setAttribute("color", "blue");
        Node wheelC4 = new Node().setAttribute("type", "Wheel").setAttribute("color", "blue");
        carGraph.addNode(carC).addNode(wheelC1).addNode(wheelC2).addNode(wheelC3).addNode(wheelC4);
        carC.addEdge("has", wheelC1).addEdge("has", wheelC2).addEdge("has", wheelC3).addEdge("has", wheelC4);
        carB.addEdge("next", carC);

and the PatternGraph:

        // build a pattern that says 'car without a blue wheel':
        PatternGraph carWithoutBlueWheel = new PatternGraph();
        PatternNode car = new PatternNode()
                .setAttributeMatchExpression("#{type} == 'Car'");
        PatternNode wheel = new PatternNode()
                .setAction("!=").setAttributeMatchExpression("#{type} == 'Wheel' && #{color} == 'blue'");
        carWithoutBlueWheel.addPatternNode(car).addPatternNode(wheel);
        car.addPatternEdge(new PatternEdge().setSource(car).setName("has").setTarget(wheel));

(the PatternGraph doesn't really play a role here, but it's here just to complete the example)

Checking if the graph is isomorph to a copy of it takes more than a minute or two, which is totally unreasonable for such a small graph.

I may have to rethink my whole concept of checks for isomorphism, or at least find a way to, once again, drastically increase performance of it.

I have three totally different approaches to the check:

  1. the 'combinatorics problem' approach: this is what I first came up with, and what I'm currently using. All valid combinations of all loosely fitted candidates are checked for the remaining details.
  2. the 'search problem' approach: this is a fairly reasonable and intuitive approach, where you would do a depth-first search with backtracking to find nodes that match all details of the graph to compare with.
  3. the 'sorting problem' approach: this is a kinda different but still somehow intuitive approach, where you would try to find a 'normal form' in which two isomorph graphs would be serialized to the exact same string. For this, you would at least have a way of reliably sort nodes by any measure. If you however think this through, it is not less complicated than the other approaches

All three approaches face the same kind of 'combinatorial explosion', that make the check become quite hard to complete in reasonable time, as the graph grows - strongly depending on 'how hard' the graph is to compare.

As I already put some time into the first approach, I think I should additionally try to implement one or even both of the other approaches, so I can compare them with one another. These things ain't trivial however and I'm running out of time, because this project is about to be finalized for now in the next one or two weeks. This will most probably be one of the last big issues I'll be facing here.

hoechp commented 7 years ago

implement multiple different ways to do the check:

hoechp commented 7 years ago

In an about 3 hour long sunday night hackathon, I implemented a whole new check for isomorphism based on the 'search problem' approach. It seems to work and it's about 50x faster on the graph in question, but it needs further testing.

hoechp commented 7 years ago

Now I have all 3 approaches up and working. My initial 'combinatorial problem' approach seems to be always worse than the new 'search problem'/'CSP' approach, that pretty much is 100% solving a constraint satisfaction problem after limiting the variable domains as much as possible before starting the algorithm.

The new 'sorting problem' approach is very interesting, too. Since it yields a normal form for serializing a graph in a way, where two isomorph graphs will always have the same serialization. Not only 'academically valuable', this allows for easy repetative checks for isomorphism. Like when you have 100 Graphs, and you want to check them all against each other for isomorphism, you can just 'normalize' and serialize those, to then quickly do the check. Even if the normalization takes a while, it will eventually be worthwhile, if enough checks will be done with the obtained serialization.

still TODO:

hoechp commented 7 years ago

found a bug in the search/CSP approach, where multiple nodes of one graph can be mapped to the same node in the other graph. and it's somehow harder to fix than it sounds, especially trying to not mess up performance

hoechp commented 7 years ago

I found that my search/CSP approach wasn't working out the way I wanted. After reinventing it, what was left of it, was a dramatically enhanced version of my initial combinatorial approach. So I did just implement the new version and it seems to be much better than any previous implementation. It therefore unifies my CSP and combinatorial approach, so im left with just this new approach, you may also call it CSP-based rather than combinatorial - and then there's still the sort-based approach, that has a pretty bad performance unless, probably, you use it for hundrets or thousands of checks for isomorphism per 'normalized' graph.

The old CSP approach was buggy and will be deleted. I'm tending to keep the initial combinatorial approach though, even know it now is pretty much 'depricated'. But I'll keep the code in my project nevertheless.