boostorg / graph

Boost.org graph module
http://boost.org/libs/graph
325 stars 208 forks source link

Tree class in BGL #123

Open mkatliar opened 6 years ago

mkatliar commented 6 years ago

Is there a class for trees in BGL?

anadon commented 6 years ago

Usually, you just have an acyclic graph, pick a vertex as your root, and use that as a tree. The reason why this usually isn't done with a full fat graph is that there are usually reference based methods or array based methods that can have better performance, lower memory usage, or both. You might want to look at std::make_heap and related functions.

https://en.cppreference.com/w/cpp/algorithm/make_heap

@jzmaddock I think this one can be closed out.

mkatliar commented 6 years ago

In my case the main point is the performance. I know that my graph is always a tree, and I want a) the traversals (depth-first and breadth-first) to be as quick as possible; b) avoid dynamic memory allocation whenever possible; c) fast retrieval of a parent vertex. Currently I am using boost::compressed_sparse_row_graph as an implementation. I am wondering how much performance improvement could be gained from implementing a specialized tree class. What is your opinion on that?

jeremy-murphy commented 6 years ago

I strongly suspect that you are not prevented from getting maximum performance by the current implementation, although I could be wrong. My fundamental argument would be that 'treeness' is not a property of type, but of value, in the same sense that 'sorted' is a property of value. A graph with one node is also a tree. A graph with a cycle becomes a tree when you remove an edge from the cycle (and nominate a root).

I think the key design issue is to implement algorithms that assume a graph is a tree, in the same way that we have algorithms (binary search) that assume a sequence is sorted.

One catch to this is how to deal with trees stored as an undirected graph with explicitly nominated root versus a directed graph with implicit root versus a tree that also stores 'back edges' (from child to parent).

jzmaddock commented 6 years ago

I think I agree about closing this one:

Thinking out loud here.... suppose there was a traits class that allowed you to mark a data structure as a tree (true or false), could if constexpr then be used to optimise the algorithms for that use case? Someone would need to step and implement it, and then prove it actually makes a difference though..

jeremy-murphy commented 6 years ago

I've actually had second thoughts on this and I'm wondering if there is a case for a tree class: not for trees in general, but for k-ary trees.

Correct me if I'm wrong, @jzmaddock , but Graph doesn't have concepts for a fixed number of successors, so the only way I can think of to define a k-ary tree would be to nominate a particular vertex as 'empty' (say, 0, as distinct from the graph_traits<G>::null_vertex()) and create an edge from u to 0 to represent a missing k-th successor. The extra space of one vertex is not an issue, but now the number of vertices is off by one, which makes things a bit awkward.

So maybe it's not a tree class, per se, but a graph with a fixed number of successors.

@mkatliar were you thinking of arbitrarily structured trees or k-ary trees? If the latter, how do you implement them?

anadon commented 6 years ago

I don't think k-array trees are something which sound be added to the core of BGL. A constraint like that could be added, but it could just as well be user code. What you're trying to get at is a vertex with a set number of out edges and 0 or 1 in edges. Having these predicates checked isn't that bad, but adding complexity to the BGL interface might be.

jeremy-murphy commented 6 years ago

The point is that the k is a compile-time value, not a run-time predicate, thus the reference to concepts. If you don't think k-ary trees belong in Boost.Graph, where do you think they belong?

jeremy-murphy commented 6 years ago

You may also find it edifying to read the previous discussions about adding a tree class to Boost, which occurred back in 2002 and 2009, and centered around adding one particular tree data structure. That never eventuated and it's not clear why. I suspect that trying to add an outside tree implementation into a new Boost.Tree library required more energy and analysis than was available or predicted. (Btw, the rub of all those discussions is that plenty of people want a tree data structure.) What I think would be simpler is to integrate a new data structure into BGL. It already has adjacency_matrix for whatever special use cases it covers; k_ary_tree would just be another, with accompanying tree-specific overloads of relevant algorithms.

anadon commented 6 years ago

Reading through the 2009 thread, it doesn't look like an imperative for inclusion was ever reached.

I'm trying to think here, at what point does it make sense to change from having support through graph and runtime checks and support to going through the effort to create a more specialized version of what already exists. I see thee main options: no explicit support, runtime BGL support, or a separate library in Boost. Lets break down pros and cons for each.

No Support pro:

No Support con:

Runtime BGL pro:

Separate Library pro:

Already with that, I was wrong and the balance tilts towards having something separate. But that something separate should be its own project. But so long as we're here, let's make sure the tree library as a concept would be fully fleshed out. Off hand, I can see that it would need to have a few things available. A forest container, tree data structure, standard functions, and dynamic node expansion. I'm not sold on the forest container being necessary, but having dynamic node expansion is critical for a number of domains and that behavior isn't immediately clear as to what it should even be.

I'm working on 3 hours of sleep. Am I missing anything?

jeremy-murphy commented 6 years ago

What do you mean by "dynamic node expansion"? It sounds like something from a traversal algorithm, but maybe you mean something else.

anadon commented 6 years ago

I do mean a traversal algorithm as a special case. One of my bigger areas of study has been AI, and this is one feature often worked around in my experience.

Also, for completeness, there probably should be functions for converting to and from a graph.

jeremy-murphy commented 6 years ago

I think we have very different perspectives on this question, so for the moment instead of responding to your summary I'll just give you mine.

The premises that I take as given are these:

Now the issues start to get a bit more nuanced but I'm going to state them simply.

I'll claim that adding a k-ary tree data structure to Boost.Graph does not preclude the creation of a separate k-ary tree library, either in or out of Boost.

And I'll assert that the audience here is Boost.Graph users, not all Boost users. And so the question is not the general, "Where could/should a k-ary tree data structure go?" -- the question is, "Would Boost.Graph users benefit from a k-ary tree data structure?"

My initial thought was no, there would be no performance benefit, but I was thinking about general trees, not k-ary trees. Now, I presume there could be a performance benefit. Knowing k at compile-time, theoretically, should allow for optimization of the memory layout. When a binary tree stores a predecessor edge (to the parent), we can do cool things like depth-first search and isomorphism in constant space. And I guess I am saying that these algorithms cannot be implemented on the existing graph data structures, because they don't have the concept of a fixed k number of successors.

Apart from performance, there could be a much better interface for constructing, querying and mutating a k-ary tree, with functions that relate directly to the particulars of the structure.

How would it work in practice? Boost.Graph has a great generic design, so as I said earlier, a k_ary_tree would exist alongside adjacency_list, adjacency_matrix and compressed_sparse_row_graph. It would satisfy some graph concepts (probably not all) and satisfy some new k-ary tree concepts.

Take an incomplete typical scenario:

adjacency_list<> g;
// Construct g.
MyDFSVisitor vis; // models DFS Visitor concept.
depth_first_search(g, vis, ...);

A user whose graph is a binary tree could instead do this:

k_ary_tree<2> g;
// Construct g using k-ary tree-specific interface.
MyDFSTreeVisitor vis; // models new DFS k-ary tree visitor concept (probably).
depth_first_search(g, vis, ...);

They would not have to use compressed_sparse_row_graph anymore and all the limitations it brings. Users who do not use k-ary trees would not know anything has changed. depth_first_search would be a new overload implemented in terms of the k-ary tree concept.

To summarize, I think there has to be either an improvement in performance or interface to justify adding a k-ary tree data structure, and I think there is strong theoretical support that both of these outcomes would be achieved. The real answer will come from experimentation and a proof of concept.

Now, I don't just say all this for the fun of it. Elements of Programming by Stepanov and McJones has a small, complete set of binary tree concepts and algorithms. Anyone can go ahead and make a proof of concept from it. I'll have a go when I have time.

Cheers.

jeremy-murphy commented 6 years ago

I didn't really have time but I started anyway and I may as well share my progress now.

anadon commented 5 years ago

Update cross reference from the mailing list: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3700.html

jeremy-murphy commented 5 years ago

Happy new year. @jzmaddock have I not convinced you that this issue is actually worthwhile?

jzmaddock commented 5 years ago

Maybe, but what do you want me to do?

jeremy-murphy commented 5 years ago

Just reopen this issue.

jzmaddock commented 5 years ago

Just reopen this issue.

Ah OK, that I can do ;)

BTW with regard to adding this to the BGL - it would be quite a large addition - more than the community maintenance group normally takes on, so something like a mini-review would probably be best. Some open questions off the top of my head:

Other than that, something that followed the standardization proposal is more likely to gain traction I would think - and might shake out any issues in the proposal?

jeremy-murphy commented 5 years ago

On Sun., 6 Jan. 2019, 1:19 am jzmaddock <notifications@github.com wrote:

Just reopen this issue.

Ah OK, that I can do ;)

Thanks. :)

BTW with regard to adding this to the BGL - it would be quite a large

addition - more than the community maintenance group normally takes on, so something like a mini-review would probably be best.

Ok. Let's cross that bridge when we come to it.

Some open questions off the top of my head:

  • Can it be shown that tree data structures lead to improved performance, how about neater (self explanatory) code?

Have you looked at the PR? I commented with some benchmarks that demonstrate significant performance improvement.

I don't think so, as WG21/N3700: i) is about search trees, that are self-balancing, which is more specific than the purpose of BGL ii) is designed like a standard container, which Boost.Graph classes are not.

Other than that, something that followed the standardization proposal is

more likely to gain traction I would think - and might shake out any issues in the proposal?

I get what you mean but I think it's also important that Boost maintains its independence; after all, BGL defined a framework of concepts for graphs long ago -- the standard doesn't have one, yet. WG21/N3700 is making tree classes in the standard container design, which is not the design that BGL has. I think a lot of concepts will be similar or the same and the coexistence of two projects will benefit each other.

Becheler commented 2 years ago

Hello! Any news on this front? I was hoping to use BGL to upgrade from my own Forest and k-ary Tree 🙃

jeremy-murphy commented 2 years ago

Hey @Becheler thanks for your interest. Best we can offer you right now is the PoC branch I made three years ago.

jeremy-murphy commented 2 years ago

Also, this presentation might be useful, but I warn you that the speaker is very slow, you need to watch it at 1.5x: https://youtu.be/e-cRWFiGb8E

Becheler commented 2 years ago

Oooh nice! Thank you! Well I don't think I want to complicate my dependencies even more, so I will keep my rocky implementation , but I will watch this (not-so-slow-for-non-native-speakers) cool presentation! [EDIT] I took the time to watch the presentation, and it convinced me to throw my Tree class by the window and begin using the BGL :-)

jeremy-murphy commented 1 year ago

That's very flattering, thank you, @Becheler ! It motivates me to one day pick up the baton again and get it finished.

Becheler commented 1 year ago

Hi @jeremy-murphy ! Trees are back to haunt you! 🌱 👻 I began to play with your bidirectional binary tree code, but I am not sure to understand how I'm supposed to attach vertex/edgeproperties to it? Until now my BGL rocky way has mostly been playing with adjacency_list<....., VertexProperty, EdgeProperty>. If I understand correctly your implementation I would have to define myself a property map as a data member when I define my class class my_binary_tree : boost::bidirectional_binary_tree<> ? Sorry to bother from ground 0 of BGL 😅

jeremy-murphy commented 1 year ago

Hi @jeremy-murphy ! Trees are back to haunt you! 🌱 👻

No problem!

I didn't consider properties when I implemented the class, so yeah, it's completely missing.

I wonder if you could create a new class that simply publicly inherits from both the tree and a property map to get what you need? That would give you the most direct union of functionality and not require redefining the interface.

But use composition if you need more control.

Best of luck!

Becheler commented 1 year ago

Hi @jeremy-murphy ! I'm carving my way in, but the code fails to compile with a no template named 'BidirectionalBinaryTreeConcept' in namespace 'boost::concepts'; I can't find anything in the project that defines this concept, am I missing an external dependency? Also I am not sure this present thread issue is the best suited to this discussion, so feel free to re-orient me somewhere else 😄

Becheler commented 1 year ago

Ok, found it in the file https://github.com/jeremy-murphy/graph/blob/k-ary-tree/include/boost/graph/graph_concepts.hpp . I isolated the required definitions in a concepts.hpp file :

#ifndef BOOST_GRAPH_DETAIL_CONCEPTS_TREE
#define BOOST_GRAPH_DETAIL_CONCEPTS_TREE

#include <boost/concept_check.hpp>
#include <boost/concept/assert.hpp>
#include <boost/concept/detail/concept_def.hpp>

namespace boost::concepts
{

BOOST_concept(ForwardBinaryTree,(G))
: Graph<G>
{
    BOOST_CONCEPT_USAGE(ForwardBinaryTree) {
        t = has_left_successor(u, g);
        t = has_right_successor(u, g);
        v = left_successor(u, g);
        v = right_successor(u, g);
        t = empty(u, g);
        const_constraints(g);
    }
    void const_constraints(G const &g) {
        t = has_left_successor(u, g);
        t = has_right_successor(u, g);
        v = left_successor(u, g);
        v = right_successor(u, g);
        t = empty(u, g);
    }
    bool t;
    G g;
    typename graph_traits<G>::vertex_descriptor u, v;
};

BOOST_concept(BidirectionalBinaryTree,(G))
: ForwardBinaryTree<G>
{
    BOOST_CONCEPT_USAGE(BidirectionalBinaryTree) {
        t = has_predecessor(u, g);
        t = predecessor(u, g);
        u = root(u, g);
        const_constraints(g);
    }

    void const_constraints(G const &g) {
        t = has_predecessor(u, g);
        t = predecessor(u, g);
        u = root(u, g);
    }

    bool t;
    G g;
    typename graph_traits<G>::vertex_descriptor u;
};

BOOST_concept(MutableForwardBinaryTree,(G))
    : ForwardBinaryTree<G>
{
    BOOST_CONCEPT_USAGE(MutableForwardBinaryTree) {
    e = add_left_edge(u, v, g);
    e = add_right_edge(u, v, g);
    // TODO: remove_left_edge, remove_right_edge
    }
    G g;
    typename graph_traits<G>::vertex_descriptor u, v;
    typename graph_traits<G>::edge_descriptor e;
};

BOOST_concept(MutableBidirectionalBinaryTree,(G))
    : MutableForwardBinaryTree<G>
{
    BOOST_CONCEPT_USAGE(MutableBidirectionalBinaryTree) {
    e = add_predecessor(u, v, g);
    remove_predecessor(u, g);
    }
    G g;
    typename graph_traits<G>::vertex_descriptor u, v;
    typename graph_traits<G>::edge_descriptor e;
};

}

#endif
jeremy-murphy commented 1 year ago

Hi @Becheler , sorry yeah, I forgot that there are actually a few files involved. If you look at the corresponding PR, you can see exactly what changed, but essentially there is the 'public' k-ary_tree.hpp file, then a 'private' detail/k-ary_tree.hpp file and the tree concepts that you found added to the existing graph_concepts.hpp file.

Becheler commented 1 year ago

@jeremy-murphy in the present state, the syntax k_ary_tree<3> g mentioned here is not supported as the focus seemed to be binary trees only. Am I missing some code updates and do you think it's reasonably feasible to the newcomer to expand on your binary_tree implementation? Beside having a array<vertex_descriptor, N> successors; that is. I'm no graph theoretician 😛

jeremy-murphy commented 1 year ago

Hey @Becheler , yeah that doesn't surprise me. I started writing it with k-ary in mind but at some point I probably mentally switched over to binary and forgot to look back.

I'm planning on putting some effort into my PR again and any contributions would be welcome! However, this is the general issue about adding a tree class to BGL rather than my specific implementation of it, so I think this topic is best moved over to the PR.