depth_first_search: Return visitor by value

rkohrt commented 2 years ago

In the documentation the DFSVisitor is taken by reference: https://www.boost.org/doc/libs/1_80_0/libs/graph/doc/depth_first_visit.html , while in the actual header it is taken by copy. I assume the code in the header is by mistake and I would actually like it to take the visitor by reference for my usecase.

As far as I can tell this mismatch existed already from the beginning: https://github.com/boostorg/graph/commit/121bb31837075fe98dfc44b00edc6c382ec394b8

jeremy-murphy commented 2 years ago

Yes, the algorithm has DFVisitor& in the declaration, but if you search for all the references to "visitor" on that page, you'll notice that it is otherwise defined as being taken by value.

If you recall, visitors to STL algorithms (such as std::for_each) are also taken by value, so that is the 'correct' thing to do here. Unfortunately, this algorithm doesn't implement the complete visitor design, which should include returning the visitor by value at the end.

So the real fix is to change these algorithms from returning void to returning DFSVisitor by value.

jeremy-murphy commented 2 years ago

Thanks for the quick response. For a moment there I thought that detail::depth_first_visit_impl needed to be changed in the same way, but since it already takes the visitor by reference, well, let's leave it that way.

jeremy-murphy commented 2 years ago

We can remove that [1] note now about a mutable visitor needing to hold data by pointer/reference.

jeremy-murphy commented 2 years ago

Great, all tests pass, nothing broken!

jeremy-murphy commented 2 years ago

Oh, now I just realized that this is only the depth_first_VISIT page, so we also have to modify the documentation page for depth_first_SEARCH.

There is also the BREADTH_first algorithm to consider. It would be odd to change one and not the other.

Also... we do need a test to show that the new interface works. Nothing complicated, just something that shows that a mutating visitor returns in the expected state. One that counts the number of times each different visit function is called would be good. When it is returned, you assert the counts equal what you expect.

I know this is more work than you expected, but it is a really positive improvement for the library. I'll advise you through it and merge it at the end.

rkohrt commented 2 years ago

Okay, will make these changes next week

rkohrt commented 2 years ago

Will need to have a closer look at the breath first search to make the change there, because the interface is different. But here is the test for the dfv

jeremy-murphy commented 2 years ago

It's looking really good!

badfilms commented 2 years ago

I believe that the issue with passing a visitor by value has to do with when the DFS implementation is recursive, which is why the source for that implementation explicitly has the following comment:

https://github.com/boostorg/graph/blob/2862644154f35bf1db130b984a22fa4ea3dff955/include/boost/graph/depth_first_search.hpp#L224

From my experience, this is due to the fact that visitors are inherited from (or otherwise defined in a manner that meets the visitor concept), and thus we cannot expect a user-defined visitor to not have state (which will not be maintained in the current recursive implementation unless passed by reference).

Obviously, the recursive implementation shouldn't be preferred to the iterative one in the majority of use cases, and I believe is more of a legacy implementation from many years ago, but it is worth noting.

But the inconsistencies make the design decision less transparent. Because of the point mentioned above regarding state, I was always under the impression that it was due to the fact that passing a visitor by value has the potential to be expensive--unlike the higher order functions passed by value to algorithms like std::for_each, which just take an invokable that should be trivially copyable.

jeremy-murphy commented 2 years ago

Hey @badfilms , thanks for joining in and pointing out those facts that yes, make the original design decision less transparent and the best way forward now less obvious.

I'm still happy to proceed with the design that I have suggested, copying the visitor, until a definite red flag appears.

But the inconsistencies make the design decision less transparent. Because of the point mentioned above regarding state, I was always under the impression that it was due to the fact that passing a visitor by value has the potential to be expensive--unlike the higher order functions passed by value to algorithms like std::for_each, which just take an invokable that should be trivially copyable.

It's worth noting that std::for_each actually returns the visitor passed into it, which is only done because the visitor might have state that you want to inspect after the algorithm completes. In fact, one might even argue that the only reason to use std::for_each is to apply a stateful 'visitor' and inspect it at the end. (Any transformation (i.e. mutation) of the elements can be accomplished using std::transform.)

Having state does not mean copying is expensive, right? It's only if the copy/assignment constructor is non-trivial that it becomes an issue. Now if a visitor wanted to have a huge amount of state, then I would say it should just hold a pointer to it -- which itself is trivially copyable. And this argument goes for both 'visitors' in std::for_each and graph algorithms.

But on the other hand, I can't really argue against taking visitors by reference either. ¯\_(ツ)_/¯

badfilms commented 2 years ago

It's worth noting that std::for_each actually returns the visitor passed into it, which is only done because the visitor might have state that you want to inspect after the algorithm completes. In fact, one might even argue that the only reason to use std::for_each is to apply a stateful 'visitor' and inspect it at the end. (Any transformation (i.e. mutation) of the elements can be accomplished using std::transform.)

@jeremy-murphy Ahh right. I clearly wasn't thinking when I wrote that because my brain was differentiating a DFSVisitor, for example, from a visitor to a std algorithm due to the fact that it has multiple points of interest, forgetting the case of a function object.

jeremy-murphy commented 1 year ago

I'm still keen to see this finished, btw. Are you still interested in working on it, @rkohrt ?

boostorg / graph

depth_first_search: Return visitor by value #307