Closed warpfork closed 1 year ago
Hey @warpfork :)
Yeah, this is something I need as well but unfortunatelly it's not just one map that's causing the issue. - Ignoring the underlying structure the edges/vertices are stored, the AdjacencyMap()
and PredecessorMap()
methods that are used by the various graph methods both return map[K]map[K]Edge[K]
.
Maybe something like the following could work? I think it should be enough to satisfy the needs of the various callers. Only reason this is an interface is to allow transparently refactoring the implementation in the future.
// placeholder names for struct/methods, suggestions welcome
type Edges interface {
Range(func(K, []K) bool)
GetPredecessors(K) []K
}
@dominikbraun I wonder if this could be tackled at the same time as #62?
@dominikbraun I wonder if this could be tackled at the same time as #62?
@geoah Sure, there's no need to refactor the edge storage twice.
I was taking a quick look at this and since comparable
only allows for equality and not ordering, we can only go as far as making the sorting stable given the order the edges were added is the same.
Contraints like like constraints.Ordered would significantly limit what we could use for hashes, same with specifying an interface for hashes that could enable ordering.
Is there an obvious way of sorting comparables or something similar that I might be missing?
Ah, I just came across this. I too am using TopologicalSort()
, and couldn't figure out why I was getting slightly different ordering each time, until I dug into it.
In my case, I would perfectly happy to do something like, "nodes at the same level, sort lexicographically by key", anything as long as it is consistent.
Has anyone figured a workaround to it?
@deitch I'm not sure if there is a workaround with the existing functions other than implementing it yourself, but this issue is gaining more and more significance and hence will be tackled in one of the very next releases.
other than implementing it yourself
That seems like a bit of a waste.
I have been thinking a bit about the UI. Are we thinking that we would have some additional property to each vertex, e.g. SortWeight
, such that it would be considered between various vertices, all else being equal? Or maybe SortFunc
, the same way as sortSlice()
takes a less
func?
Actually, I rather like that last option. It keeps you from having to predetermine what algorithms work for whom, and just let people set their own sort priority. If you want consistently stable, you always could have a default func that can get overridden.
How can I help?
@deitch I'd also prefer a less
function for direct comparisons. That would be one way. The other way would be to keep track of the order in which the edges have been added to the graph as proposed by @warpfork. Would this approach work for you, too?
keep track of the order in which the edges have been added to the graph
I am concerned that it can be arbitrary. For example, let's say I have a graph with 5 nodes with no dependencies, and 5 direct dependencies, one of each. Something like:
A -> 1
B -> 2
C -> 3
D -> 4
E -> 5
I would think a normal expectation would be that it wouldn't matter what order I added them, the graph is the same, and so the sort order should not depend if I added A->1 and later B->2 or the reverse.
However, if time does matter for some people, we could track when the nodes where added, and then they can use the less
style function to track it to meet their needs.
Yes, I think you're right. A less
-function-style or lexicographical-style ordering of vertices is a saner default than a time-based ordering, at least when it comes to reproducibility.
When you are given a mere adjacency map or a representation like in your example, you can't make any assumptions about the order in which the edges have been added but only about the ordering of vertices that makes sense for your case.
We could provide a new function like StableTopologicalSort
(or something similar, to maintain backwards compatibility) which takes a less
function. If the user needs a time-based ordering, they could do this themselves within the less
function.
Yup, answers all use cases.
I don't know that you need a new function. I would just use the existing one. Is there any case where someone might want to have non-stable sort intentionally?
Ah, or are you thinking, if we want to be able to sort according to multiple algorithms, and so we should pass the less
function, rather than make it a property of the vertex? I can see that. One time I might call sort with one function, another time with another, so it is important to have an argument, and hence we need an argument and therefore changing the func or adding a new one? Got it.
Ah, or are you thinking, if we want to be able to sort according to multiple algorithms, and so we should pass the
less
function, rather than make it a property of the vertex?
Exactly. The less
function doesn't need to be a vertex property because it's going to be the same for all vertices, and it could be different for functions other than TopologicalSort
, so accepting it as an argument would make sense.
If I can help, let me know.
If you want, you can either help by implementing the new function, writing tests for the new function, or verify/test the new function once it is implemented. In any case, feedback is appreciated :-) I'm going to start with the implementation this weekend if nobody else wants to do it.
I wish I could tell you I would get to it this weekend, but I likely won't have time for another week. But post here as you go through it, and I will jump in as I can. I probably can be more helpful with test cases anyways, as that does not require knowing the internal guts of the graph implementation.
@warpfork @tonglil @vito @deitch @williamfzc
This change has been released in graph v0.22.0.
And a beautiful thing it is. Enjoying it right now.
@dominikbraun I am seeing major performance degradation with the new sort, even if I do not use the stable function. A program that used to take 42s, with the sort func not even showing up in the flame graph, is now taking ~222s, with the sort the single largest factor. I will open a new issue and try to replicate it in a sample.
Sometimes it is desirable to have a stable output for operations such as topological sort.
There are two ways of doing this (depending on if the goal is stability, or merely determinism):
I tend to favor the former approach, because it provides both properties at once (and also because it feels more general: edge order is a nice tiebreaker to have in a lot of scenarios, not just this one). But either works for producing deterministic outputs.
The current implementation can produce random reordering among nodes that aren't constrained. (I assume this is because a golang map is used somewhere.) Sometimes this is fine! But in some scenarios, less fune: unstable output ordering is a big bummer if you have application level predictability desires... Or even just are doing text fixtures which depend on consistency :)