Help determining stages

derekperkins commented 7 years ago

This package is nearly perfect for my use case. The only thing I'm missing is know how many stages I'll need to process data while running as parallel as possible. I'll use your readme example to illustrate:

// List of all simple strings (to be sorted)
var stringsToSort = []string{
    "A", "B", "C", "D", "E", "F", "G", "H",
}

// List of dependencies
var stringDependencies = []topo.Dependency{
    topo.Dependency{Child: "B", Parent: "A"},
    topo.Dependency{Child: "B", Parent: "C"},
    topo.Dependency{Child: "B", Parent: "D"},
    topo.Dependency{Child: "A", Parent: "E"},
    topo.Dependency{Child: "D", Parent: "C"},
}

Sorted list of strings: [E A C D B F G H]

If each of those letters represented a function, I'd have to wait for each stage of dependencies to complete before running the next.

Sorted list of strings in parallel stages:
[
  [F G H E C],
  [A D],
  [B],
]

In that list, F, G, H, E and C aren't dependent on anything, so they can run first and all in parallel. Next, A and D need E and C to finish respectively, but have no cross-dependencies, so they can run in parallel. That just leaves B, which has to run in its own stage after A, C and D complete.

How difficult do you think it would be to add that functionality?

Thanks!

fako1024 commented 7 years ago

Thanks a lot for the feedback!

An interesting proposal - so just to clarify: If you were to think of the sorted list of X (X being whatever) as a sort of tree (where each layer were to represent a set of X without inter-dependencies), you want to be able to retrieve all sub-slices of these layers, correct?

If so, let me give it some thought, I think it should be possible to extract that information during the recursion, I just have to consider a) how to elegantly store this extra information and b) how to make it accessible.

derekperkins commented 7 years ago

That's it exactly. I'm not sure that it makes sense to do an inline sort the way the current code works today. I'd look for it to return [][]Type.

fako1024 commented 7 years ago

Thanks for confirming. The existing interface will definitively remain (an inline sort was the original intention, and the API will not change). What I'd rather envision would be having an extension: A second function like

DependencyLayers(data interface{}, deps []Dependency) ([][]Object, error)

or similar, which would provide the required information (Although there may be some additional requirements / steps, since you'd still need to convert [][]Object to your specific type (in your case a function if I understood correctly), simply because [][]interface{} != [][]anything (which is also why the Sort() function needs the specific getter/setter functions).

Would that generally serve your purpose?

derekperkins commented 7 years ago

Right, I wasn't proposing changing the existing sort functionality.

I've got a couple different ways that I could use the function. I could either pass in my original objects like your original Sort, or I also wondering about having String/Int versions, where no interfaces are used and I'm just sending in a unique ID. Then after getting back [][]string/[][]int, I would map those back into my original data. That might be a little more complex than is worth, given that it doesn't necessarily make the api easier.

At the end of the day, if I was just passing in *MyThing and got [][]*MyThing back (after type asserting in a loop like you mention), that would be perfect.

derekperkins commented 7 years ago

Another point I forgot to mention is that I'd expect an error if there were cyclic dependencies.

fako1024 commented 7 years ago

Thanks for the clarification. I'd like to keep the functionality as generic as possible, so I'll try to come up with a viable solution for any type (in particular since I've already found a simple and clean way of doing the layering in the original graph processing code, so there would be no need for a second line of processing and I could maximize code reusage (That also ensures your requirement about errors on cyclic dependencies). I'll look into it asap (might be a day or two)...

derekperkins commented 7 years ago

Thanks a ton! This going to be really helpful. The end use case is similar to Apache Spark's job scheduler (DAG - directed acyclic graph).

Spark processing task flow from https://www.sigmoid.com/apache-spark-internals/

derekperkins commented 7 years ago

@fako1024 Any luck with this?

fako1024 commented 7 years ago

In general yes, but unfortunately the backtracking of the graph layers is more complicated than I originally anticipated - I hope you can give me a few more days without stalling your project - since I've already invested some time I'd hate that to go to waste...

derekperkins commented 7 years ago

No worries, I'm not trying to pressure you. I'm writing my code to work in stages like I mentioned above, so I'll be able to plug in your code to create more intelligent stages whenever it's ready. I really appreciate your time!

derekperkins commented 7 years ago

Just checking to see if I can help review code or something.

fako1024 / topo

Help determining stages #1