gonum / optimize

Packages for solving minimization problems [DEPRECATED]

Add concurrent global optimization #146

Closed: btracey closed this 8 years ago

btracey commented 9 years ago

@vladimir-ch @kortschak Here is a 30% version of global to get the conversation started. I was hoping to have random restart be the example method, but it's actually pretty tricky to write with the current organization of the package.

There are tons of naming and commenting issues which can be fixed later. Similarly, there is a lot of overlap with Global and Local that can be combined.

In my mind, the question at the moment is discovering the right API for methods. I think the current scheme is reasonably clean, but it feels like there may be a better approach.

In Local, Method acts like a producer. It's sort of like it's a channel that gives newer and newer locations to evaluate. In theory, it seems like Global should act similarly, just with several producers. The problem is that if you actually write it as a producer (with a for{}), there needs to be some way to shut them down. Thus, we can't just abandon the method (like we do with Local), we need to have a quit channel that communicates back. This seems unnecessarily tricky (even guess and check is easy to mess up), but I couldn't think of a better way today. It's not terrible as is, I'd just like to have some extra eyes before deciding this way is the cleanest.
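The producer-with-quit-channel pattern being described might be sketched as follows. All names here are hypothetical stand-ins, not the package's actual API; the point is only that the producer goroutine needs an explicit shutdown signal rather than being abandoned.

```go
package main

import "fmt"

// produce sends candidate points on out until quit is closed. This mirrors
// the "Method as producer" idea: the method runs in its own goroutine and
// must be told to stop, since the caller cannot simply abandon it.
func produce(out chan<- []float64, quit <-chan struct{}) {
	defer close(out)
	for i := 0; ; i++ {
		x := []float64{float64(i)} // stand-in for a generated location
		select {
		case out <- x:
		case <-quit:
			return // caller requested shutdown
		}
	}
}

func main() {
	out := make(chan []float64)
	quit := make(chan struct{})
	go produce(out, quit)
	for i := 0; i < 3; i++ {
		fmt.Println(<-out) // consume a few candidates
	}
	close(quit) // shut the producer down
}
```

The select on both the send and the quit channel is what makes shutdown safe: without it, a producer blocked on an unread send could never observe the quit signal.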

vladimir-ch commented 9 years ago

Not being very familiar with them, I wonder how inherent parallelism is to global optimization methods. GuessAndCheck or random restart clearly benefit from parallel execution; what about other methods?

Independently of that, could the concurrency and channels exposed by Schedule be moved one level up and hidden from implementors of GlobalMethod? I am no expert on Go's concurrency, but it seems the WaitGroup bookkeeping, channel closing, etc. will (have to?) be common to all GlobalMethods. This may be complete nonsense, but would passing just a task id and a location be enough? Something like

type GlobalMethod interface {
    IterateGlobal(task int, loc *Location) (Operation, error)
}

Then GuessAndCheck would become something like

type GuessAndCheck struct {
    Rander Rander

    mu     sync.Mutex // guards lastOp across concurrent tasks
    lastOp map[int]Operation
}

func (g *GuessAndCheck) IterateGlobal(task int, loc *Location) (Operation, error) {
    g.mu.Lock()
    defer g.mu.Unlock()
    op, ok := g.lastOp[task]
    if !ok {
        op = NoOperation
    }
    switch op {
    case NoOperation, MajorIteration:
        g.Rander.Rand(loc.X)
        g.lastOp[task] = FuncEvaluation
        return FuncEvaluation, nil
    default:
        g.lastOp[task] = MajorIteration
        return MajorIteration, nil
    }
}

Locations, channels, etc would be handled in minimizeGlobal().
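A minimal serial driver along those lines might look like this. The type and function names (Operation, Location, minimizeGlobalSerial, and the toy fixedPoint method) are assumptions for the sketch, not the package's API:

```go
package main

import "fmt"

// Minimal stand-ins for the package's types.
type Operation int

const (
	NoOperation Operation = iota
	FuncEvaluation
	MajorIteration
)

type Location struct {
	X []float64
	F float64
}

type GlobalMethod interface {
	IterateGlobal(task int, loc *Location) (Operation, error)
}

// minimizeGlobalSerial drives a GlobalMethod with a single task (task 0),
// performing a function evaluation whenever the method requests one.
func minimizeGlobalSerial(m GlobalMethod, f func([]float64) float64, loc *Location, iters int) error {
	for i := 0; i < iters; i++ {
		op, err := m.IterateGlobal(0, loc)
		if err != nil {
			return err
		}
		if op == FuncEvaluation {
			loc.F = f(loc.X)
		}
	}
	return nil
}

// fixedPoint is a toy method that always proposes the same point.
type fixedPoint struct{ x []float64 }

func (m fixedPoint) IterateGlobal(task int, loc *Location) (Operation, error) {
	copy(loc.X, m.x)
	return FuncEvaluation, nil
}

func main() {
	loc := &Location{X: make([]float64, 1)}
	f := func(x []float64) float64 { return x[0] * x[0] }
	minimizeGlobalSerial(fixedPoint{x: []float64{3}}, f, loc, 2)
	fmt.Println(loc.F) // 9
}
```

The method only proposes operations and locations; the driver owns the objective function and the evaluation, which is what keeps channels and goroutines out of the method implementations.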

vladimir-ch commented 9 years ago

With the above approach RandomRestart could also just dispatch the location to a local method based on task id (after creating it if it did not exist). Only that at the moment I don't see a way to signal a restart with either approach.

btracey commented 9 years ago

Method can then also gain a Quit (or Cleanup or whatever) in case the Method wants to launch persistent goroutines. I like that better; I'll explore how it looks.

Random Restart works by starting concurrent instances of Local. It doesn't have access to Problem, and thus cannot call Local directly; instead it runs its own instance of Local. When a local optimization converges, that's a major iteration: the goroutine whose optimization converged selects a new random starting point and restarts the local optimization. This continues until Quit is called.
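That per-goroutine loop might be sketched roughly as follows. Everything here is a hypothetical stand-in: localOpt plays the role of a nested Local run, and randStart the role of the random starting-point generator.

```go
package main

import "fmt"

// restartLoop is one random-restart worker: run a local optimization from
// a random start, report the converged value, pick a new start, and
// repeat until quit is closed.
func restartLoop(localOpt func(x0 []float64) float64, randStart func() []float64, results chan<- float64, quit <-chan struct{}) {
	for {
		f := localOpt(randStart()) // one full local optimization
		select {
		case results <- f: // report this converged run (a "major iteration")
		case <-quit:
			return
		}
	}
}

func main() {
	quit := make(chan struct{})
	results := make(chan float64)
	localOpt := func(x0 []float64) float64 { return x0[0] * x0[0] } // toy stand-in for Local
	n := 0.0
	randStart := func() []float64 { n++; return []float64{n} } // deterministic stand-in for a random start
	go restartLoop(localOpt, randStart, results, quit)
	fmt.Println(<-results) // result of the first restart: 1
	close(quit)
}
```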

btracey commented 9 years ago

Note to self: Quit should probably return either *Location or []*Location to allow updating with the final function evaluations that have happened.

btracey commented 9 years ago

Yes, I think concurrency is beneficial to many global optimization methods. Genetic Algorithms generally have a population that can be evaluated concurrently. Random Restart can be run several times in parallel. Simulated Annealing can be run at multiple temperatures simultaneously. Obviously guess and check can happen in parallel.

vladimir-ch commented 9 years ago

... When a specific optimization converges that particular goroutine selects a new random starting point, and restarts the local optimization. ...

The problem with this is that RandomRestart does not know when a specific task (goroutine) has converged (because convergence is checked one level higher), so it cannot call the corresponding method's Init with a new starting point. A way around this could be: RandomRestart keeps a slice of Location, one per task, and updates it when a task returns MajorIteration. When IterateGlobal is called again and the location is the same, there was no convergence, so call localMethods[task].Iterate; if loc.X is different, call localMethods[task].Init. But this would probably not work either, because it is RandomRestart that would generate the new starting point, not minimizeGlobal.

Yes, I think concurrency is beneficial to many global optimization methods. Genetic Algorithms generally have a population that can be evaluated concurrently. Random Restart can be run several times in parallel. Simulated Annealing can be run at multiple temperatures simultaneously. Obviously guess and check can happen in parallel.

That comment of mine was motivated by the name and signature of Schedule: it is so explicitly concurrent that I was wondering whether global == concurrent and whether ConcurrentMethod wouldn't be a better name. But if you like my proposal and it works out well, my GlobalMethod does not require concurrency to be involved if it is not requested. If settings.Concurrent is false, no channels or goroutines have to be created at all, just a loop that calls method.IterateGlobal(0, loc), probably from minimizeGlobalSerial. If Concurrent is true, then set up all the concurrency machinery and call method.IterateGlobal(task, loc[task]), probably from minimizeGlobalParallel. Or something like that.

I like where this is heading.
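The serial/parallel split described above might be sketched like this. The stand-in types and the run function (standing in for the hypothetical minimizeGlobalSerial/minimizeGlobalParallel pair) are assumptions for the sketch:

```go
package main

import (
	"fmt"
	"sync"
)

type Operation int
type Location struct{ X []float64 }

type GlobalMethod interface {
	IterateGlobal(task int, loc *Location) (Operation, error)
}

// run drives the method serially when tasks == 1, and otherwise launches
// one goroutine per task, each with its own Location.
func run(m GlobalMethod, tasks, iters int) {
	if tasks == 1 {
		loc := &Location{X: make([]float64, 1)}
		for i := 0; i < iters; i++ {
			m.IterateGlobal(0, loc)
		}
		return
	}
	var wg sync.WaitGroup
	for t := 0; t < tasks; t++ {
		wg.Add(1)
		go func(task int) {
			defer wg.Done()
			loc := &Location{X: make([]float64, 1)}
			for i := 0; i < iters; i++ {
				m.IterateGlobal(task, loc)
			}
		}(t)
	}
	wg.Wait()
}

// counter just counts calls, guarded by a mutex since tasks run concurrently.
type counter struct {
	mu sync.Mutex
	n  int
}

func (c *counter) IterateGlobal(task int, loc *Location) (Operation, error) {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
	return 0, nil
}

func main() {
	c := &counter{}
	run(c, 4, 10)    // 4 concurrent tasks, 10 iterations each
	fmt.Println(c.n) // 40
}
```

Giving each task its own Location is what lets the method stay channel-free: the only shared state is whatever the method itself chooses to guard, as in counter above.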

btracey commented 8 years ago

There is a need to have an init that specifies the number of concurrent tasks. The number of goroutines actually running is not necessarily the same as the number of evaluations that could happen in parallel. In a genetic algorithm, for instance, the entire population could be evaluated in parallel.
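Such an init might look something like the following. InitGlobal and the negotiation of the task count are hypothetical, sketched only to show the distinction between the driver's worker count and the method's natural parallelism:

```go
package main

import "fmt"

// geneticAlgorithm is a toy method whose natural degree of parallelism is
// its population size, which need not match the number of worker
// goroutines the driver wants to run.
type geneticAlgorithm struct{ popSize int }

// InitGlobal is a hypothetical hook: the driver proposes a number of
// concurrent tasks and the method answers with the number it will use.
func (g geneticAlgorithm) InitGlobal(tasks int) int {
	if g.popSize < tasks {
		return g.popSize // no point running more tasks than population members
	}
	return tasks
}

func main() {
	ga := geneticAlgorithm{popSize: 8}
	fmt.Println(ga.InitGlobal(16)) // the method caps the driver at 8 tasks
	fmt.Println(ga.InitGlobal(4))  // the driver's 4 workers are fine
}
```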

I agree it's better that the method doesn't have to deal with close, etc. explicitly. I was hoping to avoid an algorithm that locks directly, which is why it passes a channel to return the results on. I'll keep pondering; somewhere in between us is the right solution. GuessAndCheck is really easy, but I need to ponder a bit more whether all of the others can be done without launching goroutines.

btracey commented 8 years ago

Working on a transformation, but it's more work than I originally thought. Will update when I have a chance to finish it all.

vladimir-ch commented 8 years ago

I guess this can be closed now. @btracey ?