jamesbevins opened this issue 6 years ago
On reflection, I think (b) is best. We should also track run time and increase/decrease the nodes assigned to a particular walker accordingly; solutions are occasionally generated that run much longer than the expected average run time.
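For illustration, something along these lines could rescale a walker's node allocation from its observed run times (the class and method names here are hypothetical, not existing Coeus code):

```python
from collections import deque

class RunTimeBalancer:
    """Track recent walker run times and suggest a node count for the next run."""

    def __init__(self, base_nodes, history=20, max_nodes=None):
        self.base_nodes = base_nodes
        self.max_nodes = max_nodes if max_nodes is not None else 4 * base_nodes
        self.times = deque(maxlen=history)

    def record(self, wall_time):
        """Log the wall time of a completed run."""
        self.times.append(wall_time)

    def nodes_for(self, last_wall_time):
        """Scale the node count in proportion to how far this walker's
        last run time drifted from the recent average."""
        if not self.times:
            return self.base_nodes
        avg = sum(self.times) / len(self.times)
        nodes = int(round(self.base_nodes * last_wall_time / avg))
        return max(1, min(self.max_nodes, nodes))
```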
@dholland4
Let's develop a method to support user-defined statistical uncertainty levels; the code will determine the # of particles on the fly to meet that threshold.
My thought is to have two user-defined parameters: the initial # of particles (N) and the number of levels (L). Assume the initial fidelity is level 1 (L1).
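A minimal sketch of those inputs, plus the standard Monte Carlo 1/sqrt(N) scaling that could be used to pick N on the fly (the names `FidelitySchedule` and `particles_for_target` are hypothetical):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FidelitySchedule:
    """User inputs: initial particle count and a target relative
    uncertainty for each fidelity level (level 1 first)."""
    initial_particles: int
    target_rel_err: List[float] = field(
        default_factory=lambda: [0.10, 0.05, 0.01])

    @property
    def num_levels(self) -> int:
        return len(self.target_rel_err)

def particles_for_target(n_current, rel_err_current, rel_err_target):
    """Monte Carlo relative error scales as 1/sqrt(N), so to hit a target
    uncertainty: N_new = N_current * (R_current / R_target)**2."""
    return int(n_current * (rel_err_current / rel_err_target) ** 2)
```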
Complication: there are now designs at various levels of fidelity. Do we include new population members at a lower fidelity or at the highest fidelity level? What is passed to Gnowee?
Personal opinion: I like the idea of continuing the optimization using the highest-level population (a "localish" search) until that population shrinks. Then run new lower-fidelity designs (a "globalish" search), with increasing fidelity, using the lower-fidelity versions of the highest-fidelity population until you find designs that can supplement (with replacement) your high-fidelity population back to its nominal size. Continue to the next higher fidelity level if it exists (or stay at the highest level) until convergence is reached.
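As one rough reading of that refill idea (all names hypothetical, and assuming lower fitness is better, i.e. minimization), the interface logic might look something like:

```python
def refill_high_fidelity(population, nominal_size, evaluate_at, num_levels):
    """population: list of (design, fitness, level) tuples, lower fitness = better.
    evaluate_at(design, level) -> fitness at that fidelity level (assumed helper)."""
    top_level = num_levels - 1
    high = [m for m in population if m[2] == top_level]
    while len(high) < nominal_size:
        # best member (lowest fitness) not yet at the top fidelity level
        candidates = sorted((m for m in population if m[2] < top_level),
                            key=lambda m: m[1])
        if not candidates:
            break
        design, _, level = candidates[0]
        # re-run the design at each successively higher fidelity level
        for lvl in range(level + 1, top_level + 1):
            fitness = evaluate_at(design, lvl)
        # replace the low-fidelity entry with its high-fidelity result
        population.remove(candidates[0])
        entry = (design, fitness, top_level)
        population.append(entry)
        high.append(entry)
    return population
```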
We currently run designs at different levels of fidelity too. We scale the CPUs to (roughly) maintain load balancing.
I think the way forward is to specify the initial number of particles (N) and the statistical significance desired at each level. The code then automatically adjusts N from generation to generation to meet that level and scales N to achieve higher fidelity.
The trick is figuring out when to scale to a higher fidelity level. I think using a fraction of the initial best member's fitness makes some sense, but what fraction? The easiest thing might be to keep everything at L1 for G generations, then run the top D designs at the top level, the next-best designs at the next-highest level, etc. There is some inefficiency in the intermediate calculations, but it is the most general approach I can come up with, since the fitness is relative to the objective function specified and the problem being solved, making it difficult to know a priori what a good fitness is.
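A rough sketch of that generation-based ladder (G, D, and the function name are placeholders; level 0 corresponds to L1):

```python
def assign_levels(ranked_designs, generation, G, D, num_levels):
    """ranked_designs: designs sorted best-first.
    Returns {design index: fidelity level}, where level 0 is L1 (lowest)."""
    if generation < G:
        # stay at L1 for the first G generations
        return {i: 0 for i in range(len(ranked_designs))}
    levels = {}
    for i in range(len(ranked_designs)):
        # top D designs get the highest level, the next D the level below, etc.
        levels[i] = max(0, num_levels - 1 - i // D)
    return levels
```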
Good news about the different fidelity levels!
Perhaps I am mixing up fitness (objective function values), fidelity, and statistical uncertainty (accuracy). My understanding is that it's impossible to know whether increasing N will improve or worsen the fitness value(s). Increasing N can only decrease the uncertainty and increase the accuracy associated with the fitness value(s). Improving the actual fitness value is the task of the optimization approach (Gnowee).
Thus, I understood the question to be: when and how should we increase a design's fidelity (our knowledge of the true fitness value's accuracy, which depends on N)? Please correct me if I am mistaken.
There are two parts to this question:
On reflection, I think this is really an optimization issue. I suggest that Coeus (being a wrapper) should not make decisions regarding the optimization, but only handle input/output, scheduling, etc. Thus, perhaps the optimizer should choose N for the desired design simulations and return to Coeus either N or a design-specific relative resource requirement value (say, between 0 and 10). Coeus can then take the number of desired design simulations, the N or relative values, and the number of resources (nodes/processors) to determine the best resource allocation. If running asynchronously, the more intensive runs could use more nodes/processors and be chosen first. If synchronous, the more intensive runs could be allocated more nodes/processors.
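A minimal sketch of the Coeus side of that split, assuming the optimizer hands back a 0-10 relative resource requirement per design (the function name and signature are illustrative, not existing Coeus code):

```python
def allocate_nodes(relative_costs, total_nodes, min_nodes=1):
    """relative_costs: one 0-10 value per requested design run.
    Returns a node count per run, roughly proportional to its cost."""
    total_cost = sum(relative_costs) or 1.0
    nodes = [max(min_nodes, int(total_nodes * c / total_cost))
             for c in relative_costs]
    # trim the largest allocations if rounding pushed us over budget
    while sum(nodes) > total_nodes:
        i = nodes.index(max(nodes))
        if nodes[i] <= min_nodes:
            break
        nodes[i] -= 1
    return nodes
```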
Correct, increasing N gets us lower statistical uncertainty and a better understanding of the true fitness, and yes, the question is determining when (in an automated fashion) to increase N for a high fitness design to make sure the high fitness is not an artifact of poor statistics.
The two parts that you outlined seem correct to me; the biggest question is part 1. The approach you suggested is one option, but there are others, such as setting the levels based on some fraction of the best (or average) fitness from the first population, etc.
This is something that will be handled in the interface that Coeus provides. Gnowee doesn't traditionally look at this because it evaluates an objective function based on a perturbed set of parameters; it assumes perfect knowledge of those parameters and of the values derived from them. Those are quick calculations, so this leveled approach to maximizing efficiency isn't needed.
However, we could calculate a fitness uncertainty too, and use that in the algorithm; it isn't clear to me though that this would be beneficial.
Gnowee Utilities module. Currently the increases happen at set fitness points, but not all problems converge to the same fitness criteria. A couple of options:
a) Increase the run time at set fractions of the best initial fitness.
b) Have a routine that evaluates whether an increase in fidelity (run time / particle count) is needed based on the statistics of the output tallies.
Note, both a and b could be used.
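For reference, a rough sketch of what option (b) might look like, assuming the tallies feeding the fitness expose their relative errors and using the same 1/sqrt(N) scaling as above (all names are illustrative):

```python
def needs_more_particles(tally_rel_errors, threshold):
    """True if any tally used in the fitness exceeds the target relative error."""
    return any(err > threshold for err in tally_rel_errors)

def next_particle_count(n_current, worst_rel_err, threshold):
    """If the worst tally misses the target, rescale N via the
    1/sqrt(N) relation: N_new = N * (R_worst / R_target)**2."""
    if worst_rel_err <= threshold:
        return n_current
    return int(n_current * (worst_rel_err / threshold) ** 2)
```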