PastelBelem8 / ADOPT.jl

This is the result of my master's thesis on Multi-Objective Optimization. This repository focuses on Pareto-based optimization rather than Single-Objective optimization with preference articulation. We target time-consuming optimization routines and, as a result, rely on model-based methods to achieve faster convergence. This is relevant for Architectural Design Optimization, which depends on time-intensive simulations (e.g., minutes, hours, or even days to complete a single simulation).
GNU General Public License v3.0

Define the processing information necessary throughout a typical optimization process #5

Open PastelBelem8 opened 6 years ago

PastelBelem8 commented 6 years ago

When performing optimization, we collect different measurements. A careful review of the involved metrics and of the information that must be gathered should be conducted.

This should be documented properly.

For example, important information to collect includes:

  • the time each evaluation took, which allows monitoring for problems in the simulator/model or even providing feedback that the process is still running;
  • the set of parameter values that generated a specific design. ...
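The two bullets above can be collected together in a single per-evaluation record. A minimal sketch (in Python, for illustration only; the class and field names are hypothetical and not part of ADOPT.jl's API):

```python
import time


class EvaluationLog:
    """Collects one record per evaluation: the parameter values that
    generated the design, the resulting objective values, and the
    wall-clock duration of the evaluation."""

    def __init__(self):
        self.records = []

    def evaluate(self, objective, params):
        # Time the (possibly very slow) simulation/model call.
        start = time.perf_counter()
        values = objective(params)
        elapsed = time.perf_counter() - start
        self.records.append({"params": params, "values": values, "time": elapsed})
        return values

    def mean_time(self):
        """Average evaluation time so far; useful for progress feedback."""
        if not self.records:
            return None
        return sum(r["time"] for r in self.records) / len(self.records)


# Usage with a toy bi-objective function standing in for a real simulator:
log = EvaluationLog()
log.evaluate(lambda p: (p[0] ** 2, (p[0] - 2) ** 2), [1.0])
```

Because every record keeps the generating parameters alongside the objective values, any design can later be replicated for verification.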
PastelBelem8 commented 6 years ago

When structuring an optimization workflow, there are some indicators that should be collected to guarantee that the optimization process is evolving appropriately. The collection of such indicators is even more relevant in an interactive and dynamic optimization environment. This chapter focuses on the analysis of the data that should be collected throughout the optimization.

When solving optimization problems we collect both qualitative and quantitative metrics. Evaluating the performance of optimization algorithms only makes sense when benchmarking several algorithms against each other, since we need a term of comparison. Therefore, these quality indicators should only be computed after the algorithms have produced their results, not during the optimization. Moreover, since some of these indicators require a reference set, ideally we would supply the true Pareto Front; however, this is usually hard to obtain, so one idea is to combine the results of all the benchmarked algorithms and use them to build a better approximation of the true Pareto Front. This is particularly useful in the context of architectural problems, where it is usually impossible to know the shape of the true Pareto Front.
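Building that reference set amounts to merging every algorithm's results and keeping only the non-dominated points. A minimal sketch (Python for illustration; function names are hypothetical, assuming minimization on all objectives):

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization):
    a is no worse on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def combined_front(*result_sets):
    """Merge the objective vectors produced by several benchmarked
    algorithms and keep only the non-dominated ones, yielding a better
    approximation of the true Pareto Front to use as a reference set."""
    merged = [p for rs in result_sets for p in rs]
    return [p for p in merged if not any(dominates(q, p) for q in merged)]


# Results from two hypothetical algorithms on a bi-objective problem:
alg_a = [(1.0, 4.0), (2.0, 3.0), (5.0, 5.0)]
alg_b = [(1.5, 3.5), (3.0, 1.0)]
front = combined_front(alg_a, alg_b)
# (5.0, 5.0) is dominated by (2.0, 3.0) and is filtered out;
# the remaining four points are mutually non-dominated.
```

This naive filter is quadratic in the number of points, which is acceptable here because it runs once, after all benchmarked runs have finished.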

| Factor | Pros | Cons |
| --- | --- | --- |
| Evaluation time | Allows estimating the expected time per analysis and informs the user about the time needed to complete an analysis. | Requires maintaining a history of the last evaluations, which implies more memory allocation and more time (although the time complexity is, in general, insignificant compared to the analysis itself). |
| | If an analysis takes too long, or gradually becomes slower, it helps identify bugs or, at the limit, reveals something about the complexity of the model (e.g., daylight analyses take longer in sparser environments with more holes). | How many records should be kept? Where should this information appear? |
| | | Must define which evaluations it considers: should model-based evaluations count as well? |
| Evaluation number | In terms of the algorithms, it is usually the best metric to account for. | Usually not linear in time, and therefore not an appropriate measure of elapsed time. |
| | Gives a reasonably good sense of the convergence of different algorithms. | Must define which evaluations it considers: should model-based evaluations count as well? |
| | The user knows how many evaluations are left for the algorithm to run. | Usually only relevant in the context of tests/benchmarks. |
| Best solution(s) | Gives the user real feedback on the best solution found so far. | How many solutions should be presented? |
| | Providing a group of the best solutions would be better (as long as they are sufficiently different). | Which clustering algorithm should be used? (Only applicable to the population, unless we keep track of all solutions found so far, which implies more memory and more computation time.) |
| | Computation time becomes irrelevant in the context of architectural design. | Should the user be able to steer the evolution of the optimization process by choosing their own variations? |
| | Very good feedback when the user can visualize the variations by interactively selecting the best solutions. | |
| Decision variable values | Important to allow replication of the results (in case verification is needed). | Occupies space; it can be stored on disk (assuming there is enough space), which implies read/write overheads when accessing these values. |
| Constraint violations / feasibility | Important to verify whether any solutions violate constraints and, possibly, by how much. Users might learn rules or detect errors by visualizing the solutions that violate constraints. | Occupies space; it can be stored on disk (assuming there is enough space), which implies read/write overheads when accessing these values. |
| Objective values | Important to maintain the mapping between the decision variables' values and the associated objective values. | Occupies space; it can be stored on disk (assuming there is enough space), which implies read/write overheads when accessing these values. |
| Average quality of the population | Depending on the algorithm, it is possible to use metrics to measure performance. | Which metric should be used? Should we maintain a history (apart from the archive) and an approximation that gets incrementally closer to the true Pareto Front? |
| | Allows finer-grained feedback on the performance of the optimization run. | If we plan to integrate with other libraries, it might be hard to accomplish. |
| | | In highly randomized problems it usually does not provide great feedback in the early stages. |
| Average quality of the surrogate model | TODO: read Chapter 3 (Surrogate Models) on "SMF and Improvement method"; there are two ways to choose the candidate solutions (those that maximize the improvement, or those that are the best in the current model). | Decide which metric to use and how to measure it. |
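The table's first concern, keeping an evaluation-time history without unbounded memory growth, can be addressed with a fixed-size buffer of recent durations. A sketch under that assumption (Python for illustration; the class is hypothetical, not part of ADOPT.jl):

```python
from collections import deque


class TimeEstimator:
    """Keeps only the last `maxlen` evaluation durations (bounding the
    memory cost the table's Cons column warns about) and extrapolates
    the time remaining from the recent average."""

    def __init__(self, maxlen=50):
        # deque with maxlen silently discards the oldest entry when full.
        self.times = deque(maxlen=maxlen)

    def record(self, seconds):
        self.times.append(seconds)

    def remaining(self, evaluations_left):
        """Estimated seconds until completion, or None before any data."""
        if not self.times:
            return None
        avg = sum(self.times) / len(self.times)
        return avg * evaluations_left


# With maxlen=3, recording 2.0, 4.0, 6.0, 8.0 keeps only (4.0, 6.0, 8.0),
# so the recent average is 6.0 s and 10 remaining evaluations ≈ 60.0 s.
est = TimeEstimator(maxlen=3)
for t in (2.0, 4.0, 6.0, 8.0):
    est.record(t)
```

Using only the most recent durations also makes the estimate track gradual slowdowns of the model, which is exactly the drift the table suggests monitoring.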
PastelBelem8 commented 5 years ago

Other important information to have is: