Sparsity metric for performance dataset

bentsherman / tesseract

A tool for creating resource prediction models for scientific workflows

MIT License


Since training data must be acquired by running the target application many times, it will be important to minimize the number of training samples required to achieve good accuracy. I think a good way to measure this is the number of samples, or the "sparsity" of the training set, which is the number of samples normalized by the size of the search space. For some applications this metric will be harder to define, because it depends on what you consider to be "sensible values" for each command-line parameter. It may be best to use a log scale for this metric, since the size of the search space is the product of the number of values for each parameter, so it grows exponentially with the number of parameters and polynomially with the range of each parameter.
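A minimal sketch of how this metric could be computed, assuming each command-line parameter is given as a finite list of "sensible values". The function names and the example parameters (`threads`, `block_size`, `algorithm`) are hypothetical and not part of tesseract. The log is taken as a difference of sums so that very large search spaces do not overflow.

```python
import math

def search_space_size(param_values):
    """Number of distinct configurations: the product of the
    number of sensible values for each parameter."""
    return math.prod(len(values) for values in param_values.values())

def log10_sparsity(n_samples, param_values):
    """log10(n_samples / search_space_size), computed from logs
    so that large search spaces do not overflow."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if any(len(values) == 0 for values in param_values.values()):
        raise ValueError("every parameter needs at least one value")
    log_space = sum(math.log10(len(values)) for values in param_values.values())
    return math.log10(n_samples) - log_space

# Hypothetical parameters, chosen only for illustration.
params = {
    "threads": [1, 2, 4, 8],
    "block_size": [64, 128, 256],
    "algorithm": ["a", "b"],
}

space = search_space_size(params)      # 4 * 3 * 2 configurations
log_s = log10_sparsity(6, params)      # 6 samples out of that space
print(space, round(log_s, 3))
```

With this definition a value of 0 means every configuration was sampled once, and more negative values mean a sparser training set.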

Sparsity metric for performance dataset #11