Ecotrust / bioregion-discovery

Ecotrust's Bioregion Discovery Portal
http://bioregions.apps.ecotrust.org
4 stars 1 forks source link

Optimize convergence algorithm #28

Closed perrygeo closed 12 years ago

perrygeo commented 12 years ago

In order to converge on a cost threshold that will yield the desired bioregion size, we have to iterate. The number of iterations determines the speed of the analysis.

We can a) figure out a more accurate initial seed estimate for cost threshold - more accurate initial estimate means faster convergence and b) add some tricks to converge on the answer faster - right now uses a btree-type approach but there may be other ways to approach it.

Google Code Info: Issue #: 28 Author: perrygeo...@gmail.com Created On: 2011-07-14T19:49:23.000Z Closed On: 2011-07-20T19:54:35.000Z

perrygeo commented 12 years ago

guessing this is a two day job...? perhaps it's been done already now..?

Google Code Info: Author: sfletche@gmail.com Created On: 2011-07-19T22:41:20.000Z

perrygeo commented 12 years ago

There really is no precise mathmatical relationship between the cost threshold and the bioregion size - there can't be due to spatial variation. But, by running hundreds of randomly generated bioregions and doing a simple linear regression between the cost threshold and output size, we can get a pretty decent initial guess. This regression formula has been committed and definitely reduces the number of iterations required (some times it even hits the size target on the first iteration!)

In terms of convergence algorithms, the data is not linear so its hard to predict. The fastest approach seems to be to estimate a value between the lowest overestimate and the highest underestimate (weighted by how close the over/userestimate was to the desired size). In order for this to converge quickly, we need to have over and under estimates pretty quickly so I set it up to seed the over and under lists with inflated values to quickly get a bounds. In some cases this means an extra iteration but in most scenarios this means we converge on the answer far more quickly (eg 3 or 4 steps instead of 10)

Google Code Info: Author: perrygeo...@gmail.com Created On: 2011-07-20T19:54:35.000Z