VisionEval / VisionEval-Dev

Development version of VisionEval framework
https://visioneval.github.io/
Apache License 2.0

Long non-linear run times on large models #174

Open jrawbits opened 2 years ago

jrawbits commented 2 years ago

Timing reported in the log files from running multiple scenarios in the H-GAC models suggests that a few of the modules have undesirable algorithmic properties: their run times appear to grow as order N-squared or worse. We should dig into how these modules do their work and see whether the complexity can be reduced. All of the "victims" do some kind of sampling and balancing toward target proportions, and we'll easily hit order N-squared if the process requires doing N samples N times. That may be inevitable, but perhaps there's a craftier sampling algorithm out there... I'll edit this issue with the names of the affected modules once I've downloaded the enormous set of results...
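As a rough illustration of why this matters (assuming, purely for the sake of argument, that each of the N draws costs time proportional to N with some constant c; this is not a measurement from the H-GAC logs):

$$T(N) = \sum_{i=1}^{N} cN = cN^2, \qquad \frac{T(2N)}{T(N)} = \frac{4cN^2}{cN^2} = 4$$

Under that assumption, doubling the population roughly quadruples the run time of the module.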

The modules I'm starting with are PredictWorkers and CalculateVehicleOwnCost. The latter's run-time performance collapses whenever pay-as-you-drive is set up in the scenario.

jrawbits commented 2 years ago

The problem appears to be the use of R's sample function in many places to generate a set of values that are randomly distributed according to a probability / proportion. The sample function without replacement (essentially classifying the items we're sampling from) is excruciatingly slow and scales very badly to big samples from big populations.

There is a much faster algorithm available, but even though it appears to extend to cases with probabilities, it is really only valid as a "non-probability" sampler. To sample with probabilities and get something like the same answer requires iterating once over (on average) half the population for each required worker (removing any selected worker, then rescaling the remaining probabilities). I'll keep researching for a while, but for now I'm just going to pursue the module fixups mentioned in the next comment.
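The draw-remove-rescale procedure described above can be sketched as follows. This is a minimal illustration in Python rather than R, and it is neither the VisionEval module code nor the internals of R's sample function. The function name, its arguments, and the `scanned` counter are all hypothetical, and the weights are assumed to be strictly positive.

```python
import random


def sample_weighted_without_replacement(weights, k, rng=None):
    """Draw k distinct indices, each draw proportional to the remaining weights.

    Hypothetical helper for illustration only. Returns the chosen indices and
    the number of candidates scanned, to make the cost of the scans visible.
    """
    rng = rng or random.Random()
    if k > len(weights):
        raise ValueError("cannot draw more items than the population holds")
    if any(w <= 0 for w in weights):
        raise ValueError("this sketch assumes strictly positive weights")

    remaining = list(range(len(weights)))  # indices still eligible
    total = float(sum(weights))            # weight of the remaining pool
    chosen = []
    scanned = 0

    for _ in range(k):
        # Pick a point in [0, total) and walk the pool until we pass it.
        target = rng.random() * total
        cumulative = 0.0
        pick = len(remaining) - 1          # fallback guards against round-off
        for pos, idx in enumerate(remaining):
            scanned += 1
            cumulative += weights[idx]
            if target < cumulative:
                pick = pos
                break
        idx = remaining.pop(pick)          # remove the selected item
        chosen.append(idx)
        total -= weights[idx]              # implicit rescaling of what is left

    return chosen, scanned


if __name__ == "__main__":
    picks, work = sample_weighted_without_replacement(
        [1.0] * 1000, 500, random.Random(1)
    )
    print(len(picks), "items drawn,", work, "candidates scanned")
```

Each draw walks part of the remaining pool, so the `scanned` count grows roughly with the product of the number of draws and the population size, which is the scaling behaviour the comment describes.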

jrawbits commented 2 years ago

Since I have the modules open, I'm making some other cleanups to the estimation and documentation builds (and getting rid of some R CMD check warnings along the way), and I'll put that up as a pull request soon.