EwoutH / urban-self-driving-effects

Investigating the effects of self-driving cars on cities
GNU General Public License v3.0

Model performance #3

Open EwoutH opened 3 weeks ago

EwoutH commented 3 weeks ago

Just to have this documented somewhere.

Currently we're testing a simulation of a city of 600,000 inhabitants from 6:00 to 23:00, in which everyone takes on average ~3.73 trips. Of those ~2.24 million trips, about half are made by car, though this depends heavily on the simulation input parameters. The car trips are routed over a road network with 1545 nodes and 3272 edges. Note that road networks are relatively sparse compared to many other networks, with an edge:node ratio of "only" ~2 in this case.
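As a quick back-of-envelope check of those numbers (pure arithmetic, no model code):

```python
inhabitants = 600_000
trips_per_person = 3.73        # average over the 6:00-23:00 window
car_share = 0.5                # roughly half, depending on input parameters

total_trips = inhabitants * trips_per_person   # ~2.24 million trips
car_trips = total_trips * car_share            # ~1.12 million car trips

nodes, edges = 1545, 3272
print(f"edge:node ratio: {edges / nodes:.2f}") # ~2.12, i.e. a sparse network
```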

As for model run time:

Locally, on an Alder Lake laptop, this is roughly 2× faster than on a GitHub CI runner.

All optimizations so far are pure Python: algorithmic and data-structure improvements. Much of the runtime is now in UXsim itself, which we already sped up as far as currently feasible (by over 40×):

Options like Cython, Numba, and even the PEP 744 JIT compiler in Python 3.13 all remain available for potential further speedups. But since most of the hot path is NumPy anyway, I think this is about it. And since we will be running many model runs anyway, parallelizing a single run isn't really beneficial.
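To illustrate what run-level parallelism looks like instead, a minimal sketch; `run_model` is a hypothetical wrapper around one full simulation, not actual project code:

```python
from concurrent.futures import ProcessPoolExecutor

def run_model(seed: int) -> dict:
    """Hypothetical wrapper: set up and run one simulation, return its KPIs."""
    # ... build the model with this seed and run it ...
    return {"seed": seed}

if __name__ == "__main__":
    # Each replication is an independent process, so a 48-core node can
    # execute 48 single-threaded runs concurrently.
    with ProcessPoolExecutor(max_workers=48) as pool:
        results = list(pool.map(run_model, range(48)))
```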

@quaquel how many CPU hours on DelftBlue do you think are feasible? 24 hours on a single 48-core node is already over 1000 CPU-hours, so that would mean ~6000 model runs per node-day.
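Spelling out the implied per-run budget (assuming one core per run):

```python
cpu_hours_per_node_day = 48 * 24            # 1152 CPU-hours, "over 1000"
runs_per_node_day = 6000
cpu_minutes_per_run = cpu_hours_per_node_day / runs_per_node_day * 60
print(cpu_minutes_per_run)                  # ~11.5 CPU-minutes per model run
```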

So far for a little bit of documentation on runtime performance. I will now go back to reviewing the research questions and seeing what still needs to be done to answer them properly model-wise (aside from, obviously, the data collection). Based on that we can discuss the experimental setup on Tuesday.

quaquel commented 3 weeks ago

> @quaquel how many CPU hours on DelftBlue do you think are feasible? 24 hours on a single 48-core node is already over 1000 CPU-hours, so that would mean ~6000 model runs per node-day.

Are you using a research account or a student account? This will limit the number of hours you can run. Moreover, there are a few key questions that need to be answered:

  1. How stochastic is the model?
  2. What are the key uncertainties / policy levers you want to explore?

In other applications, we have been using multiple jobs in parallel so we can run across multiple nodes.
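A minimal sketch of what that could look like, assuming DelftBlue's SLURM-style scheduler and the hypothetical `run_model` wrapper from above; each array task processes its own slice of the runs:

```python
import os

def run_model(seed: int) -> dict:
    """Hypothetical wrapper around one full simulation run."""
    return {"seed": seed}

RUNS_TOTAL = 6000
TASKS = 10                      # e.g. submitted with: sbatch --array=0-9 ...

task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
chunk = RUNS_TOTAL // TASKS
for seed in range(task_id * chunk, (task_id + 1) * chunk):
    result = run_model(seed)
    # ... append the result to a per-task output file ...
```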

EwoutH commented 3 weeks ago

Thanks for getting back to me; good questions!

Not running on DelftBlue yet, but I still have my research account.

> How stochastic is the model?

Currently it's quite stochastic on the micro/agent level, but since so many trips are taken, the high-level output values are quite stable (law of large numbers).

A fully deterministic mode is available in UXsim, and I think that should also be largely possible in Mesa.
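A minimal sketch of how to quantify that stability across seeded replications (again with a hypothetical `run_model`; the KPI and its distribution are placeholders):

```python
import random
import statistics

def run_model(seed: int) -> dict:
    """Hypothetical wrapper: one full simulation with a fixed seed."""
    rng = random.Random(seed)
    return {"mean_travel_time": 20 + rng.gauss(0, 0.5)}  # placeholder KPI

kpis = [run_model(s)["mean_travel_time"] for s in range(20)]
m, sd = statistics.mean(kpis), statistics.stdev(kpis)
print(f"mean {m:.2f}, sd {sd:.2f}, cv {sd / m:.1%}")  # small cv -> stable
```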

> What are the key uncertainties / policy levers you want to explore?

That's what I'm going to take a critical look at in the next few days, also to see what's feasible implementation-wise.

How many free variables can I have, approximately, when running ~10k simulations? That also depends on the width of the ranges, right?

quaquel commented 3 weeks ago

It depends on the volume of the space (so both the number of uncertainties/levers and their respective ranges), but also on the response surface. In my experience it's best to start with several small Monte Carlo runs. They can later be merged, but they also allow you to do some statistical tests to see how different their results are.
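A minimal sketch of that approach, assuming SciPy is available; the uncertainty names, ranges, and placeholder model are illustrative only:

```python
import numpy as np
from scipy.stats import qmc, ks_2samp

bounds = {"car_ownership": (0.3, 0.7), "value_of_time": (5.0, 15.0)}
lower = [lo for lo, _ in bounds.values()]
upper = [hi for _, hi in bounds.values()]

def run_batch(seed: int, n: int = 50) -> np.ndarray:
    """One small Monte Carlo batch over the uncertainty space."""
    sampler = qmc.LatinHypercube(d=len(bounds), seed=seed)
    X = qmc.scale(sampler.random(n), lower, upper)
    return X[:, 0] * 10 + X[:, 1]   # placeholder for the real simulation

a, b = run_batch(seed=1), run_batch(seed=2)
stat, p = ks_2samp(a, b)            # do the two batches differ significantly?
print(f"KS statistic {stat:.3f}, p-value {p:.3f}")
```

If the batches are statistically indistinguishable, merging them is safe; if not, more samples are probably needed.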

EwoutH commented 1 week ago

Data aggregation can also be slow, apparently.
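One common culprit, as a generic illustration (not the model's actual data-collection code): accumulating per-trip results in a Python-level loop instead of one vectorized groupby.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
trips = pd.DataFrame({
    "mode": rng.choice(["car", "bike", "transit"], size=1_000_000),
    "travel_time": rng.exponential(20.0, size=1_000_000),
})

# Slow: row-by-row accumulation in Python.
# totals = {}
# for _, row in trips.iterrows():
#     totals[row["mode"]] = totals.get(row["mode"], 0.0) + row["travel_time"]

# Fast: one vectorized aggregation.
totals = trips.groupby("mode")["travel_time"].sum()
```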