Computation time optimization

wese-da commented 8 years ago

Some processes are just too slow (e.g. reading data from databases), so computation time needs to be reduced.

wese-da commented 8 years ago

The following changes were made in branch 'computationTime': The number of threads is now a configuration parameter. If no number is given, the old (single-threaded) version is executed. At the moment, only some database post-processing (for landuse data) can run multi-threaded. There is still potential for parallelization, e.g. in demand generation; I have to implement and test it in the next few days. This could also be an opportunity to create "generic" multi-threading modules.
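To illustrate the "optional thread count with single-threaded fallback" idea, here is a minimal sketch. It is not the code from the branch: the class `LanduseBatchRunner`, the method names, and the squaring stand-in for the real post-processing are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; names are not taken from the MAS code base.
class LanduseBatchRunner {

    /** Interprets a configuration value; null or blank means "not configured". */
    static int parseThreadCount(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return 1; // fall back to the old single-threaded behaviour
        }
        return Math.max(1, Integer.parseInt(configured.trim()));
    }

    /** Squares each input as a stand-in for the real post-processing step. */
    static List<Integer> process(List<Integer> items, int numberOfThreads) {
        List<Integer> results = new ArrayList<>();
        if (numberOfThreads <= 1) {
            // Old code path: plain loop, no thread pool involved.
            for (int item : items) {
                results.add(item * item);
            }
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int item : items) {
                futures.add(pool.submit(() -> item * item));
            }
            // Collect in submission order so the output order matches the input order.
            for (Future<Integer> future : futures) {
                results.add(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return results;
    }
}
```

Because the results are collected in submission order, both code paths return the same list, which makes it easy to compare the single-threaded and the multi-threaded run.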

wese-da commented 8 years ago

Now, there is one module that handles multiple threads. The threads extend a superclass containing the basic functionality. So far, it should work for handling database-related stuff.
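Roughly, the structure looks like the sketch below. This only shows the pattern (one module that starts and joins the threads, plus a superclass with the shared logic); `ChunkWorker`, `MultithreadedModule` and `UppercaseWorker` are made-up names for illustration, not the actual classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical superclass: holds the chunk of work and the shared run loop. */
abstract class ChunkWorker<T, R> extends Thread {
    private final List<T> chunk;
    private final List<R> results = new ArrayList<>();

    ChunkWorker(List<T> chunk) {
        this.chunk = chunk;
    }

    /** Subclasses only implement what happens to a single item. */
    protected abstract R handle(T item);

    @Override
    public void run() {
        for (T item : chunk) {
            results.add(handle(item));
        }
    }

    /** Only call this after join(); join() makes the worker's writes visible. */
    List<R> results() {
        return results;
    }
}

/** Hypothetical module that starts all workers, waits for them and merges the results. */
class MultithreadedModule {
    static <T, R> List<R> runAll(List<? extends ChunkWorker<T, R>> workers) {
        for (Thread worker : workers) {
            worker.start();
        }
        List<R> merged = new ArrayList<>();
        for (ChunkWorker<T, R> worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            // Merge in worker order, independent of which thread finished first.
            merged.addAll(worker.results());
        }
        return merged;
    }
}

/** Toy subclass standing in for a database post-processing thread. */
class UppercaseWorker extends ChunkWorker<String, String> {
    UppercaseWorker(List<String> chunk) {
        super(chunk);
    }

    @Override
    protected String handle(String item) {
        return item.toUpperCase(Locale.ROOT);
    }
}
```

A new kind of task then only needs its own subclass with a `handle` implementation; starting, joining and merging stay in one place.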

Some thoughts on why I discarded parallelizing things like demand generation for now: Unlike the database read and process threads, demand generation relies on random draws in several places (e.g. picking a household from survey data, drawing a random coordinate for an activity). These draws are taken from a common "pool" of random numbers. Because of the fixed random seed, the numbers occur in a deterministic order (pseudo-randomness), which is nice if one wants to create the same scenario repeatedly (like we do). With multiple threads, this order is no longer guaranteed to be deterministic, because it depends on which thread happens to draw the next number first.

Single-threaded:
Random1 -> Household1
Random2 -> Household1, home location
Random3 -> Person1, work location
...

Multi-threaded, run 1:
Random1 -> Household1 (Thread 1)
Random2 -> Household2 (Thread 2)
Random3 -> Household1, home location (Thread 1)
...

Multi-threaded, run 2:
Random1 -> Household1 (Thread 1)
Random2 -> Household2 (Thread 2)
Random3 -> Household2, home location (Thread 2)
...
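For reference, one common way around this (not something implemented in the branch, just a sketch of a possible approach) would be to stop sharing one generator and instead give every household its own generator, seeded from the scenario seed and the household id. Everything named here (`SeededDemand`, `SCENARIO_SEED`, the seed derivation) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; not part of the MAS code base.
class SeededDemand {
    static final long SCENARIO_SEED = 4711L; // assumed value, any fixed seed works

    /** One independent generator per household, derived from seed and id. */
    static Random randomFor(long scenarioSeed, int householdId) {
        // Very simple derivation for illustration; a stronger mixing function may be preferable.
        return new Random(scenarioSeed * 31L + householdId);
    }

    /** Draws a pseudo-random (x, y) pair that depends only on seed and household id. */
    static double[] drawHomeLocation(long scenarioSeed, int householdId) {
        Random random = randomFor(scenarioSeed, householdId);
        return new double[] { random.nextDouble(), random.nextDouble() };
    }

    /** Generates locations for all households using the given number of threads. */
    static double[][] generate(int households, int threads) {
        double[][] locations = new double[households][];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int id = 0; id < households; id++) {
                final int householdId = id;
                futures.add(pool.submit(() -> {
                    // Each task writes only its own slot, so no locking is needed.
                    locations[householdId] = drawHomeLocation(SCENARIO_SEED, householdId);
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // wait for completion and surface exceptions
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return locations;
    }
}
```

With this scheme the draw for a household no longer depends on thread scheduling, so `generate(n, 1)` and `generate(n, 4)` yield the same values. The price is that the results differ from those of the current single shared generator, so existing scenarios would not be reproduced bit for bit.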

wese-da commented 8 years ago

Another major factor for computation time is the fetch size of the JDBC statement (link).

A test run for retrieving Berlin's landuse and building geometries resulted in this:
default fetch size (10): ca. 8 min
fetch size = 1000: ca. 0.5 min

I don't know whether this is already the "optimal" fetch size, but for the moment, that's quite an improvement.
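For anyone who wants to try other values: the fetch size is set per statement via the standard JDBC method `Statement.setFetchSize(int)`. The sketch below is only an illustration; the class `FetchSizeExample`, the row-counting query helper, and the row numbers in the usage note are assumptions, not the actual code or data.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical example class; not part of the MAS code base.
class FetchSizeExample {

    /** Number of fetches needed to read rowCount rows in batches of fetchSize (ceiling division). */
    static long roundTrips(long rowCount, int fetchSize) {
        return (rowCount + fetchSize - 1) / fetchSize;
    }

    /** Iterates over a query result while asking the driver to fetch rows in larger batches. */
    static long countRows(Connection connection, String sql, int fetchSize) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            // setFetchSize is only a hint; a driver may ignore it. Some drivers
            // (e.g. PostgreSQL's) only fetch in batches when autocommit is turned off.
            statement.setFetchSize(fetchSize);
            long rows = 0;
            try (ResultSet result = statement.executeQuery()) {
                while (result.next()) {
                    rows++; // real code would read the geometry columns here
                }
            }
            return rows;
        }
    }
}
```

As a rough intuition for why the value matters: for a made-up table of 100,000 rows, `roundTrips(100000, 10)` gives 10,000 fetches, while `roundTrips(100000, 1000)` gives 100. How much of the measured difference comes from round trips alone is not something I have checked.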

wese-da commented 8 years ago

Network generation can now use multi-threading, too. First test results for a network of ~20k links:
1 thread: 4.5 min
2 threads: 2.5 min

wese-da commented 8 years ago

A lot of stuff now works much faster than before. Closing this issue for now...

InnoZ / MAS

Computation time optimization #43