Right now, the request result files are way too big. Moreover, it is not sustainable anymore to leave uncompressed result files in the results folder. They should be written compressed, and then read back compressed by visualize.py.
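A minimal sketch of the intended round trip, assuming the aggregates fit into a small dict; the file layout and the `run_id` / `aggregates` names are placeholders, not the actual code:

```python
import gzip
import json
import os

run_id = "0608_benchmark"                       # hypothetical run name
aggregates = {"avg_wait": 3.2, "matched": 1500}  # hypothetical per-run aggregates

os.makedirs("results", exist_ok=True)

# write the aggregates compressed instead of as a plain text file
with gzip.open("results/" + run_id + ".json.gz", "wt") as f:
    json.dump(aggregates, f)

# visualize.py can read the same file back transparently
with gzip.open("results/" + run_id + ".json.gz", "rt") as f:
    aggregates = json.load(f)
```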
Result files are smaller now; I only output some aggregates on the requests.
I'm now indexing the grid everywhere in 1D, and the translation from 1D to 2D and vice versa is stored in dictionaries instead of being computed on the fly.
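A sketch of what such translation tables could look like; the grid size `n, m` and the row-major layout are assumptions made for illustration:

```python
# build the 1D <-> 2D translation tables once, instead of doing
# row/column arithmetic on the fly at every lookup
n, m = 40, 40  # hypothetical grid size

coords_to_index = {(x, y): x * m + y for x in range(n) for y in range(m)}
index_to_coords = {i: c for c, i in coords_to_index.items()}

# usage: the rest of the model only ever sees the 1D index
i = coords_to_index[(12, 7)]
x, y = index_to_coords[i]
```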
The BFS trees that search from a given position for the nearest locations are also stored in a dictionary, since looking up individual elements of a numpy.array is incredibly slow.
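One possible shape of that cache, sketched under assumptions (row-major 1D layout, 4-neighbourhood, hypothetical grid size): one BFS ordering per cell, precomputed once and kept in a plain dict keyed by the 1D index.

```python
from collections import deque

def neighbours(i, n, m):
    """4-neighbourhood of a cell given by its 1D index (assumed row-major layout)."""
    x, y = divmod(i, m)
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < n and 0 <= ny < m:
            yield nx * m + ny

def bfs_order(start, n, m):
    """All cells reachable from `start`, in increasing BFS distance."""
    seen = {start}
    order = [start]
    q = deque([start])
    while q:
        i = q.popleft()
        for j in neighbours(i, n, m):
            if j not in seen:
                seen.add(j)
                order.append(j)
                q.append(j)
    return order

n, m = 40, 40  # hypothetical grid size
# cache: one BFS ordering per cell, looked up from a dict at match time
# instead of being recomputed or read out of a numpy array
bfs_trees = {i: bfs_order(i, n, m) for i in range(n * m)}
```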
Replaced the multivariate random generator with two 1D ones; correlated Gaussians do not interest us for now, and multivariate sampling is much slower than a simple normal distribution.
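Roughly the change in question, sketched with hypothetical hotspot parameters:

```python
import numpy as np

mean_x, mean_y = 20.0, 20.0  # hypothetical request hotspot
sigma = 5.0

# before: a full 2D multivariate draw, even though the covariance is diagonal
# x, y = np.random.multivariate_normal([mean_x, mean_y], sigma**2 * np.eye(2))

# after: two independent 1D draws, noticeably faster
x = np.random.normal(mean_x, sigma)
y = np.random.normal(mean_y, sigma)
```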
New structure for keeping track of requests: one deque storing every single request in time order, and another one with a fixed length storing the sets of requests that were generated together in one timestep. When a new batch of requests is generated, it 'pushes out' the oldest set from this fixed-length deque. Before losing those oldest requests, we also delete them from the left end of the time-ordered deque. This is how requests that have been waiting for a match for too long get dropped.
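A minimal sketch of this bookkeeping, with `max_kept` as a hypothetical retention length and requests represented by bare ids:

```python
from collections import deque

max_kept = 3                       # hypothetical: batches survive this many timesteps
all_requests = deque()             # every pending request, in time order
batches = deque(maxlen=max_kept)   # one entry per timestep: the set generated then

def add_batch(new_requests):
    """Register the requests generated in one timestep and drop expired ones."""
    if len(batches) == batches.maxlen:
        # the oldest batch is about to be pushed out: drop its requests
        # from the left end of the time-ordered deque first
        expiring = batches[0]
        while all_requests and all_requests[0] in expiring:
            all_requests.popleft()
    batches.append(set(new_requests))
    all_requests.extend(new_requests)
```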
It is still quite slow for bigger (e.g. more realistic) sizes. Ideas for speeding it up:
Done:
- `pip install git+https://github.com/NicoDeGiacomo/randomdict`, this contains the fix. On the benchmark config, it only sped up the simulation by 1 s/batch... (see the sketch right after this list)
- `python -m cProfile run.py 0608_benchmark`, and it turned out that generating requests eats up almost half of the time! There should be a better way to do it.
- The `poorest` algorithm is not much slower than the `random` one.
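For context, a hedged sketch of why a RandomDict-style container helps here, assuming the fork keeps the upstream `randomdict` API (`RandomDict` with `random_key()`); the taxi container and its contents are hypothetical:

```python
from randomdict import RandomDict  # assumption: fork keeps the upstream API

taxis = RandomDict()               # hypothetical: taxi_id -> taxi state
taxis["taxi_0"] = {"pos": 17}

# plain dict: picking a uniformly random key means materialising the keys,
# i.e. O(n) per draw:
#   some_id = random.choice(list(taxis.keys()))

# RandomDict: a uniformly random key in O(1), which is what the fix enables
some_id = taxis.random_key()
```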
Possibilities:
- Maybe Python itself is an obstacle: is there any way to compile the code? (How much would it take to rewrite it in C++? Is it worth the time?) A .pyc file is compiled from the city_model.py module; the question is whether there is a compiler that can optimize more.
- Could we run it on GPU?
- Estimate runtime based on the parameters, find an optimal schedule, and submit the jobs in that order. That is just a matter of looking at the slurm files.
- numba
- Putting taxis into a pandas DataFrame, then moving them by `.apply()` (see the sketch below).
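A sketch of that last idea, with a toy move rule, a hypothetical taxi table, and the same assumed row-major grid layout as above:

```python
import numpy as np
import pandas as pd

n, m = 40, 40  # hypothetical grid size

# hypothetical taxi table: one row per taxi, 1D grid index as position
taxis = pd.DataFrame({
    "pos": [17, 230, 511],
    "available": [True, False, True],
})

def step(pos):
    """Move one taxi to a random neighbouring cell (toy move rule)."""
    x, y = divmod(pos, m)
    dx, dy = [(1, 0), (-1, 0), (0, 1), (0, -1)][np.random.randint(4)]
    return min(max(x + dx, 0), n - 1) * m + min(max(y + dy, 0), m - 1)

# move every taxi in one pass over the column
taxis["pos"] = taxis["pos"].apply(step)
```

Note that `.apply()` still calls the Python function once per row, so the gain over an explicit loop may be modest unless the move rule itself can be vectorised.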