Closed mfitz closed 8 months ago
What might be interesting is to move the linkLog and network to an SQLite db. Currently I see only a few queries in which we are joining large contents. Access to those SQLite databases is interfaced by jOOQ, which allows query logic quite similar to Tablesaw's. On demand, the SQLite database could also be located in memory, if users want a high-performance version.
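To make the in-memory versus on-disk choice concrete, here is a minimal sketch of how it could be reduced to a single switch. `SqliteLocation` and `jdbcUrl` are hypothetical names, and the URL forms assume the xerial sqlite-jdbc driver; nothing here is taken from Gelato itself.

```java
import java.nio.file.Path;

// Hypothetical helper (not part of Gelato): decides where the SQLite database lives.
final class SqliteLocation {
    private SqliteLocation() {}

    // URL forms follow the xerial sqlite-jdbc convention, which is an assumption
    // about the driver in use.
    static String jdbcUrl(boolean inMemory, Path dbFile) {
        if (inMemory) {
            // Everything stays in RAM: fastest, but gone when the connection closes.
            return "jdbc:sqlite::memory:";
        }
        // File-backed: lower heap usage, at the cost of disk I/O.
        return "jdbc:sqlite:" + dbFile.toAbsolutePath();
    }
}
```

The resulting URL would be handed to `DriverManager.getConnection(...)`, and the connection to jOOQ via `DSL.using(connection, SQLDialect.SQLITE)`, so the rest of the query code would not need to know which mode was chosen.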
This is definitely a thing to bear in mind, @steffenaxer, and it's why I would like to have "generic" interfaces for things like the link log and network, so we can add future implementations of those interfaces (e.g. a DB-backed link log) and switch between implementations with minimal changes.
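As a rough sketch of what such a "generic" interface could look like: the method names, `CountingLinkLog`, and the `LinkLogs` factory below are all hypothetical and are not Gelato's actual API. `CountingLinkLog` only stands in for a lower-memory backend such as a DB-backed link log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: method names are illustrative, not Gelato's real API.
interface LinkLog {
    void recordEntry(String linkId, String vehicleId, double time);
    long entryCount(String linkId);
}

// Keeps every entry on the heap: full detail, but memory grows with each event.
final class InMemoryLinkLog implements LinkLog {
    private final List<String> linkIds = new ArrayList<>();
    private final List<String> vehicleIds = new ArrayList<>();
    private final List<Double> times = new ArrayList<>();

    @Override
    public void recordEntry(String linkId, String vehicleId, double time) {
        linkIds.add(linkId);
        vehicleIds.add(vehicleId);
        times.add(time);
    }

    @Override
    public long entryCount(String linkId) {
        return linkIds.stream().filter(linkId::equals).count();
    }
}

// Stand-in for a lower-memory backend (e.g. DB-backed): keeps only per-link counters.
final class CountingLinkLog implements LinkLog {
    private final Map<String, Long> counts = new HashMap<>();

    @Override
    public void recordEntry(String linkId, String vehicleId, double time) {
        counts.merge(linkId, 1L, Long::sum);
    }

    @Override
    public long entryCount(String linkId) {
        return counts.getOrDefault(linkId, 0L);
    }
}

// Callers depend only on LinkLog, so the implementation is chosen in one place.
final class LinkLogs {
    private LinkLogs() {}

    static LinkLog create(boolean lowMemory) {
        return lowMemory ? new CountingLinkLog() : new InMemoryLinkLog();
    }
}
```

With this shape, switching implementations means changing the argument to `LinkLogs.create(...)`; code that records or queries entries stays untouched.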
I've used JOOQ before, but in a somewhat unusual way in that I was using the API without it being backed by a database. A database that can hold a good working set in memory and keep everything else on disk is an obvious way to reduce memory usage, albeit at the expense of performance. Allowing the user to tune where they want to be on the memory/performance trade-off would be a nice thing, for sure.
For now, I think we should keep it in mind, but hold off on any implementation until we've seen Gelato in the hands of a wider set of users who can guide our list of priorities.
Gelato is a very memory-intensive application. This is unavoidable to some extent, given that we are streaming (sometimes) hundreds of millions of events into in-memory data tables and holding an in-memory representation of the MATSim network and other objects. We accepted this memory intensity as a trade-off in exchange for a simple data frames programming model.
Even with this trade-off in mind, there are still potential opportunities to reduce the amount of memory used. We should look for quick-win memory optimisations to reduce Gelato's memory usage to something closer to the bare minimum without changing the basic data frames and in-memory analytics approach.
For example:

- Avoid loading the `output_plans.xml.gz` or `output_experienced_plans.xml.gz` files when creating the MATSim scenario object (we don't use that data at all)
- Build the `Table` version of the link log directly from MATSim event handling (but hide it behind a generic `LinkLog` interface)

As a beneficial side-effect, the running time would also reduce if we could achieve some of these optimisations.
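The idea of building the table straight from event handling can be sketched as a column-oriented structure that each event callback appends to, with no intermediate per-event objects. `ColumnarLinkLog` and `onLinkEnter` are hypothetical names; the wiring to MATSim's event handlers and to Tablesaw is deliberately left out, and plain arrays stand in for table columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: rows are appended straight into primitive columns.
final class ColumnarLinkLog {
    // Dictionary encoding: each distinct link ID string is stored once.
    private final Map<String, Integer> linkCodes = new HashMap<>();
    private final List<String> linkIds = new ArrayList<>();

    private int[] linkCodeColumn = new int[16];
    private double[] timeColumn = new double[16];
    private int rows = 0;

    // Would be called from a link-enter event handler (wiring not shown).
    void onLinkEnter(String linkId, double time) {
        Integer code = linkCodes.get(linkId);
        if (code == null) {
            code = linkIds.size();
            linkCodes.put(linkId, code);
            linkIds.add(linkId);
        }
        if (rows == linkCodeColumn.length) {
            // Grow both columns together so they always have the same capacity.
            linkCodeColumn = Arrays.copyOf(linkCodeColumn, rows * 2);
            timeColumn = Arrays.copyOf(timeColumn, rows * 2);
        }
        linkCodeColumn[rows] = code;
        timeColumn[rows] = time;
        rows++;
    }

    int rowCount() {
        return rows;
    }

    int distinctLinks() {
        return linkIds.size();
    }

    String linkIdAt(int row) {
        return linkIds.get(linkCodeColumn[Objects.checkIndex(row, rows)]);
    }

    double enterTimeAt(int row) {
        return timeColumn[Objects.checkIndex(row, rows)];
    }
}
```

Each row costs one `int` and one `double` rather than a boxed object per field, which is the kind of saving the optimisation is after. Whether the real columns should be Tablesaw columns or something else is a decision that can stay hidden behind the interface.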