Open shriv opened 6 months ago
Hi @shriv!
Long time, no speak 😄
Thanks for this. I don't think there's anything wrong with your Docker command. The two JVM params are fairly much either/or (an absolute value for max heap size or a percentage of the memory available to the container), but I doubt that is causing a problem.
I think it's much more likely that Gelato is just struggling with your particular model for some reason.
A few questions:
Which version of Gelato are you using? Presumably, you cloned the repo and built the Docker image locally? What does docker run gelato --version
give you?
What does the log output look like? Is it progressing at all, or does it seem to hang? Could you share the log?
What does your model look like?
One minor thing:
-e JVM_OPTS="-Xmx100G" \ -e JVM_OPTS="-XX:MaxRAMPercentage=80.0" \
The JVM_OPTS
env var should contain all of your JVM params in a single place. If you need to set more than one, use a space-separated list inside the same string, as in this example:
docker run \
-v /path/to/my-model/outputs/:/my-model-outputs \
-v /path/to/gelato-outputs/my-model:/gelato-out \
-e JVM_OPTS="-Xmx100G -XX:+PrintFlagsFinal" \
gelato \
-mc /my-model-outputs/output_config.xml \
-mo /my-model-outputs \
-o /gelato-out
I am not sure what happens when you set two different values for the same env var, which is what I think you're doing here. Maybe the second value overrides the first. In any case, even if you successfully set both of those JVM params, you would potentially have set two different values for max heap size (unless 80% of available RAM works out to 100GB in your environment, in which case you would be setting the same max heap size twice).
Again, though, I don't think this is likely to be causing a problem.
To be fair, the docs don't make this clear, so I created #86 to fix that.
Hi @mfitz Thanks heaps for your suggestions :-) Some notes below on my previous and current runs. I've now changed the Docker run command to just have one JVM opts and running on a different machine (r5a.2xlarge) to see if it makes a difference.
output_links.csv.gz
output_allVehicles.xml.gz
output_activities.csv.gz
<module name="montyPricing" >
<param name="tollCap" value="-50.0" />
<param name="tollLinksFile" value="/mot-non-production-simulations/applications/pt_variations_2018/inputs/tolls.xml.gz" />
</module>
In progress
docker run \
-v ${PWD}/monty-data:/my-model-outputs \
-v ${PWD}/gelato-output:/gelato-out \
-e JVM_OPTS="-Xmx60G -XX:+PrintFlagsFinal" \
gelato \
-mc /my-model-outputs/output_config.xml \
-mo /my-model-outputs \
-o /gelato-out
Had another go (as per command in previous comment) and got RAM to use up to 20G of the 64GB available and ~15% CPU but no outputs even after running for 8hours. I had the following output during the processing re. your question on number of events being processed. Let me know if I just need to kick it off and run it overnight? If I do that, I assume I can just run Docker as a detached process with docker -d ...
?
2024-05-12T21:24:41,594 INFO MatsimKpiGenerator:119 Recorded 104,249,952 relevant MATSim events from /my-model-outputs/output_events.xml.gz
2024-05-12T21:24:41,637 INFO MatsimKpiGenerator:121 {
"vehicle enters traffic" : 1139673,
"PersonLeavesVehicle" : 1229877,
"entered link" : 49755426,
"PersonEntersVehicle" : 1229894,
"vehicle leaves traffic" : 1139656,
"left link" : 49755426
}
2024-05-12T21:24:41,638 INFO MatsimKpiGenerator:119 Recorded 598,352 relevant MATSim events from /my-model-outputs/output_events.xml.gz
2024-05-12T21:24:41,646 INFO MatsimKpiGenerator:121 {
"personMoney" : 598352
}
Let me know if I just need to kick it off and run it overnight? If I do that, I assume I can just run Docker as a detached process with
docker -d ...
?
By all means, leave it running - it will be interesting to see what happens.
What was the last thing logged to the console? That will give us a big clue about what it's doing and where it is spending the most time. We've successfully run Gelato on similar-sized models before now, but that was before we introduced the support for person money events, IIRC.
If you do leave it running, could you capture the log output in a file so we can analyse it later? A low-tech way to do that is to pipe the output of the Docker command to a file (> /path/to/file/gelato.log
)
I have what I assume is a subset of a NZ model (not sure which) that came to me via @gac55. I think it was originally from Patrick. Gelato handles that comfortably in around 3 minutes, but the events file is a lot smaller than yours:
2024-05-10T14:52:25,773 INFO MatsimKpiGenerator:119 Recorded 10,740,360 relevant MATSim events from /nz-model-outputs/output_events.xml.gz
2024-05-10T14:52:25,800 INFO MatsimKpiGenerator:121 {
"vehicle enters traffic" : 53733,
"PersonLeavesVehicle" : 54626,
"entered link" : 5261821,
"PersonEntersVehicle" : 54642,
"vehicle leaves traffic" : 53717,
"left link" : 5261821
}
2024-05-10T14:52:25,801 INFO MatsimKpiGenerator:119 Recorded 6,192 relevant MATSim events from /nz-model-outputs/output_events.xml.gz
2024-05-10T14:52:25,804 INFO MatsimKpiGenerator:121 {
"personMoney" : 6192
}
Are you able to share all of your MATSim output files? If we can get our hands on those, we can run Gelato against them and try and debug it locally.
Also, if you're able to run Gelato in the JVM directly, rather than via Docker, I'd be interested to know if the behaviour is the same. That might be worth a try, just to give us another data point.
YAY! The gelato
run finished last night around 2am. I've saved the logs and outputs to S3 for analysis. I'll be taking a look later this week or early next. I've uploaded the log if you want to take a look at it @mfitz :-)
gelato.log
Great stuff @shriv - thanks!
From a quick look at the log:
1 - the entire process was very slow 2 - it looks like building the tables for the road charges was the dominant factor
I'll do a little more analysis later, but this is very helpful 👍
Hi guys, I'm running
gelato
over a baseline MATSim simulation with Docker on an EC2 instance (r5a.4xlarge). It's been running for several hours but CPU usage is only 7% and memory usage is only around 13GB (far less than the 128GB available). I used the following Docker command and possibly gone awry somewhere! 😓