dabreegster / odjitter

Disaggregate zone-based origin/destination data to specific points
Apache License 2.0

`odjitter` crashing with big OD data and `--max-per-od 1` #5

Closed by lucasccdias 2 years ago

lucasccdias commented 2 years ago

Hi, @dabreegster.

I am trying to use `odjitter` on a subset of the São Paulo OD data, and it crashes when I set `--max-per-od 1`. It works fine with `--max-per-od 100` and `--max-per-od 10`. My PC freezes during the run, so it is probably a RAM usage problem: I have a 6th-gen Intel Core i5 with 8GB of RAM running Ubuntu 20.04.3 LTS.

Here is a reproducible example (using R):

piggyback::pb_download(file = "zones_sp_center.geojson", 
                       repo = "spstreets/OD2017"
                       )

piggyback::pb_download(file = "od_sp_center.csv",
                       repo = "spstreets/OD2017"
                       )

system("odjitter --od-csv-path ./od_sp_center.csv --zones-path ./zones_sp_center.geojson --max-per-od 1 --output-path result.geojson")

# Scraped 114 zones from ./zones_sp_center.geojson
# Disaggregating OD data
# Killed
dabreegster commented 2 years ago

Thanks for the bug report! I can confirm the problem -- I managed to run it, getting a 2GB output file, but it took 30GB of RAM, which was right at my laptop's limit. :) The problem is that the tool buffers the entire GeoJSON representation in memory and writes it all at once. There's no reason to do this; I'll work on a fix.
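The shape of the fix, roughly: instead of collecting every feature into one FeatureCollection and serializing it at the end, write the collection wrapper once and stream each feature to the output as it's generated, so peak memory stays around one feature. An illustrative sketch (not the actual odjitter code; the helper name and the pre-serialized-feature iterator are just for the example):

```rust
use std::fs::File;
use std::io::{BufWriter, Write};

/// Illustrative helper: stream pre-serialized GeoJSON features to disk
/// instead of buffering the whole FeatureCollection in memory.
fn write_features_streaming(
    path: &str,
    features: impl Iterator<Item = String>,
) -> std::io::Result<()> {
    let mut out = BufWriter::new(File::create(path)?);
    // Emit the FeatureCollection header once...
    writeln!(out, "{{\"type\":\"FeatureCollection\",\"features\":[")?;
    let mut first = true;
    for feature in features {
        // ...then each feature as it is produced, comma-separated.
        if !first {
            writeln!(out, ",")?;
        }
        first = false;
        out.write_all(feature.as_bytes())?;
    }
    // Close the array and the collection object.
    writeln!(out, "\n]}}")?;
    Ok(())
}

fn main() -> std::io::Result<()> {
    // Two toy point features standing in for the jittered OD pairs.
    let feats = vec![
        r#"{"type":"Feature","geometry":{"type":"Point","coordinates":[0.0,0.0]},"properties":{}}"#.to_string(),
        r#"{"type":"Feature","geometry":{"type":"Point","coordinates":[1.0,1.0]},"properties":{}}"#.to_string(),
    ];
    write_features_streaming("result.geojson", feats.into_iter())
}
```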

dabreegster commented 2 years ago

If you rebuild it from the latest git revision, it should consume very little memory. It took me about a minute to convert that whole file.
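A hedged note on the rebuild step, since the thread doesn't spell it out: for a cargo-installed binary, something like `cargo install --git https://github.com/dabreegster/odjitter --force` should fetch and build the latest revision (the exact invocation is an assumption here, though `cargo install --git` and `--force` are standard cargo flags).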

As someone on the GeoRust Discord pointed out, we should consider https://flatgeobuf.org/ for working with larger datasets like this!

lucasccdias commented 2 years ago

I just did it and it worked flawlessly; it took about a minute here, too. Thanks!