danburzo / every-street

Drawing every street in Romania using OpenStreetMap data, Node.js and SVG
http://danburzo.ro/every-street
MIT License

Performance #9

Open · Penagwin opened this issue 8 years ago

Penagwin commented 8 years ago

This data was gathered using the latest Florida extract from geofabrik.de (md5sum 89a6468db0a90ce7b5392695892287bf) inside an Ubuntu VM allocated 2GB of RAM and 2 cores (i5-4570; the host has 12GB of RAM and was not under load).

| Step | Time |
| --- | --- |
| Extracting Streets | 34s |
| Extracting Nodes | 49s |
| Loading Nodes | 1m 39s |
| Applying Nodes | 5m 35s |
| Finding Bounding Box | 5s |
| Mapping Coordinates | 24s |
| Generating SVG | 19s |

Using node inspector to profile the Applying Nodes step shows:

*(screenshot: profiler output for the Applying Nodes step)*

I'm not sure what the deal is with the idle time, but ignoring that, we can see that LevelDOWN accounts for 60% of the time, spent reading from the database (Database and LevelDOWN._get are separate, at 30% each). I'm going to run some experiments using Redis as an alternative to LevelDB. Yes, it will live in RAM, but I'm hoping that because it is a proper database the memory usage will be much more manageable (LevelDB only uses 300MB of disk space in this case).
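A minimal sketch of what that experiment might look like, assuming the node_redis client and the same key/value layout as the LevelDB store (all names here are illustrative, not the project's actual code):

```js
var redis = require('redis');
var client = redis.createClient(); // connects to 127.0.0.1:6379 by default

// Same lookup shape as the LevelDB path: node ID in, coordinates out.
function getNode(id, callback) {
  client.get(id, function (err, value) {
    if (err) return callback(err);
    callback(null, JSON.parse(value)); // assumes values are JSON-encoded
  });
}
```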

Using Docker would let you test a Redis server really quickly and cleanly (it doesn't touch the host).
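For example (assuming Docker is installed inside the VM), a throwaway Redis instance is a single command: `docker run -d -p 6379:6379 redis`. Deleting the container afterwards discards all of its state.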

Penagwin commented 8 years ago

My results from Redis were... interesting. Redis used all 2GB of RAM, as well as 1GB of swap, and then the VM was basically frozen. I'm not sure how LevelDB got away with using such a small amount of space.

I now have two new ideas:

  1. Put the LevelDB store on a ramdisk (something like joaquimserafim/node-ramdisk); see the sketch after this list.
  2. Merge the scripts together. Scripts #1 and #2 can be done in parallel, and then they may be able to be streamed to web workers (maxogden/workerstream) to better support multithreading.
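
A minimal sketch of idea 1, assuming a tmpfs mount already exists at /mnt/ramdisk (e.g. `sudo mount -t tmpfs -o size=512m tmpfs /mnt/ramdisk`); the LevelDB code itself is unchanged, only its directory moves into RAM:

```js
var levelup = require('levelup');

// Same API as before; the path simply points at the tmpfs mount,
// so LevelDB's reads and writes never touch the physical disk.
var db = levelup('/mnt/ramdisk/nodes');
```
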
danburzo commented 8 years ago

Thank you for taking the time to experiment with Redis. As for merging extract-nodes.js and extract-streets.js, you're right: the single read stream can probably be piped to two separate write streams.
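A minimal sketch of that merge (file names are illustrative): Node's `pipe()` can be called more than once on the same readable stream, so one pass over the OSM dump can feed both extraction targets.

```js
var fs = require('fs');

var source = fs.createReadStream('romania.osm');      // hypothetical input path
var nodesOut = fs.createWriteStream('nodes.txt');     // what extract-nodes.js would produce
var streetsOut = fs.createWriteStream('streets.txt'); // what extract-streets.js would produce

// Every chunk read from `source` is delivered to both writers.
source.pipe(nodesOut);
source.pipe(streetsOut);
```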

Performance-wise, however, it does seem that the database is the true bottleneck, and I'd love to figure out a faster way of mapping the node IDs to their coordinates.
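One direction worth exploring (a sketch under assumptions, not the project's current code): LevelDB stores its keys in sorted order, so a single sequential scan over the node store, filtered against the set of node IDs the streets actually reference, would replace millions of random get() calls with one pass.

```js
var levelup = require('levelup');
var db = levelup('./db/nodes'); // hypothetical path to the node store

// Illustrative IDs; in practice this set comes from the extracted streets.
var wanted = new Set(['100001', '100002', '100003']);
var coords = {};

db.createReadStream()
  .on('data', function (entry) {
    if (wanted.has(entry.key)) {
      coords[entry.key] = JSON.parse(entry.value); // assumes JSON-encoded values
    }
  })
  .on('end', function () {
    // coords now maps node ID -> coordinates, ready to apply to the streets
  });
```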