gis-ops / docker-valhalla

This is our flexible Docker repository for the Valhalla routing engine
MIT License

Memory usage increases linearly with server_threads #81

Closed (closed by ktnr 1 year ago)

ktnr commented 1 year ago

Follow-up to #58 and #63. With server_threads=1, memory usage increases to 4GB on the first call and does not increase on subsequent calls (with exactly the same matrix request). Everything is fine. With server_threads=2, memory usage increases to 4GB on the first call and to 8GB on the second call, but does not increase on the third call.

At first I had server_threads unset because I followed this tutorial, so it defaulted to nproc. This explains the increase in memory usage, at least in my case, as explained in https://github.com/gis-ops/docker-valhalla/issues/58#issuecomment-1342910517.

Is this expected behavior? The data is only accessed by read operations, right?

kevinkreiser commented 1 year ago

When using memory mapping, each thread gets access to the map, which will be reported as the amount of RAM the OS has cached multiplied by the number of open handles to the map. In this way, memory usage can actually report above 100 percent, IIRC. The matrix, though, had some interesting dynamically allocated memory requirements, which we may have neglected to trim back after each request. Do you get this or a similar behavior with route requests?
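On Linux, one way to see which part of the reported memory is file-backed (for example pages of a memory-mapped tile tar, which live in the OS page cache) and which is anonymous heap memory (for example per-request routing state) is the `RssFile` / `RssAnon` split in `/proc/<pid>/status`, available since Linux 4.5. A minimal sketch, assuming you run it where it can see the `valhalla_service` process (e.g. inside the container) and pass that PID; the function names are hypothetical:

```python
import sys
from pathlib import Path

# Fields of /proc/<pid>/status that split resident memory by backing type.
RSS_FIELDS = ("VmRSS", "RssAnon", "RssFile", "RssShmem")


def parse_rss(status_text: str) -> dict[str, int]:
    """Return the RSS fields of a /proc/<pid>/status dump, in kB."""
    result = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in RSS_FIELDS:
            # Values look like "   123456 kB"; keep only the number.
            result[key] = int(value.split()[0])
    return result


def rss_breakdown(pid: str = "self") -> dict[str, int]:
    """Read the RSS breakdown of a live process (Linux only)."""
    return parse_rss(Path(f"/proc/{pid}/status").read_text())


if __name__ == "__main__":
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    for field, kb in rss_breakdown(pid).items():
        print(f"{field:>8}: {kb / 1024:.1f} MiB")
```

If repeated identical requests mostly grow `RssFile`, the growth is cached tile data; if they mostly grow `RssAnon`, it is allocated memory such as the matrix's working state.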

nilsnolde commented 1 year ago

Well, the funny thing is that it only happens with our image, not with the native Valhalla one, using the same graph and the same requests. I don't understand it, or know how to approach it...

ktnr commented 1 year ago

@nilsnolde: I can't confirm yet that the native valhalla image shows exactly the same behavior, as I'm not sure whether I set it (or it defaulted) to the same server_threads count.

Will also test the behavior for route requests.

ktnr commented 1 year ago

Alright. With the same server_threads, gis-ops/valhalla and valhalla/valhalla show the same memory usage behavior for both the matrix and the route endpoints, even when executing the exact same request multiple times: usage increases linearly with server_threads and is capped at max RAM usage * server_threads.

For completeness, I'll attach the matrix request and Valhalla config: valhalla-memory_increase.zip. Here's the route request:

```shell
curl http://localhost:8002/route --data '{"locations":[{"lat":47.619904,"lon":12.902326},{"lat":51.717959,"lon":6.217763}],"costing":"auto"}'
```

and the compose file:

```yaml
version: '3.0'
services:
  valhalla:
    image: gisops/valhalla:latest
    ports:
      - "8002:8002"
    volumes:
      - ${HOME}/downloads/map-data/valhalla/custom_files:/custom_files
    #mem_limit: 10g
    #cpus: 1
    environment:
      # The tile_file must be located in the `custom_files` folder.
      # The tile_file has priority and is used when valid.
      # If the tile_file doesn't exist, the url is used instead.
      # Don't blank out tile_url when you use tile_file and vice versa.
      - tile_urls=europe/germany-latest.osm.pbf
      - use_tiles_ignore_pbf=True
      - force_rebuild=False
      - force_rebuild_elevation=False
      - build_elevation=False
      - build_admins=True
      - build_time_zones=True
      - server_threads=2  # determines how many threads will be used to run the valhalla server

  valhalla-native:
    image: valhalla/valhalla:run-latest
    command:
      - /bin/bash
      - -c
      - |
        valhalla_service custom_files/valhalla.json 2 # The second argument specifies the number of `server_threads`
    ports:
      - 8002:8002  # same host port as the service above: start only one of the two at a time
    volumes:
      - ${HOME}/downloads/map-data/valhalla/custom_files:/custom_files
    #mem_limit: 10g
    #cpus: 1
```
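For anyone who wants to repeat the test, here is a minimal Python harness that sends the route request from the curl example above a fixed number of times, using only the standard library. The function names are hypothetical, and memory has to be watched separately, for example with `docker stats` in another terminal:

```python
import json
import urllib.request

# Same payload as the curl example above.
ROUTE_PAYLOAD = {
    "locations": [
        {"lat": 47.619904, "lon": 12.902326},
        {"lat": 51.717959, "lon": 6.217763},
    ],
    "costing": "auto",
}


def build_request(url: str, payload: dict) -> urllib.request.Request:
    """POST the payload as a JSON body, the same body curl sends above."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def send_repeatedly(url: str, payload: dict, n: int, opener=urllib.request.urlopen) -> list[int]:
    """Send the identical request n times and return the HTTP status codes."""
    statuses = []
    for _ in range(n):
        with opener(build_request(url, payload)) as resp:
            resp.read()  # drain the body before the connection is closed
            statuses.append(resp.status)
    return statuses


if __name__ == "__main__":
    print(send_repeatedly("http://localhost:8002/route", ROUTE_PAYLOAD, 5))
```

The `opener` parameter only exists so the loop can be exercised without a running server.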

From what @kevinkreiser said above and in https://github.com/valhalla/valhalla/discussions/3405#discussioncomment-1645327, I wouldn't expect the increase in memory.

Also referencing https://github.com/valhalla/valhalla/issues/3556, as it might be relevant and may solve @elliveny's issue. Note that limiting the number or share of CPUs in the compose file does not cap memory usage; only limiting server_threads does.

nilsnolde commented 1 year ago

Thanks, that's super helpful and also a relief. I don't really understand all the implications of threading vs. multiprocessing with regard to memory mapping and tile cache(s). I'll have a session with the others to fully understand it myself, then write it down in the docs of the upstream repo.

That said, it's still possible that we're not resetting some state in the matrix code the way we should...

ktnr commented 1 year ago

Glad the info helped. It would be great if you could link the write-up here. Love the valhalla ecosystem btw, keep it up.

nilsnolde commented 1 year ago

So finally, after a talk with @kevinkreiser, I understand the operations side much better. I'm sure we'll write it up at some point. It's quite involved, though, so for anyone to really understand the internals, it'd have to be pretty detailed. What's possibly least intuitive for newcomers and non-hardcore programmers is that in most environments you'd want to work with the tar archive, which leaves memory consumption mostly to the OS rather than to Valhalla. At least our image uses the tar by default; I don't think we even keep the plain tiles directory.
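To illustrate why the tar leaves memory to the OS, here is a small Python sketch (not Valhalla's C++ code; the three 6-byte "tiles" in a temporary file are a stand-in, not Valhalla's tar layout). The file is mapped once, read-only, and several threads read from that single mapping, so the mapped pages sit in the OS page cache instead of being copied into per-thread buffers:

```python
import mmap
import os
import tempfile
import threading


def read_from_threads(path: str, offsets: list[int], length: int) -> list[bytes]:
    """Map `path` once, read-only, and let one thread per offset read from it."""
    results: list[bytes] = [b""] * len(offsets)
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        def worker(i: int, off: int) -> None:
            # Slicing copies only these few bytes; the mapped pages themselves
            # are managed by the OS and shared by every thread of the process.
            results[i] = mm[off:off + length]

        threads = [threading.Thread(target=worker, args=(i, off)) for i, off in enumerate(offsets)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return results


if __name__ == "__main__":
    # Stand-in for a tile archive: three fixed-size "tiles" in one file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"tile-0tile-1tile-2")
    try:
        print(read_from_threads(tmp.name, [0, 6, 12], 6))
    finally:
        os.remove(tmp.name)
```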

The other place that needs considerable RAM is the routing algorithms during expansion, and the bidirectional matrix is by far the greediest. That happens per request: after the first request, the algorithm keeps a considerable chunk of RAM allocated to avoid paying that allocation penalty again on the next request, though it does some (configurable) trimming. What Kevin was referring to is that the matrix algorithm(s) might not trim their allocated memory enough after a request (even though a quick skim over the code looked fine, even too fine: it doesn't seem to keep any allocation...). So that's likely where we'd need to look.
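A toy model (not Valhalla code) shows why per-request memory that is kept allocated scales with server_threads even when every request is identical: each server thread retains its own scratch allocation, optionally trimmed to a cap. The `Worker` class, the `trim_to_gb` parameter (a stand-in for Valhalla's configurable trimming) and the round-robin dispatch are hypothetical simplifications; a real server hands each request to whichever thread is free.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Worker:
    """One server thread with scratch memory that survives between requests."""
    retained_gb: float = 0.0

    def handle(self, peak_gb: float, trim_to_gb: Optional[float]) -> None:
        # During the request, the scratch space grows to at least the request's peak.
        in_use = max(self.retained_gb, peak_gb)
        # Afterwards the allocation is kept (optionally trimmed) so the next
        # request on this thread doesn't pay for it again.
        self.retained_gb = in_use if trim_to_gb is None else min(in_use, trim_to_gb)


def simulate(server_threads: int, request_peaks_gb: list[float],
             trim_to_gb: Optional[float] = None) -> list[float]:
    """Dispatch requests round-robin; return total retained memory after each one."""
    workers = [Worker() for _ in range(server_threads)]
    totals = []
    for i, peak in enumerate(request_peaks_gb):
        workers[i % server_threads].handle(peak, trim_to_gb)
        totals.append(sum(w.retained_gb for w in workers))
    return totals


if __name__ == "__main__":
    print(simulate(2, [4.0, 4.0, 4.0]))  # [4.0, 8.0, 8.0]
```

Without trimming, two threads and three identical 4 GB requests give 4, 8 and 8 GB, the same pattern reported at the top of this issue; a cap bounds the total at roughly `trim_to_gb * server_threads` instead.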

To better reason about your situation, can you share your Valhalla config JSON? Only if it's not the default.

ktnr commented 1 year ago

Are you using standard memory allocators? The config is already uploaded in https://github.com/gis-ops/docker-valhalla/issues/81#issuecomment-1343999540: valhalla-memory_increase.zip.

nilsnolde commented 1 year ago

> Are you using standard memory allocators?

I'm not a CS master, but yeah, AFAICT it's the standard allocators that come with unordered_map and vector.

kk2491 commented 1 year ago

I am facing a similar issue. Can anybody please explain how we can fix it?

nilsnolde commented 1 year ago

enquiry@gis-ops.com ☺️

nilsnolde commented 1 year ago

So, it turns out this is a feature, not a bug. Sorry, everyone; I also learned something here.

nilsnolde commented 1 year ago

There are problems with the matrix, see https://github.com/valhalla/valhalla/issues/4064. No one really bothered to take a look back then, but @kevinkreiser had the right hunch here: https://github.com/gis-ops/docker-valhalla/issues/81#issuecomment-1343438368

ktnr commented 5 months ago

I am unsure whether I have understood the expected behavior correctly, especially after https://github.com/valhalla/valhalla/issues/4064.

Judging by the newer comments in https://github.com/valhalla/valhalla/issues/3556, others are still experiencing similar problems or misunderstanding the expected behavior. That issue also mentions that memory usage should be more efficient when using the tar files.

I reran the test described in this issue. I built my tiles with the default options described in the README, which include build_tar=True and use_tiles_ignore_pbf=True.

With these settings, I still see the behavior described above: when the exact same request is sent repeatedly, memory usage increases linearly with the number of server_threads and is capped at roughly osm-extract-size * server_threads. Since the tiles/tar are accessed read-only (I suppose), I would not expect this increase in RAM usage. To me, this suggests that each thread independently maps or allocates memory without effectively sharing it with the other threads.
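On Linux, the "each thread maps independently" hypothesis can be checked from outside by counting how many entries in `/proc/<pid>/maps` point at the tile tar. A minimal sketch; the `.tar` suffix filter, the function names and the sample path in the comment are assumptions for illustration. Several entries for the same file would hint at separate mappings, which may make RSS count the same cached pages more than once; a single entry would make allocated (heap) memory the more likely source of the growth.

```python
import sys
from collections import Counter
from pathlib import Path


def mappings_per_file(maps_text: str, suffix: str = ".tar") -> Counter:
    """Count /proc/<pid>/maps entries whose pathname ends with `suffix`."""
    counts: Counter = Counter()
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode [pathname]
        parts = line.split(maxsplit=5)
        if len(parts) == 6 and parts[5].endswith(suffix):
            counts[parts[5]] += 1
    return counts


def tar_mappings(pid: str, suffix: str = ".tar") -> Counter:
    """Read the live maps file of `pid` (Linux only)."""
    return mappings_per_file(Path(f"/proc/{pid}/maps").read_text(), suffix)


if __name__ == "__main__":
    # Hypothetical usage: python count_maps.py <valhalla_service PID>
    for path, n in tar_mappings(sys.argv[1]).items():
        print(f"{n} mapping(s) of {path}")
```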