consbio / mbtileserver

Basic Go server for mbtiles
ISC License
658 stars 104 forks

Too large memory usage in the latest version #103

Closed asmyasnikov closed 4 years ago

asmyasnikov commented 4 years ago

Hi! I'm using cgroups memory limits via docker-compose.yml:

  mbtiles:
    image: mbtiles
    environment:
      - TILE_DIR=/external,/internal
      - PORT=80
    networks:
      phoenix:
        ipv4_address: 172.18.0.21
    read_only: True
    restart: always
    volumes: ['/opt/mbtiles/:/internal', '/opt/usb/mbtiles/:/external']
    logging:
      driver: "none"
    deploy:
      restart_policy:
        condition: any
      resources:
        limits:
          memory: 30M

The 30M limit was not exceeded in the older version, but the latest version (https://github.com/consbio/mbtileserver/commit/ca4a94e0c5a4487c64d271e6d1428a6aefcd094c) wants about 1G of memory when run without a limit =( This is critical for single-board computers... Log:

mbtiles_1               | time="2020-05-20T15:45:59Z" level=info msg="Searching for tilesets in /external\n"
mbtiles_1               | time="2020-05-20T15:45:59Z" level=info msg="Searching for tilesets in /internal\n"
mbtiles_1               | time="2020-05-20T15:46:01Z" level=info msg="Searching for tilesets in /external\n"
mbtiles_1               | time="2020-05-20T15:46:01Z" level=info msg="Searching for tilesets in /internal\n"
docker_mbtiles_1 exited with code 137
mbtiles_1               | time="2020-05-20T15:46:04Z" level=info msg="Searching for tilesets in /external\n"
mbtiles_1               | time="2020-05-20T15:46:04Z" level=info msg="Searching for tilesets in /internal\n"

The internal and external paths contain big mbtiles files:

root@debian:/etc/docker# du -sh /opt/mbtiles/*
9,2M    /opt/mbtiles/countries-raster.mbtiles
18G     /opt/mbtiles/world_yandex.mbtiles
root@debian:/etc/docker# du -sh /opt/usb/mbtiles/*
52G     /opt/usb/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles
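
Exit code 137 is 128 + SIGKILL (9), which with a cgroup memory limit normally means the kernel OOM killer terminated the process. Two quick ways to confirm that (using the container name from the log above):

# was the container killed by the OOM killer, and with which exit code?
docker inspect -f '{{.State.OOMKilled}} {{.State.ExitCode}}' docker_mbtiles_1
# kernel-side confirmation
dmesg | grep -iE "killed process|oom"
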
brendan-ward commented 4 years ago

@asmyasnikov thanks for reporting this. I haven't seen that to be the case in my own usage so far.

Can you help me understand some of the factors involved here?

Are you building your own Docker container for mbtileserver? If so, which version of Go?

Does it run out of memory immediately while searching for tiles, or does the server run for a while before running out of memory?

How many tiles are in internal vs external folders?

asmyasnikov commented 4 years ago

The internal and external paths contain big mbtiles files:

root@debian:/etc/docker# du -sh /opt/mbtiles/*
9,2M    /opt/mbtiles/countries-raster.mbtiles
18G     /opt/mbtiles/world_yandex.mbtiles
root@debian:/etc/docker# du -sh /opt/usb/mbtiles/*
52G     /opt/usb/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles
asmyasnikov commented 4 years ago

Are you building your own Docker container for mbtileserver? If so, which version of Go?

I build mbtileserver with go1.14. And yes, I'm running mbtileserver inside a Docker container. My Dockerfile:

# global build arguments (defaults target 32-bit ARM)
ARG ARCH=arm32v7
ARG QARCH=arm
ARG GOARCH=arm
ARG ORIGIN=github.com/consbio/mbtileserver
# minimal glibc-based runtime image
FROM ${ARCH}/busybox:glibc AS base
ARG QARCH=arm
ARG GOARCH=arm
ARG ORIGIN=None
# qemu static binaries for cross-building on an x86_64 host
FROM multiarch/qemu-user-static:x86_64-${QARCH} as qemu
ARG ARCH=arm32v7
ARG GOARCH=arm
ARG ORIGIN=None
# build stage: clone the repository and compile mbtileserver
FROM --platform=linux/${GOARCH} ${ARCH}/golang:latest AS build
COPY --from=qemu /usr/bin/ /usr/bin/
ARG ORIGIN=None
WORKDIR /build
RUN git clone https://${ORIGIN}.git .
RUN go build -mod=vendor -ldflags="-w -s" -o ./mbtileserver
RUN cp $(ldd ./mbtileserver | grep libdl.so | awk '{print $3}') /build/libdl.so.2
# final image: copy the binary and the libdl it links against into the runtime image
FROM base
ARG VERSION=None
ENV VERSION=$VERSION
COPY --from=build /build/libdl.so.2 /lib/libdl.so.2
COPY --from=build /build/mbtileserver /mbtileserver
ENTRYPOINT ["/mbtileserver"]
CMD []

The arch with this bug is amd64 (qemu: x86_64).
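
For completeness, a hypothetical build invocation for this Dockerfile (the ORIGIN argument must be passed explicitly because the later stages re-declare it with a default of None; the remaining arguments keep their ARM defaults):

docker build --build-arg ORIGIN=github.com/consbio/mbtileserver -t mbtiles .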

asmyasnikov commented 4 years ago

Does it run out of memory immediately while searching for tiles, or does the server run for a while before running out of memory?

It doesn't get that far: the familiar "Use Ctrl-C to exit the server" message is missing.

asmyasnikov commented 4 years ago

Without the memory limit I get:

mbtiles_1               | time="2020-05-20T16:13:10Z" level=info msg="Searching for tilesets in /external\n"
mbtiles_1               | time="2020-05-20T16:13:10Z" level=info msg="Searching for tilesets in /internal\n"
mbtiles_1               |
mbtiles_1               | --------------------------------------
mbtiles_1               | Use Ctrl-C to exit the server
mbtiles_1               | --------------------------------------
mbtiles_1               | HTTP server started on port 80
mbtiles_1               | time="2020-05-20T16:13:28Z" level=info msg="Published 3 services"

And after that, memory usage decreased from 100% back to normal.

asmyasnikov commented 4 years ago

Screenshot from 2020-05-20 19-21-36 (start of the process)

brendan-ward commented 4 years ago

Thanks for the additional info. We are allocating a number of objects at startup, but most of those are proportional to the number of tilesets rather than their size.

I don't have very large tilesets available for testing at the moment, but I could rig up a test setup where memory is similarly limited during startup. I can't promise I can get to this immediately.

You might also be able to do some minimal testing here that could help (a rough sketch of these runs follows the list):

  1. try a test where you start the server but point it to an empty directory. If this fails, this tells us we're grabbing a lot of memory during startup that isn't related to tiles.
  2. try a test where you start the server only pointing at your internal directory.
  3. try another test where you start the server only pointing at your external directory. If this fails but 2 does not, then please try an additional test where you have the 56GB tileset on an internal directory instead.
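
A rough sketch of how these three runs could look with the image name, environment variables, and host paths from the compose file above. This is only a sketch: the empty directory /opt/empty is hypothetical, and -m 30m reproduces the 30M limit from the compose deploy section.

# 1. empty directory
docker run --rm -m 30m -e TILE_DIR=/empty -e PORT=80 -v /opt/empty:/empty:ro mbtiles

# 2. internal directory only
docker run --rm -m 30m -e TILE_DIR=/internal -e PORT=80 -v /opt/mbtiles/:/internal mbtiles

# 3. external directory only
docker run --rm -m 30m -e TILE_DIR=/external -e PORT=80 -v /opt/usb/mbtiles/:/external mbtiles
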
asmyasnikov commented 4 years ago

try a test where you start the server but point it to an empty directory. If this fails, this tells us we're grabbing a lot of memory during startup that isn't related to tiles.

No too memory usage

asmyasnikov commented 4 years ago

try a test where you start the server only pointing at your internal directory.

Screenshot from 2020-05-20 19-36-16

brendan-ward commented 4 years ago

Sorry, I couldn't understand what you meant by:

No too memory usage

Do you mean it started correctly?

asmyasnikov commented 4 years ago

try another test where you start the server only pointing at your external directory.

No memory excess. Yes, it started correctly.

asmyasnikov commented 4 years ago

But I cannot view the map from /opt/usb/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles (pbf vector tiles), even though this tileset is present in the /services response. The tileset /opt/mbtiles/world_yandex.mbtiles (from internal), the one with the excessive memory usage, is viewable on my map (mbtileserver returns tiles for z/x/y requests). The tileset /opt/mbtiles/countries-raster.mbtiles (from internal) is also viewable on my map. Damn! I'm still figuring out the details...

asmyasnikov commented 4 years ago

When actively moving the map (intensive z/x/y requests), mbtileserver also uses too much memory. Screenshot from 2020-05-20 20-00-26. And I have some log messages:

mbtiles_1               |
mbtiles_1               | --------------------------------------
mbtiles_1               | Use Ctrl-C to exit the server
mbtiles_1               | --------------------------------------
mbtiles_1               | HTTP server started on port 80
mbtiles_1               | time="2020-05-20T16:58:14Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/131/88.png: write tcp 172.18.0.21:80->172.18.0.14:48110: write: broken pipe" 
mbtiles_1               | time="2020-05-20T16:58:17Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/131/90.png: write tcp 172.18.0.21:80->172.18.0.14:48244: write: broken pipe" 
mbtiles_1               | time="2020-05-20T16:58:21Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/139/87.png: write tcp 172.18.0.21:80->172.18.0.14:48322: write: broken pipe" 
mbtiles_1               | time="2020-05-20T16:58:21Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/139/86.png: write tcp 172.18.0.21:80->172.18.0.14:48324: write: broken pipe" 
mbtiles_1               | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/138/88.png: write tcp 172.18.0.21:80->172.18.0.14:48314: write: broken pipe" 
mbtiles_1               | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/7/66/44.png: write tcp 172.18.0.21:80->172.18.0.14:48048: write: broken pipe"  
mbtiles_1               | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/7/65/43.png: write tcp 172.18.0.21:80->172.18.0.14:48068: write: broken pipe"  
mbtiles_1               | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/135/90.png: write tcp 172.18.0.21:80->172.18.0.14:48088: write: broken pipe" 
mbtiles_1               | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/131/89.png: write tcp 172.18.0.21:80->172.18.0.14:48224: write: broken pipe" 
mbtiles_1               | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/10/540/346.png: write tcp 172.18.0.21:80->172.18.0.14:48384: write: broken pipe"
mbtiles_1               | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/9/272/175.png: write tcp 172.18.0.21:80->172.18.0.14:48332: write: broken pipe"
mbtiles_1               | time="2020-05-20T17:00:06Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/132/84.png: write tcp 172.18.0.21:80->172.18.0.14:48564: write: broken pipe" 
mbtiles_1               | time="2020-05-20T17:00:06Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/9/253/169.png: write tcp 172.18.0.21:80->172.18.0.14:48670: write: broken pipe"
mbtiles_1               | time="2020-05-20T17:00:06Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/9/252/171.png: write tcp 172.18.0.21:80->172.18.0.14:48666: write: broken pipe"
mbtiles_1               | time="2020-05-20T17:00:06Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/130/84.png: write tcp 172.18.0.21:80->172.18.0.14:48576: write: broken pipe" 
mbtiles_1               | time="2020-05-20T17:00:06Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/9/251/170.png: write tcp 172.18.0.21:80->172.18.0.14:48676: write: broken pipe"
mbtiles_1               | time="2020-05-20T17:00:06Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/131/84.png: write tcp 172.18.0.21:80->172.18.0.14:48568: write: broken pipe" 
mbtiles_1               | time="2020-05-20T17:00:07Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/131/85.png: write tcp 172.18.0.21:80->172.18.0.14:48558: write: broken pipe" 

But this behaviour may be because the single-board computer (LattePanda 4/64) is too slow on network connections. Maybe.

brendan-ward commented 4 years ago

I usually see broken pipe messages when the client (browser map library) has canceled the tile requests, usually when aggressively panning / zooming in the web map.

Are you able to test with the prior commit (f2305b518fa81a9d7a3695abc0e4131401a936e6) just against your internal directory? It would be interesting to know if the refactor introduced a major memory regression.
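
Since the Dockerfile above clones the repository inside the build stage, one way to pin the build to a given commit might be to add a checkout step right after the clone, something like:

RUN git clone https://${ORIGIN}.git .
# check out the commit under test instead of building master
RUN git checkout f2305b518fa81a9d7a3695abc0e4131401a936e6
RUN go build -mod=vendor -ldflags="-w -s" -o ./mbtileserver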

asmyasnikov commented 4 years ago

Are you able to test with the prior commit (f2305b5) just against your internal directory? It would be interesting to know if the refactor introduced a major memory regression.

Also too much memory usage in f2305b5. With the limit, the process is always killed by the cgroups policies. Without the limit I get the following picture: Screenshot from 2020-05-20 20-00-26

asmyasnikov commented 4 years ago

I understand nothing now. I will look for my stable (in terms of memory usage) version tomorrow.

brendan-ward commented 4 years ago

Thanks @asmyasnikov for running these tests!

It looks like the memory on startup is stable between the prior and latest commits, so we didn't introduce a memory regression as far as I can tell. That's good news, at least.

I wonder if Go is not respecting the cgroups memory policy? Are you indeed seeing OOM errors in the log, as described here: https://fabianlee.org/2020/01/18/docker-placing-limits-on-container-memory-using-cgroups/
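
One quick way to check whether the limit is actually applied inside the container (assuming cgroups v1, which is what Docker typically used at this point) would be:

# the limit the kernel enforces for this container's memory cgroup
docker exec docker_mbtiles_1 cat /sys/fs/cgroup/memory/memory.limit_in_bytes
# current usage vs. limit as Docker reports it
docker stats --no-stream docker_mbtiles_1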

It could also be another issue:

asmyasnikov commented 4 years ago

I found a small explanation for why the map from the vector (pbf) mbtiles file is not viewable:

root@debian:/# du -s /opt/usb/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles    # used file
53492336        /opt/usb/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles
root@debian:/# du -s /opt/usb2/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles   # original file
53492348        /opt/usb2/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles

The file was corrupted, maybe because the storage is a USB device. I will try to restore the file and retry running mbtileserver with read-only volume mounting.
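
Since an mbtiles file is just a SQLite database, two quick ways to check the copy for corruption (assuming sha256sum and sqlite3 are available on the host) would be:

# compare the used copy against the original
sha256sum /opt/usb/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles /opt/usb2/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles
# let SQLite check the used copy for internal consistency
sqlite3 /opt/usb/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles "PRAGMA integrity_check;"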

asmyasnikov commented 4 years ago
image

This is the latest stable version (https://github.com/consbio/mbtileserver/commit/8c21f5ac4c1a6fa3e3a0d6115f86881fba359ddc) of mbtileserver, which I have used for a long time. I rebuilt this version a few minutes ago with my Dockerfile (same golang, sqlite, and everything else). docker-compose.yml was not modified. I loaded my /internal path with the two mbtiles files (9MB and 18GB). When I drag the map actively there are no big CPU or memory peaks.

asmyasnikov commented 4 years ago

I wonder if Go is not respecting the cgroups memory policy? Are you indeed seeing OOM errors in the log, as described here: https://fabianlee.org/2020/01/18/docker-placing-limits-on-container-memory-using-cgroups/

root@debian:/etc/docker# dmesg | grep -Ei "killed process"
[  531.528371] Killed process 7577 (mbtileserver) total-vm:1169360kB, anon-rss:13744kB, file-rss:0kB, shmem-rss:0kB
[  593.809625] Killed process 8035 (mbtileserver) total-vm:1169104kB, anon-rss:15044kB, file-rss:0kB, shmem-rss:0kB
[  656.035194] Killed process 8506 (mbtileserver) total-vm:1167952kB, anon-rss:11648kB, file-rss:0kB, shmem-rss:0kB
[  718.287512] Killed process 8960 (mbtileserver) total-vm:1168652kB, anon-rss:15032kB, file-rss:0kB, shmem-rss:0kB
[  780.535074] Killed process 9438 (mbtileserver) total-vm:1242836kB, anon-rss:13756kB, file-rss:0kB, shmem-rss:0kB
[ 1058.944100] Killed process 11774 (mbtileserver) total-vm:1168640kB, anon-rss:15004kB, file-rss:4kB, shmem-rss:13780kB
[ 1061.036764] Killed process 11892 (mbtileserver) total-vm:1242600kB, anon-rss:17232kB, file-rss:4kB, shmem-rss:16652kB
[ 1063.157830] Killed process 12010 (mbtileserver) total-vm:1167460kB, anon-rss:13328kB, file-rss:4kB, shmem-rss:16372kB
[ 1065.414487] Killed process 12129 (mbtileserver) total-vm:1168612kB, anon-rss:17240kB, file-rss:4kB, shmem-rss:16468kB
[ 1068.108419] Killed process 12246 (mbtileserver) total-vm:1242600kB, anon-rss:13160kB, file-rss:4kB, shmem-rss:16564kB
[ 1071.586636] Killed process 12372 (mbtileserver) total-vm:1242856kB, anon-rss:15192kB, file-rss:4kB, shmem-rss:16364kB
[ 1076.798350] Killed process 12506 (mbtileserver) total-vm:1316588kB, anon-rss:15220kB, file-rss:4kB, shmem-rss:16436kB
[ 1085.463741] Killed process 12657 (mbtileserver) total-vm:1242600kB, anon-rss:15160kB, file-rss:4kB, shmem-rss:16464kB
brendan-ward commented 4 years ago

Interesting! And confusing!

There really aren't substantive differences between that commit and the previous one you tested above. It seems that f2305b518fa81a9d7a3695abc0e4131401a936e6 should have had the same memory profile as that branch, because it was before the substantive changes were merged in.

Would you mind trying with the common ancestor (a613fe59f49b841006c5f1ad2fa23e4444e7090b) between that branch and master?

asmyasnikov commented 4 years ago

Would you mind trying with the common ancestor (a613fe5) between that branch and master?

It is a stable version (in terms of memory and CPU, both at startup and when moving the map).

asmyasnikov commented 4 years ago

ca4a94e is also OK.

asmyasnikov commented 4 years ago
image

446c5e4: some peaks when moving the map, and you can see some differences between memory and cache.

asmyasnikov commented 4 years ago

Mysterious...

brendan-ward commented 4 years ago

ca4a94e0c5a4487c64d271e6d1428a6aefcd094c is the tip of master. Is that correct? This is the one that started all the issues at the top of the thread.

asmyasnikov commented 4 years ago
image

ca4a94e looks like this

asmyasnikov commented 4 years ago

f2305b5 also rebuilt and tested, and I get this:

image

I have no idea what's changed.

asmyasnikov commented 4 years ago

3 hours ago I saw very large memory usage. Only mbtileserver (out of 24 containers) was using that much memory. Sorry. I will look for other causes...

asmyasnikov commented 4 years ago

Oh! I remembered. I ran all of the last tests with a read-only volume, /internal:ro. Now I will try the tests without the ro flag.
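
For reference, the read-only /internal mount differs from the compose file at the top only by the :ro suffix on the volume entry:

    volumes: ['/opt/mbtiles/:/internal:ro', '/opt/usb/mbtiles/:/external']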

asmyasnikov commented 4 years ago
image

f2305b5 looks wrong.

asmyasnikov commented 4 years ago
image

This is my stable version (8c21f5a). Sorry, this bug is only partly about mbtileserver. The main cause is Docker mounting of ro/rw volumes. Sorry again.

asmyasnikov commented 4 years ago

For your peace of mind, see the latest (head) version stats with ro volumes:

image

Sorry

image
brendan-ward commented 4 years ago

Thanks for your detailed investigation, @asmyasnikov! I appreciate all the effort you invested in this and in reporting back on it here.

I'm glad that our recent changes didn't introduce a memory regression, and that in general our memory profile is very low.

asmyasnikov commented 4 years ago
image

=)))