Closed: asmyasnikov closed this issue 4 years ago
@asmyasnikov thanks for reporting this. I haven't seen that to be the case in my own usage so far.
Can you help me understand some of the factors involved here?
Are you building your own Docker container for mbtileserver? If so, which version of Go?
Does it run out of memory immediately while searching for tiles, or does the server run for a while before running out of memory?
How many tiles are in internal vs external folders?
The internal and external paths contain big mbtiles files:
root@debian:/etc/docker# du -sh /opt/mbtiles/*
9,2M    /opt/mbtiles/countries-raster.mbtiles
18G     /opt/mbtiles/world_yandex.mbtiles
root@debian:/etc/docker# du -sh /opt/usb/mbtiles/*
52G     /opt/usb/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles
Are you building your own Docker container for mbtileserver? If so, which version of Go?
I build mbtileserver with go1.14. And yes, I'm running mbtileserver inside a Docker container. My Dockerfile:
ARG ARCH=arm32v7
ARG QARCH=arm
ARG GOARCH=arm
ARG ORIGIN=github.com/consbio/mbtileserver
FROM ${ARCH}/busybox:glibc AS base
ARG QARCH=arm
ARG GOARCH=arm
ARG ORIGIN=None
FROM multiarch/qemu-user-static:x86_64-${QARCH} as qemu
ARG ARCH=arm32v7
ARG GOARCH=arm
ARG ORIGIN=None
FROM --platform=linux/${GOARCH} ${ARCH}/golang:latest AS build
COPY --from=qemu /usr/bin/ /usr/bin/
ARG ORIGIN=None
WORKDIR /build
RUN git clone https://${ORIGIN}.git .
RUN go build -mod=vendor -ldflags="-w -s" -o ./mbtileserver
RUN cp $(ldd ./mbtileserver | grep libdl.so | awk '{print $3}') /build/libdl.so.2
FROM base
ARG VERSION=None
ENV VERSION=$VERSION
COPY --from=build /build/libdl.so.2 /lib/libdl.so.2
COPY --from=build /build/mbtileserver /mbtileserver
ENTRYPOINT ["/mbtileserver"]
CMD []
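For reference, a minimal sketch of building an image from this Dockerfile with its default arm32v7 targets (the image tag and the VERSION value are only illustrative):
# pass ORIGIN so the build stage clones the right repository; the tag name is arbitrary
docker build \
  --build-arg ORIGIN=github.com/consbio/mbtileserver \
  --build-arg VERSION=latest \
  -t mbtileserver:arm32v7 .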
The arch with this bug is amd64 (qemu: x86_64).
Does it run out of memory immediately while searching for tiles, or does the server run for a while before running out of memory?
No; the familiar message "Use Ctrl-C to exit the server" is missing.
Without a memory limit I get:
mbtiles_1 | time="2020-05-20T16:13:10Z" level=info msg="Searching for tilesets in /external\n"
mbtiles_1 | time="2020-05-20T16:13:10Z" level=info msg="Searching for tilesets in /internal\n"
mbtiles_1 |
mbtiles_1 | --------------------------------------
mbtiles_1 | Use Ctrl-C to exit the server
mbtiles_1 | --------------------------------------
mbtiles_1 | HTTP server started on port 80
mbtiles_1 | time="2020-05-20T16:13:28Z" level=info msg="Published 3 services"
And after that, memory usage decreased from 100% back to normal.
Start of the process:
Thanks for the additional info. We are allocating a number of objects at startup, but most of those are proportional to the number of tilesets rather than their size.
I don't have very large tilesets available for testing at the moment, but I could rig up a test setup where memory is similarly limited during startup. I can't promise I can get to this immediately.
You might also be able to do some minimal testing here that could help:
try a test where you start the server but point it to an empty directory. If this fails, this tells us we're grabbing a lot of memory during startup that isn't related to tiles.
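For example, a minimal sketch of that empty-directory test under a small cgroups limit like the one in this issue; the image name, the limit value, and mbtileserver's -d/--dir flag are assumptions here:
mkdir -p /tmp/empty-tilesets
# start the server against an empty tileset directory with a 30M memory limit
docker run --rm -m 30m -v /tmp/empty-tilesets:/empty:ro -p 80:80 mbtileserver -d /empty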
No excessive memory usage.
try a test where you start the server only pointing at your internal directory.
Sorry, I couldn't understand what you meant by:
No excessive memory usage.
Do you mean it started correctly?
try another test where you start the server only pointing at your external directory.
No memory excess. Yes, it starts correctly.
But I cannot view the map from /opt/usb/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles (pbf vector tiles). This tileset is present in the /services response.
Excessive memory usage. The tileset /opt/mbtiles/world_yandex.mbtiles (from internal) is viewed on my map (mbtileserver returns tiles for z/x/y requests). The tileset /opt/mbtiles/countries-raster.mbtiles (from internal) is also viewed on my map.
Damn!
I discovered something...
On active panning of the map (intensive z/x/y requests) mbtileserver also uses too much memory, and I get some log messages:
mbtiles_1 |
mbtiles_1 | --------------------------------------
mbtiles_1 | Use Ctrl-C to exit the server
mbtiles_1 | --------------------------------------
mbtiles_1 | HTTP server started on port 80
mbtiles_1 | time="2020-05-20T16:58:14Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/131/88.png: write tcp 172.18.0.21:80->172.18.0.14:48110: write: broken pipe"
mbtiles_1 | time="2020-05-20T16:58:17Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/131/90.png: write tcp 172.18.0.21:80->172.18.0.14:48244: write: broken pipe"
mbtiles_1 | time="2020-05-20T16:58:21Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/139/87.png: write tcp 172.18.0.21:80->172.18.0.14:48322: write: broken pipe"
mbtiles_1 | time="2020-05-20T16:58:21Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/139/86.png: write tcp 172.18.0.21:80->172.18.0.14:48324: write: broken pipe"
mbtiles_1 | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/138/88.png: write tcp 172.18.0.21:80->172.18.0.14:48314: write: broken pipe"
mbtiles_1 | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/7/66/44.png: write tcp 172.18.0.21:80->172.18.0.14:48048: write: broken pipe"
mbtiles_1 | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/7/65/43.png: write tcp 172.18.0.21:80->172.18.0.14:48068: write: broken pipe"
mbtiles_1 | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/135/90.png: write tcp 172.18.0.21:80->172.18.0.14:48088: write: broken pipe"
mbtiles_1 | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/131/89.png: write tcp 172.18.0.21:80->172.18.0.14:48224: write: broken pipe"
mbtiles_1 | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/10/540/346.png: write tcp 172.18.0.21:80->172.18.0.14:48384: write: broken pipe"
mbtiles_1 | time="2020-05-20T16:58:34Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/9/272/175.png: write tcp 172.18.0.21:80->172.18.0.14:48332: write: broken pipe"
mbtiles_1 | time="2020-05-20T17:00:06Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/132/84.png: write tcp 172.18.0.21:80->172.18.0.14:48564: write: broken pipe"
mbtiles_1 | time="2020-05-20T17:00:06Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/9/253/169.png: write tcp 172.18.0.21:80->172.18.0.14:48670: write: broken pipe"
mbtiles_1 | time="2020-05-20T17:00:06Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/9/252/171.png: write tcp 172.18.0.21:80->172.18.0.14:48666: write: broken pipe"
mbtiles_1 | time="2020-05-20T17:00:06Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/130/84.png: write tcp 172.18.0.21:80->172.18.0.14:48576: write: broken pipe"
mbtiles_1 | time="2020-05-20T17:00:06Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/9/251/170.png: write tcp 172.18.0.21:80->172.18.0.14:48676: write: broken pipe"
mbtiles_1 | time="2020-05-20T17:00:06Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/131/84.png: write tcp 172.18.0.21:80->172.18.0.14:48568: write: broken pipe"
mbtiles_1 | time="2020-05-20T17:00:07Z" level=error msg="Could not write tile data for /services/APys0YzHqGQlIW6Qn2NflUQXOBE/tiles/8/131/85.png: write tcp 172.18.0.21:80->172.18.0.14:48558: write: broken pipe"
But this behaviour could be because the single-board computer (LattePanda 4/64) is too slow on network connections. Maybe.
I usually see "broken pipe" messages when the client (browser map library) has canceled the tile requests; usually when aggressively panning / zooming in the web map.
Are you able to test with the prior commit (f2305b518fa81a9d7a3695abc0e4131401a936e6) just against your internal directory? It would be interesting to know if the refactor introduced a major memory regression.
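One way to do that (a sketch reusing the build command from the Dockerfile above; it assumes the vendored dependencies are present at that commit) is to check the commit out before building:
git clone https://github.com/consbio/mbtileserver.git
cd mbtileserver
# pin the working tree to the prior commit, then build exactly as in the Dockerfile
git checkout f2305b518fa81a9d7a3695abc0e4131401a936e6
go build -mod=vendor -ldflags="-w -s" -o ./mbtileserver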
Also excessive memory usage with f2305b5. With the limitation, the process is always killed by the cgroups policies. Without the limitation I get the following picture:
I don't understand anything now. I will look for my stable (by memory usage) version tomorrow.
Thanks @asmyasnikov for running these tests!
It looks like the memory on startup is stable between the prior and latest commits, so we didn't introduce a memory regression as far as I can tell. That's good news, at least.
I wonder if Go is not respecting the cgroups memory policy? Are you indeed seeing OOM errors in the log, as described here: https://fabianlee.org/2020/01/18/docker-placing-limits-on-container-memory-using-cgroups/
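For example (a generic sketch; the container name is a placeholder), both the applied limit and any OOM kills can be checked from the host:
# current memory usage versus the configured limit for the running container
docker stats --no-stream mbtiles_1
# kernel log entries left behind by the OOM killer
dmesg | grep -i "killed process"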
It could also be another issue: GOGC interacts with available memory, see https://golang.org/pkg/runtime/
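As an illustration (not from the thread; the GOGC value, image name, and mbtileserver's -d/--dir flag are assumptions), GOGC could be lowered through the container environment so the garbage collector runs more often and keeps a smaller heap:
# GOGC=50 triggers a collection after 50% heap growth instead of the default 100%
docker run --rm -e GOGC=50 -m 30m -v /opt/mbtiles:/internal:ro -p 80:80 mbtileserver -d /internal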
I found a small explanation for why I cannot view the map from the vector (pbf) mbtiles file:
root@debian:/# du -s  # used file
53492336    /opt/usb/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles
root@debian:/# du -s  # original file
53492348    /opt/usb2/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles
The file was corrupted, maybe because the storage is a USB device. I will try to restore the file and retry running mbtileserver with a read-only volume mount.
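Since an mbtiles file is an SQLite database, one way to confirm the corruption (a sketch, assuming the sqlite3 CLI is installed on the host) is SQLite's built-in check:
# prints "ok" for a healthy database, or a list of problems otherwise
sqlite3 /opt/usb/mbtiles/osm-2017-07-03-v3.6.1-planet.mbtiles "PRAGMA integrity_check;"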
This is the latest stable version (https://github.com/consbio/mbtileserver/commit/8c21f5ac4c1a6fa3e3a0d6115f86881fba359ddc) of mbtileserver, which I have been using for a long time. I rebuilt this version a few minutes ago with my Dockerfile (same golang, sqlite, and everything else). docker-compose.yml was not modified. I load my /internal path with two mbtiles files (9MB and 18GB).
When I drag the map actively there are no big CPU or memory peaks.
I wonder if Go is not respecting the cgroups memory policy? Are you indeed seeing OOM errors in the log, as described here: https://fabianlee.org/2020/01/18/docker-placing-limits-on-container-memory-using-cgroups/
root@debian:/etc/docker# dmesg | grep -Ei "killed process"
[ 531.528371] Killed process 7577 (mbtileserver) total-vm:1169360kB, anon-rss:13744kB, file-rss:0kB, shmem-rss:0kB
[ 593.809625] Killed process 8035 (mbtileserver) total-vm:1169104kB, anon-rss:15044kB, file-rss:0kB, shmem-rss:0kB
[ 656.035194] Killed process 8506 (mbtileserver) total-vm:1167952kB, anon-rss:11648kB, file-rss:0kB, shmem-rss:0kB
[ 718.287512] Killed process 8960 (mbtileserver) total-vm:1168652kB, anon-rss:15032kB, file-rss:0kB, shmem-rss:0kB
[ 780.535074] Killed process 9438 (mbtileserver) total-vm:1242836kB, anon-rss:13756kB, file-rss:0kB, shmem-rss:0kB
[ 1058.944100] Killed process 11774 (mbtileserver) total-vm:1168640kB, anon-rss:15004kB, file-rss:4kB, shmem-rss:13780kB
[ 1061.036764] Killed process 11892 (mbtileserver) total-vm:1242600kB, anon-rss:17232kB, file-rss:4kB, shmem-rss:16652kB
[ 1063.157830] Killed process 12010 (mbtileserver) total-vm:1167460kB, anon-rss:13328kB, file-rss:4kB, shmem-rss:16372kB
[ 1065.414487] Killed process 12129 (mbtileserver) total-vm:1168612kB, anon-rss:17240kB, file-rss:4kB, shmem-rss:16468kB
[ 1068.108419] Killed process 12246 (mbtileserver) total-vm:1242600kB, anon-rss:13160kB, file-rss:4kB, shmem-rss:16564kB
[ 1071.586636] Killed process 12372 (mbtileserver) total-vm:1242856kB, anon-rss:15192kB, file-rss:4kB, shmem-rss:16364kB
[ 1076.798350] Killed process 12506 (mbtileserver) total-vm:1316588kB, anon-rss:15220kB, file-rss:4kB, shmem-rss:16436kB
[ 1085.463741] Killed process 12657 (mbtileserver) total-vm:1242600kB, anon-rss:15160kB, file-rss:4kB, shmem-rss:16464kB
Interesting! And confusing!
There really aren't substantive differences between that commit and the previous one you tested above. It seems that f2305b518fa81a9d7a3695abc0e4131401a936e6 should have had the same memory profile as that branch, because it was before the substantive changes were merged in.
Would you mind trying with the common ancestor (a613fe59f49b841006c5f1ad2fa23e4444e7090b) between that branch and master?
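A sketch of confirming and checking out that ancestor locally (the refactor branch name is a placeholder):
# print the common ancestor commit of master and the refactor branch
git merge-base master <refactor-branch>
# then check it out and build as in the earlier sketch
git checkout a613fe59f49b841006c5f1ad2fa23e4444e7090b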
It is a stable version (by memory and CPU, both on start and when moving the map).
ca4a94e is also OK.
446c5e4 shows some peaks when moving the map, and you can see some differences between memory and cache.
A mystery...
ca4a94e0c5a4487c64d271e6d1428a6aefcd094c is the tip of master. Is that correct? This is the one that started all the issues at the top of the thread.
ca4a94e looked like this:
I also rebuilt f2305b5, tested it, and got this:
I have no idea what changed.
3 hours ago I saw very large memory usage. Only mbtileserver (out of 24 containers) was using that much memory. Sorry. I will look for other causes...
Oh! I remembered: all of the last tests were run with a read-only volume, /internal:ro.
Now I will try the tests without the ro flag.
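For comparison, the two mount modes in plain docker run form (a sketch; the image name, host path, and -d flag are illustrative). A read-only mount can matter for SQLite-backed mbtiles files because, depending on journal mode, even readers may need to create -wal/-shm side files:
# read-only bind mount, as in the earlier tests
docker run --rm -v /opt/mbtiles:/internal:ro -p 80:80 mbtileserver -d /internal
# read-write bind mount (the default when :ro is omitted)
docker run --rm -v /opt/mbtiles:/internal -p 80:80 mbtileserver -d /internal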
f2305b5 looks wrong.
This is my stable version (8c21f5a). Sorry, this bug is only partly about mbtileserver; most of the causes are about Docker mounting ro/rw volumes. Sorry again.
For your peace of mind, see the latest (head) version stats with ro volumes.
Sorry.
Thanks for your detailed investigation @asmyasnikov ! I appreciate all the effort you invested in this and in reporting back on it here.
I'm glad that our recent changes didn't introduce a memory regression, and that in general our memory profile is very low.
=)))
Hi! I'm using cgroups memory limitations from docker-compose.yml.
A limit of 30M is not exceeded in the older version. But the last version (https://github.com/consbio/mbtileserver/commit/ca4a94e0c5a4487c64d271e6d1428a6aefcd094c) wants about 1G of memory without a limitation =( This is critical for single-board computers... Log:
The internal and external paths contain big mbtiles files: