apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

FDB 6.3 goes down with exit code 20 or 139 #5405

Closed falsandtru closed 2 years ago

falsandtru commented 3 years ago

The following official sample goes down within 2 days. It looks like FDB or its binding has a bug. Can you fix the problem?

https://github.com/apple/foundationdb/tree/master/packaging/docker/samples/golang

CONTAINER ID        IMAGE                              COMMAND                  CREATED             STATUS                      PORTS                    NAMES
486348df172b        fdbgolangsample_app                "/start.bash"            2 days ago          Up 2 days                   0.0.0.0:8080->8080/tcp   fdbgolangsample_app_1
fa6e3729a803        foundationdb/foundationdb:6.3.15   "/usr/bin/tini -g --…"   2 days ago          Exited (20) 8 minutes ago                            fdbgolangsample_fdb-server-1_1
fd8a17a2195f        foundationdb/foundationdb:6.3.15   "/usr/bin/tini -g --…"   2 days ago          Up 2 days                   0.0.0.0:4500->4500/tcp   fdbgolangsample_fdb-coordinator_1

I used the following official config, reduced by one server.

version: '3'
services:
  # Specify two fdbserver processes (the official sample specifies three).
  fdb-coordinator:
    image: foundationdb/foundationdb:${FDB_VERSION}
    ports:
      - 4500:4500/tcp
    environment:
      FDB_COORDINATOR: ${FDB_COORDINATOR}
      FDB_NETWORKING_MODE: ${FDB_NETWORKING_MODE}
      FDB_COORDINATOR_PORT: ${FDB_COORDINATOR_PORT}

  fdb-server-1:
    depends_on:
      - fdb-coordinator
    image: foundationdb/foundationdb:${FDB_VERSION}
    environment:
      FDB_COORDINATOR: ${FDB_COORDINATOR}
      FDB_NETWORKING_MODE: ${FDB_NETWORKING_MODE}
      FDB_COORDINATOR_PORT: ${FDB_COORDINATOR_PORT}

  # Bring up the application so that it depends on the cluster.
  app:
    depends_on:
      - fdb-coordinator
      - fdb-server-1
    build:
      context: app
      args:
        FDB_VERSION: ${FDB_VERSION}
    ports:
      - 8080:8080/tcp
    environment:
      FDB_COORDINATOR: ${FDB_COORDINATOR}
      FDB_API_VERSION: ${FDB_API_VERSION}
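For context on how `FDB_COORDINATOR` and `FDB_COORDINATOR_PORT` tie these containers together: a client needs a cluster file pointing at the coordinator. The Go sketch below builds such a line (format `description:ID@IP:PORT`) from those two variables. The `docker:docker` description/ID pair, the `4500` fallback port, and the `buildClusterFile` helper are assumptions for illustration, not the sample's actual code. As far as I know, 6.3 cluster files must list IP addresses rather than hostnames, so the compose service name is resolved first.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
)

// buildClusterFile is a hypothetical helper that returns a cluster-file line
// of the form description:ID@IP:PORT. The "docker:docker" description/ID is
// an assumption modeled on the Docker samples, not a required value.
func buildClusterFile(coordinatorHost, port string) (string, error) {
	if coordinatorHost == "" {
		return "", errors.New("FDB_COORDINATOR is not set")
	}
	// Resolve the compose service name (e.g. "fdb-coordinator") to an IP,
	// because the cluster file is assumed to need an address, not a hostname.
	addrs, err := net.LookupHost(coordinatorHost)
	if err != nil {
		return "", fmt.Errorf("resolving %q: %w", coordinatorHost, err)
	}
	if len(addrs) == 0 {
		return "", fmt.Errorf("no addresses found for %q", coordinatorHost)
	}
	// JoinHostPort adds brackets for IPv6 addresses.
	return "docker:docker@" + net.JoinHostPort(addrs[0], port), nil
}

func main() {
	host := os.Getenv("FDB_COORDINATOR")
	port := os.Getenv("FDB_COORDINATOR_PORT")
	if port == "" {
		port = "4500" // assumed default, matching the 4500:4500 port mapping above
	}
	line, err := buildClusterFile(host, port)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(line)
}
```

Run inside the app container with `FDB_COORDINATOR=fdb-coordinator`, this would print something like `docker:docker@172.18.0.2:4500`, where the IP depends on the compose network.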
xumengpanda commented 3 years ago

Hi, we use the FDB Forum to discuss questions like this: https://forums.foundationdb.org

I would appreciate it if you could explain in the forum:

falsandtru commented 3 years ago

This is your official sample (https://github.com/apple/foundationdb/tree/master/packaging/docker/samples/golang), so you should be able to reproduce the problem. Separately, I have fixed and updated the sample: https://github.com/apple/foundationdb/pull/5373.

falsandtru commented 3 years ago

I ran that sample on Ubuntu 20.04 with docker-compose 1.24.0.

falsandtru commented 3 years ago

Can you reproduce it?

falsandtru commented 3 years ago

The official sample also went down when the config was left unchanged, but this time the exit code was 139.

CONTAINER ID        IMAGE                              COMMAND                  CREATED             STATUS                     PORTS                    NAMES
dc31eb833ceb        fdbgolangsample_app                "/start.bash"            2 days ago          Up 2 days                  0.0.0.0:8080->8080/tcp   fdbgolangsample_app_1
c66d8e61664a        foundationdb/foundationdb:6.3.15   "/usr/bin/tini -g --…"   2 days ago          Exited (139) 2 hours ago                            fdbgolangsample_fdb-server-1_1
82c77d4763af        foundationdb/foundationdb:6.3.15   "/usr/bin/tini -g --…"   2 days ago          Up 2 days                                           fdbgolangsample_fdb-server-2_1
4e40a63f544f        foundationdb/foundationdb:6.3.15   "/usr/bin/tini -g --…"   2 days ago          Up 2 days                  0.0.0.0:4500->4500/tcp   fdbgolangsample_fdb-coordinator_1
falsandtru commented 3 years ago

It looks like memory usage does not grow with the Go and Python bindings on FDB 6.2, but it does grow on FDB 6.3. So the cause of the problem is probably FDB 6.3.

sfc-gh-abeamon commented 3 years ago

I believe this might be caused by the 6.3 Docker image not creating a logs directory for the FDB trace events; as a result, the processes keep buffering those events in memory over time.

So the immediate fix would be to update the Docker image (see #5448), and an optional secondary fix would be to limit how many trace events the FDB server is willing to buffer.
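To illustrate the secondary fix, here is a minimal Go sketch of a capped buffer for trace events that cannot be written yet (for example, because the log directory is missing). The `traceBuffer` type, its drop-oldest policy, and the limit are hypothetical and do not describe how fdbserver actually buffers trace events; the sketch only shows why a cap keeps memory bounded where an unbounded buffer would grow until the process dies.

```go
package main

import "fmt"

// traceBuffer is a hypothetical bounded buffer for trace events that could
// not be flushed to disk. It is not FoundationDB's implementation.
type traceBuffer struct {
	events  []string // buffered events, oldest first
	limit   int      // maximum number of events kept in memory
	dropped int      // events discarded because the buffer was full
}

func newTraceBuffer(limit int) *traceBuffer {
	return &traceBuffer{limit: limit}
}

// add buffers an event. Once the limit is reached, it drops the oldest event
// instead of growing without bound.
func (b *traceBuffer) add(event string) {
	if b.limit <= 0 {
		b.dropped++
		return
	}
	if len(b.events) == b.limit {
		// Re-slicing drops the oldest entry; a later append reallocates,
		// copying only the live entries, so memory stays proportional to limit.
		b.events = b.events[1:]
		b.dropped++
	}
	b.events = append(b.events, event)
}

func main() {
	buf := newTraceBuffer(3)
	for i := 1; i <= 5; i++ {
		buf.add(fmt.Sprintf("event-%d", i))
	}
	fmt.Println(buf.events, "dropped:", buf.dropped) // [event-3 event-4 event-5] dropped: 2
}
```

Dropping the oldest events is just one possible policy; a real server might prefer to drop the newest events or to stop tracing entirely and report the condition once.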

jzhou77 commented 3 years ago

All fixes are merged. The remaining question is how we publish the Docker image built from packaging/docker/release/Dockerfile. @ammolitor, assigning this to you to publish the corrected Docker image.

ammolitor commented 2 years ago

Closing: we have produced multiple 6.3.x releases (and corresponding container images) since August 2021.