filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

Lotus daemon in the official Docker image (filecoin/lotus:v1.17.0) restarts automatically on K8s after running for a while, returning exit code 139 #9262

Open git-ljm opened 2 years ago

git-ljm commented 2 years ago

Lotus Version

Daemon:  1.17.0+mainnet+git.2830429ad.dirty+api1.5.0

Describe the Bug

On K8s, a pod running the official Docker image (filecoin/lotus:v1.17.0) restarts automatically after running for a period of time, and the container returns exit code 139.

The logs look normal right up to that point: the lotus daemon terminates on its own, with no warning or error beforehand.
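
For context, exit code 139 follows the usual shell and container convention of 128 plus a signal number, so it would correspond to signal 11 (SIGSEGV), a segmentation fault, rather than an out-of-memory kill, which normally shows up as 137 (SIGKILL). Here is a minimal sketch of that decoding. `decode_exit_code` is a hypothetical helper written for illustration, not part of Lotus or Kubernetes.

```python
import signal


def decode_exit_code(code: int) -> str:
    """Interpret a container exit code using the 128 + signal-number convention.

    Hypothetical helper for illustration only.
    """
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {code}"


if __name__ == "__main__":
    for code in (139, 137, 0):
        print(code, "->", decode_exit_code(code))
```

On Linux this reports signal 11 (SIGSEGV) for 139, which suggests looking for a crash in native code rather than at the memory limit alone.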

Logging Information

"2022-09-05T07:55:30.434Z\tINFO\tchain\tchain/sync_manager.go:323\tworker 844 done; took 156.786598ms\n"
"2022-09-05T07:55:30.432Z\tINFO\tchainstore\tstore/store.go:643\tNew heaviest tipset! [bafy2bzacedowalgejca3bwt5yvzvfssddt7t5uequoplifnipqz4tckq3ixc4 bafy2bzacedqc6hvh5jvyutw234ecg52cygqxjkzf4cde3bsigvagj3t7fqqcm bafy2bzacecbitir6kc63typ55otkek4ivyejciypmkj6vdfsj65f65xp7n7rs bafy2bzacecd4gtwevvbynd3mw67yo5gejwhyssuwqipkacn4j76e52m3agl2q] (height=2135271)\n"

Repo Steps

  1. Run '...'
  2. Do '...'
  3. See error '...' ...

smagdali commented 2 years ago

@ianconsolata not sure if you've seen this

ianconsolata commented 2 years ago

@git-ljm can you provide more details about your setup and how you’re running the container? You mentioned k8s — are you using a chart? How are you deploying it?

git-ljm commented 1 year ago

This is my current StatefulSet manifest (stateful.yml):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: filecoin
spec:
  selector:
    matchLabels:
      app: filecoin
  replicas: 1
  serviceName: filecoin
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: filecoin
    spec:
      terminationGracePeriodSeconds: 300
      enableServiceLinks: false
      containers:
        - name: node
          image: filecoin/lotus:v1.17.2
          env:
            - name: LOTUS_PATH
              value: "/opt/lotus/"
            - name: FIL_PROOFS_PARAMETER_CACHE
              value: "/opt/lotus/filecoin-proof-parameters"
          command: ["lotus","daemon"]
          args:
            - "--config"
            - "/config.toml"
          ports:
            - containerPort: 1234
          resources:
            requests:
              memory: 64G
              cpu: 4000m
            limits:
              memory: 128G
              cpu: 16000m
          livenessProbe:
            failureThreshold: 10
            tcpSocket:
              port: 1234
          readinessProbe:
            failureThreshold: 10
            tcpSocket:
              port: 1234
          startupProbe:
            failureThreshold: 600
            tcpSocket:
              port: 1234
          volumeMounts:
            - name: chaindata
              mountPath: /opt/lotus
            - name: config
              mountPath: /config.toml
              subPath: config.toml
      volumes:
        - persistentVolumeClaim:
            claimName: chaindata-filecoin
          name: chaindata
        - name: config
          configMap:
            name: filecoin-config
            defaultMode: 493
        - name: scripts
          configMap:
            name: filecoin-scripts
            defaultMode: 493
      securityContext:
        runAsGroup: 532
        runAsUser: 532
        fsGroup: 532
        fsGroupChangePolicy: OnRootMismatch

I raised the memory limit to 128 GiB. At present, the pod still restarts about once every two days on average.
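
A side note on units, in case it matters: in Kubernetes resource quantities the suffix `G` is decimal (10^9 bytes) while `Gi` is binary (2^30 bytes), so a limit written as `128G`, as in the manifest above, is somewhat smaller than 128 GiB. Here is a rough sketch of the difference. `quantity_to_bytes` is a hypothetical helper that handles only these two suffixes, not a full Kubernetes quantity parser.

```python
# Byte multipliers for the two suffixes discussed here (assumption: only G and Gi).
SUFFIXES = {
    "Gi": 2**30,   # binary gibibyte
    "G": 10**9,    # decimal gigabyte
}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a quantity such as '128G' or '128Gi' to bytes.

    Hypothetical helper for illustration; supports only the G and Gi suffixes.
    """
    # Check 'Gi' before 'G' so that '128Gi' is not misread as '128G' plus 'i'.
    for suffix in ("Gi", "G"):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * SUFFIXES[suffix]
    raise ValueError(f"unsupported quantity: {quantity!r}")


if __name__ == "__main__":
    decimal_limit = quantity_to_bytes("128G")
    binary_limit = quantity_to_bytes("128Gi")
    print("128G  =", decimal_limit, "bytes")
    print("128Gi =", binary_limit, "bytes")
    print("difference in GiB:", round((binary_limit - decimal_limit) / 2**30, 2))
```

The gap is roughly 8.8 GiB. That would not by itself explain an exit code of 139, but it is worth keeping in mind when comparing the limit against the usage graph.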

The following is the resource utilization over the last seven days: [image]