erigontech / erigon

Ethereum implementation on the efficiency frontier https://erigon.gitbook.io
GNU Lesser General Public License v3.0

Polygon erigon3 failed to start after restart #11613

Open insider89 opened 1 month ago

insider89 commented 1 month ago

System information

Erigon version: v3.0.0-alpha2

OS & Version: Linux

Commit hash: 6124a58f7f6560641e25181e05e74b3bbfdaa95a

Erigon Command (with flags/config): --chain=bor-mainnet --txpool.nolocals --db.pagesize=16k --private.api.addr=0.0.0.0:9090 --nat=extip:YOU_EXTERNAL_IP --authrpc.vhosts=* --authrpc.jwtsecret=/home/erigon/.local/share/erigon/jwtsecret --authrpc.addr=0.0.0.0 --datadir=/home/erigon/.local/share/erigon --db.size.limit=15TB --metrics --port=30843 --p2p.allowed-ports=30843 --p2p.allowed-ports=30844 --p2p.allowed-ports=30845 --metrics.addr=0.0.0.0 --metrics.port=6060 --torrent.download.rate=1000mb --http.api=admin,net,eth,erigon,web3,net,debug,trace,txpool,engine,ots --http.addr=0.0.0.0 --http.vhosts=* --http.corsdomain=* --rpc.batch.limit=1000 --bodies.cache=5G --ws --db.read.concurrency=1024 --rpc.batch.concurrency=64 --txpool.pricelimit=30000000000 --bor.milestone=false --bor.heimdall=http://polygon-heimdall-rpc:1317/ --log.dir.path=/home/erigon/.local/share/erigon/logs

Consensus Layer: Caplin

Consensus Layer Command (with flags/config):

Chain/Network: Polygon

Expected behaviour

After the pod running Polygon Erigon 3 restarts, Erigon starts up successfully.

Actual behaviour

The pod fails to start because the liveness and readiness probes fail:

  Warning  Unhealthy  8m17s (x10 over 9m32s)  kubelet            Readiness probe failed: dial tcp 11.12.8.233:8545: connect: connection refused
  Normal   Killing    8m17s                   kubelet            Container polygon-3 failed liveness probe, will be restarted
  Normal   Pulled     8m13s (x2 over 10m)     kubelet            Container image "docker.io/thorax/erigon:v3.0.0-alpha2" already present on machine
  Warning  Unhealthy  4m57s (x26 over 9m2s)   kubelet            Liveness probe failed: dial tcp 11.12.8.233:8545: connect: connection refused

In the logs I see only the following:

[INFO] [08-14|11:29:25.854] logging to file system                   log dir=/home/erigon/.local/share/erigon/logs file prefix=erigon log level=info json=false
[INFO] [08-14|11:29:25.855] Enabling metrics export to prometheus    path=http://0.0.0.0:6060/debug/metrics/prometheus
[INFO] [08-14|11:29:25.855] Build info                               git_branch=heads/v3.0.0-alpha2 git_tag=v3.0.0-alpha2-dirty git_commit=6124a58f7f6560641e25181e05e74b3bbfdaa95a
[INFO] [08-14|11:29:25.855] 
    ########b          oo                               d####b. 
    ##                                                      '## 
    ##aaaa    ##d###b. dP .d####b. .d####b. ##d###b.     aaad#' 
    ##        ##'  '## ## ##'  '## ##'  '## ##'  '##        '## 
    ##        ##       ## ##.  .## ##.  .## ##    ##        .## 
    ########P dP       dP '####P## '#####P' dP    dP    d#####P 
                               .##                              
                           d####P                               

[INFO] [08-14|11:29:25.855] Starting Erigon on Bor Mainnet... 
[INFO] [08-14|11:29:27.051] Maximum peer count                       ETH=100 total=100
[INFO] [08-14|11:29:27.052] starting HTTP APIs                       port=8545 APIs=admin,net,eth,erigon,web3,net,debug,trace,txpool,engine,ots
[INFO] [08-14|11:29:27.052] torrent verbosity                        level=WRN
[INFO] [08-14|11:29:27.053] [torrent] Public IP                      ip=167.235.115.91
[INFO] [08-14|11:29:27.055] Set global gas cap                       cap=50000000
[INFO] [08-14|11:29:27.057] [Downloader] Running with                ipv6-enabled=true ipv4-enabled=true download.rate=1000mb upload.rate=4mb
[INFO] [08-14|11:29:27.066] Opening Database                         label=chaindata path=/home/erigon/.local/share/erigon/chaindata
[INFO] [08-14|11:29:27.071] [db] open                                label=chaindata sizeLimit=15TB pageSize=16384
[WARN] [08-14|11:29:27.078] Sanitizing invalid bor miner gas price   provided=1000000000 updated=25000000000
[WARN] [08-14|11:29:27.078] Sanitizing invalid bor min fee cap       provided=30000000000 updated=25000000000.000
[INFO] [08-14|11:29:27.078] Initialised chain configuration          config="{ChainID: 137, Homestead: 0, DAO: <nil>, Tangerine Whistle: 0, Spurious Dragon: 0, Byzantium: 0, Constantinople: 0, Petersburg: 0, Istanbul: 3395000, Muir Glacier: 3395000, Berlin: 14750000, London: 23850000, Arrow Glacier: <nil>, Gray Glacier: <nil>, Terminal Total Difficulty: <nil>, Merge Netsplit: <nil>, Shanghai: <nil>, Cancun: <nil>, Prague: <nil>, Osaka: <nil>, Engine: bor}" genesis=0xa9c28ce2141b56c474f1dc504bee9b01eb1bd7d1a507580d5519d4437a97de1b

Steps to reproduce the behaviour

Run Polygon Erigon 3 with the following deployment configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    pulumi.com/patchForce: "true"
  name: polygon-3-erigon3
  namespace: polygon
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: erigon3
      app.kubernetes.io/name: polygon-3-erigon3
  strategy:
    type: Recreate
  template:
    metadata:
      name: polygon-3
    spec:
      containers:
      - args:
        - --chain=bor-mainnet
        - --txpool.nolocals
        - --db.pagesize=16k
        - --private.api.addr=0.0.0.0:9090
        - --nat=extip:YOU_EXTERNAL_IP
        - --authrpc.vhosts=*
        - --authrpc.jwtsecret=/home/erigon/.local/share/erigon/jwtsecret
        - --authrpc.addr=0.0.0.0
        - --datadir=/home/erigon/.local/share/erigon
        - --db.size.limit=15TB
        - --metrics
        - --port=30843
        - --p2p.allowed-ports=30843
        - --p2p.allowed-ports=30844
        - --p2p.allowed-ports=30845
        - --metrics.addr=0.0.0.0
        - --metrics.port=6060
        - --torrent.download.rate=1000mb
        - --http.api=admin,net,eth,erigon,web3,net,debug,trace,txpool,engine,ots
        - --http.addr=0.0.0.0
        - --http.vhosts=*
        - --http.corsdomain=*
        - --rpc.batch.limit=1000
        - --bodies.cache=5G
        - --ws
        - --db.read.concurrency=1024
        - --rpc.batch.concurrency=64
        - --txpool.pricelimit=30000000000
        - --bor.milestone=false
        - --bor.heimdall=http://polygon-heimdall-rpc:1317/
        - --log.dir.path=/home/erigon/.local/share/erigon/logs
        command:
        - erigon
        image: docker.io/thorax/erigon:v3.0.0-alpha2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 10
          initialDelaySeconds: 60
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8545
          timeoutSeconds: 5
        name: polygon-3
        ports:
        - containerPort: 8545
          name: json-rpc
          protocol: TCP
        - containerPort: 42069
          name: snapsync
          protocol: TCP
        - containerPort: 42069
          name: snapsyncudp
          protocol: UDP
        - containerPort: 30843
          name: rlpx66
          protocol: TCP
        - containerPort: 30843
          name: discovery66
          protocol: UDP
        - containerPort: 30844
          name: rlpxunknown
          protocol: TCP
        - containerPort: 30844
          name: discoveryunkown
          protocol: UDP
        - containerPort: 30845
          name: rlpx67
          protocol: TCP
        - containerPort: 30845
          name: discovery67
          protocol: UDP
        - containerPort: 6060
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 12
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: 8545
          timeoutSeconds: 5
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /home/erigon/.local/share/erigon
          name: storage
      dnsPolicy: ClusterFirst
      nodeSelector:
        kubernetes.io/hostname: public-nodes-10
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 90
      volumes:
      - name: storage
        persistentVolumeClaim:
          claimName: polygon-3-erigon3

Replace YOU_EXTERNAL_IP with your external IP address.

Backtrace

[backtrace]

AskAlexSharov commented 1 month ago

A slow startup time on Polygon is somewhat expected. We will work on it in alpha4. In the meantime, you can try switching to the latest main and adding BT_M=32768 (though this will likely increase runtime read IO).
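
As a rough sketch of how this could be applied to the Deployment from the report: the fragment below assumes BT_M is picked up from the container environment (the thread does not say how it is set). It also adds a Kubernetes startupProbe, which is not something suggested in this thread, so that the liveness and readiness probes do not run until port 8545 accepts connections. The threshold values are illustrative only.

```yaml
# Fragment of spec.template.spec.containers[0]; merge into the existing container.
- name: polygon-3
  env:
  # Assumption: BT_M is read from the environment.
  - name: BT_M
    value: "32768"
  # Not from the thread: holds off liveness/readiness checks during slow startup.
  startupProbe:
    tcpSocket:
      port: 8545
    periodSeconds: 10
    failureThreshold: 180   # illustrative: allows up to 30 minutes (180 x 10s)
    timeoutSeconds: 5
```

With the original settings, the liveness probe starts after 60 seconds and tolerates 10 failures at 5-second intervals, so a startup that takes longer than roughly two minutes ends in a restart.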