Open BlinkyStitt opened 10 months ago
Same here, running on Sepolia.
Erigon version: v2.55.0
StatefulSet manifest:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/instance: ethereum
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: erigon
    argocd.argoproj.io/instance: ethereum
    helm.sh/chart: erigon-1.0.8
  name: erigon
  namespace: ethereum
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
  podManagementPolicy: Parallel
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: ethereum
      app.kubernetes.io/name: erigon
  serviceName: erigon-headless
  template:
    metadata:
      annotations:
        checksum/secrets: 3b29556c4c07d2ac10020f254dab589e6e9c93c8618e7a311d0dcf28be2383e8
        prometheus.io/path: /debug/metrics/prometheus
        prometheus.io/port: "6061"
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: ethereum
        app.kubernetes.io/name: erigon
    spec:
      containers:
      - command:
        - sh
        - -ac
        - |
          exec erigon --datadir=/data --nat=extip:$(POD_IP) --port=30303 --http=false --private.api.addr=127.0.0.1:9090 --authrpc.jwtsecret=/data/jwt.hex --authrpc.addr=0.0.0.0 --authrpc.port=8551 --authrpc.vhosts=* --metrics --metrics.addr=0.0.0.0 --metrics.port=6060 --chain=sepolia --internalcl --log.console.json=true --log.console.verbosity=info --log.dir.disable=true --maxpeers=200 --torrent.download.rate=1000mb --torrent.download.slots=100
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: xxx.dkr.ecr.us-east-1.amazonaws.com/erigon:v2.55.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 60
          periodSeconds: 120
          successThreshold: 1
          tcpSocket:
            port: metrics
          timeoutSeconds: 1
        name: erigon
        ports:
        - containerPort: 30303
          name: p2p-tcp
          protocol: TCP
        - containerPort: 30303
          name: p2p-udp
          protocol: UDP
        - containerPort: 8551
          name: auth-rpc
          protocol: TCP
        - containerPort: 6060
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: metrics
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "2"
            memory: 10Gi
          requests:
            cpu: "1"
            memory: 8Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /data
          name: storage
        - mountPath: /data/jwt.hex
          name: jwt
          readOnly: true
          subPath: jwt.hex
      - command:
        - sh
        - -ac
        - |
          while ! nc -z 127.0.0.1 9090; do sleep 1; done; exec rpcdaemon --datadir=/data --private.api.addr=127.0.0.1:9090 --txpool.api.addr=127.0.0.1:9090 --http.addr=0.0.0.0 --http.port=8545 --http.vhosts=* --metrics --metrics.addr=0.0.0.0 --metrics.port=6061 --http.api=eth,erigon,web3,net,debug,trace,txpool,db --ws
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: xxx.dkr.ecr.us-east-1.amazonaws.com/erigon:v2.55.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 60
          periodSeconds: 120
          successThreshold: 1
          tcpSocket:
            port: http-rpc
          timeoutSeconds: 1
        name: erigon-rpcd
        ports:
        - containerPort: 8545
          name: http-rpc
          protocol: TCP
        - containerPort: 6061
          name: metrics-rpcd
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            httpHeaders:
            - name: Accept
              value: application/json
            - name: X-ERIGON-HEALTHCHECK
              value: min_peer_count2
            - name: X-ERIGON-HEALTHCHECK
              value: synced
            - name: X-ERIGON-HEALTHCHECK
              value: max_seconds_behind60
            path: /health
            port: http-rpc
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "2"
            memory: 10Gi
          requests:
            cpu: "1"
            memory: 8Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /data
          name: storage
      dnsPolicy: ClusterFirst
      initContainers:
      - command:
        - chown
        - -R
        - 10001:10001
        - /data
        image: busybox:1.34.0
        imagePullPolicy: IfNotPresent
        name: init-chown-data
        securityContext:
          runAsNonRoot: false
          runAsUser: 0
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /data
          name: storage
      nodeSelector:
        group: ethereum
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 10001
        runAsGroup: 10001
        runAsNonRoot: true
        runAsUser: 10001
      serviceAccount: erigon
      serviceAccountName: erigon
      shareProcessNamespace: true
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: group
        operator: Equal
        value: ethereum
      volumes:
      - name: jwt
        secret:
          defaultMode: 420
          secretName: erigon-jwt
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: storage
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Ti
      storageClassName: fast-gp3
      volumeMode: Filesystem
    status:
      phase: Pending
Latest block call:
$ curl -s -X POST --header 'Content-Type: application/json' localhost:8545 --data '{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params":["latest", false],"id":1}' | jq .result.number | xargs printf "%d\n"
3999999
As of this comment, Sepolia is at block 4822615.
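To put that gap in perspective, here is a rough sketch of the arithmetic. The 12-second block time is an assumption on my part (Sepolia's post-merge slot time), not something taken from the output above, and missed slots would make the real lag somewhat larger.

```shell
#!/bin/sh
# Rough lag estimate from the two block heights quoted above.
local_block=3999999     # result of the eth_getBlockByNumber call
network_block=4822615   # Sepolia head at the time of this comment
secs_per_block=12       # ASSUMPTION: ~12 s per block; not from the original output

blocks_behind=$((network_block - local_block))
seconds_behind=$((blocks_behind * secs_per_block))
days_behind=$((seconds_behind / 86400))

echo "blocks_behind=$blocks_behind"
echo "seconds_behind=$seconds_behind"
echo "days_behind=$days_behind"
```

Under that assumption the node is far more than a few seconds behind, which is what makes any "HEALTHY" result for max_seconds_behind suspicious.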
Health check shows "HEALTHY":
$ curl -H "X-ERIGON-HEALTHCHECK: synced" -H "X-ERIGON-HEALTHCHECK: max_seconds_behind10" localhost:8545/health && echo
{"check_block":"DISABLED","max_seconds_behind":"HEALTHY","min_peer_count":"DISABLED","synced":"HEALTHY"}
Similar to https://github.com/ledgerwatch/erigon/issues/8752, the health check lies.
Here are logs showing my node is on step 7/15:
And here is curl of eth_syncing saying it isn't syncing:
And here is curl of /health saying it isn't syncing:
When I captured these logs, the server was ~8 days behind, so max_seconds_behind60 definitely should be UNHEALTHY.
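Until /health can be trusted, one possible workaround is to compare the latest block's timestamp against the wall clock yourself. This is only a sketch: the helper name is_stale is hypothetical, the fixed timestamps in the demo line are made up to represent a block that is 8 days old, and the commented-out usage assumes jq is installed and the RPC is reachable on localhost:8545.

```shell
#!/bin/sh
# is_stale HEX_TIMESTAMP NOW_EPOCH MAX_SECONDS
# Exit status 0 (stale) if the block is older than MAX_SECONDS, 1 otherwise.
# is_stale is a hypothetical helper, not part of Erigon.
is_stale() {
  block_ts=$(printf '%d' "$1")   # hex string such as 0x65496500 -> decimal
  age=$(( $2 - block_ts ))
  [ "$age" -gt "$3" ]
}

# Demo with made-up values: a block 8 days (691200 s) older than "now".
if is_stale 0x65496500 1700000000 60; then
  echo "UNHEALTHY"
else
  echo "HEALTHY"
fi

# Against a live node (not run here):
# ts=$(curl -s -X POST -H 'Content-Type: application/json' localhost:8545 \
#   --data '{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params":["latest", false],"id":1}' \
#   | jq -r .result.timestamp)
# is_stale "$ts" "$(date +%s)" 60 && echo "UNHEALTHY"
```

The demo line prints UNHEALTHY, which is the answer I would expect from max_seconds_behind60 on a node in this state.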