Hi @ziouf!
It happens on pod restart after upgrading to the latest operator helm chart.
Could you specify the exact version please?
panic: FATAL: cannot remove "/vm-data/indexdb/177E9C9207F871EC/1780E117A2091B9B": unlinkat /vm-data/indexdb/177E9C9207F871EC/1780E117A2091B9B/metadata.json: permission denied
What filesystem do you use? Is it NFS?
I use the default storage class from my cloud provider (Scaleway). I don't know what is under the hood.
Here is the CRD value used for the field vmcluster.spec.vmstorage:
vmstorage:
  replicaCount: 3
  storageDataPath: "/vm-data"
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: scw-bssd
        resources:
          requests:
            storage: 100Gi
  extraArgs:
    dedup.minScrapeInterval: 15s
    search.maxConcurrentRequests: "16"
Could you specify the exact version please?
The helm chart version is helm.sh/chart: victoria-metrics-operator-0.26.1
The operator image deployed is victoriametrics/operator:v0.37.1
The VMStorage image is victoriametrics/vmstorage:v1.93.3-cluster
@hagen1778 This is most likely related to https://github.com/VictoriaMetrics/operator/releases/tag/v0.36.0
vmoperator parameters: Add option VM_ENABLESTRICTSECURITY and enable strict security context by default. See https://github.com/VictoriaMetrics/operator/issues/637, https://github.com/VictoriaMetrics/operator/pull/692/ and https://github.com/VictoriaMetrics/operator/pull/712 PR for details.
Maybe it is better to transfer this to operator issues?
@ziouf Could you try to add the following env variable to operator and see if that helps? (full list of supported variables can be found here)
VM_ENABLESTRICTSECURITY: false
According to changelog, the issue is caused by the following change:
vmoperator parameters: Add option VM_ENABLESTRICTSECURITY and enable strict security context by default. See https://github.com/VictoriaMetrics/operator/issues/637, https://github.com/VictoriaMetrics/operator/pull/692/ and https://github.com/VictoriaMetrics/operator/pull/712 PR for details.
Is this correct @zekker6 ?
Yes, it seems to me like it is.
@Amper @Haleygo I think this change should be a part of BreakingChanges section here https://github.com/VictoriaMetrics/operator/releases/tag/v0.36.0 wdyt?
@zekker6 I added the given variable and it seems to solve the issue: VMStorage is starting fine now. Thank you for your quick and effective support!
Since v0.36.0, the VM operator has EnableStrictSecurity=true by default, so it adds the following default securityContext to all pods (vminsert, vmselect, vmstorage, alertmanager, ...):
securityContext:
  # '65534' refers to 'nobody' in all the used default images like alpine, busybox.
  fsGroup: 65534
  fsGroupChangePolicy: OnRootMismatch
  runAsGroup: 65534
  runAsNonRoot: true
  runAsUser: 65534
  seccompProfile:
    type: RuntimeDefault
If a component existed before the operator upgrade and had no customized securityContext, its volumes are already mounted as root:root. When the operator is upgraded, the statefulset gains the default securityContext: the directory group changes correctly to fsGroup, but the owning user does not. User 65534 then cannot operate on those directories (now root:65534), and the service fails.
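To make the failure mode concrete, here is a minimal sketch of the classic Unix permission check applied to a directory left as root:65534. The mode 0o755 is an assumption (the thread does not state the actual mode bits), and `Entry` and `can_modify_dir` are hypothetical names for illustration, not part of the operator or VictoriaMetrics.

```python
from __future__ import annotations

import stat
from dataclasses import dataclass


@dataclass
class Entry:
    uid: int   # owning user
    gid: int   # owning group
    mode: int  # permission bits, e.g. 0o755


def can_modify_dir(entry: Entry, uid: int, gids: set[int]) -> bool:
    """Return True if the process may create or remove names in the directory.

    Removing a file (unlink) needs write AND execute permission on the
    parent directory, not on the file itself.
    """
    if uid == 0:
        return True  # root bypasses ordinary permission checks
    if uid == entry.uid:
        need = stat.S_IWUSR | stat.S_IXUSR  # owner class applies
    elif entry.gid in gids:
        need = stat.S_IWGRP | stat.S_IXGRP  # group class applies
    else:
        need = stat.S_IWOTH | stat.S_IXOTH  # "other" class applies
    return (entry.mode & need) == need


# Directory after the operator upgrade: group changed, owner still root.
# The 0o755 mode is assumed for illustration.
legacy = Entry(uid=0, gid=65534, mode=0o755)
# Same directory after `chown -R 65534:65534`.
fixed = Entry(uid=65534, gid=65534, mode=0o755)

print(can_modify_dir(legacy, uid=65534, gids={65534}))  # False
print(can_modify_dir(fixed, uid=65534, gids={65534}))   # True
```

Under these assumptions, user 65534 only matches the group class, which lacks the write bit, so removing `metadata.json` from the directory is denied. Once the owner is 65534, the owner class applies and the removal succeeds.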
Components which could encounter this problem:
Temporary fix:
Disable EnableStrictSecurity by adding the following env variable to the operator:
VM_ENABLESTRICTSECURITY: false
Or set securityContext: {} to override the default securityContext if no securityContext is needed:
vmstorage:
  securityContext: {}
Or add an init container that changes the ownership of the data directory to 65534:
vmstorage:
  initContainers:
    - command: ["chown", "-R", "65534:65534", "/vm-data"]
      image: busybox:latest
      name: busybox
      securityContext:
        runAsNonRoot: false
        runAsUser: 0
      volumeMounts:
        - mountPath: /vm-data
          name: vmstorage-db
And Fix#1 will be included in next release of operator
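Before or after applying one of the workarounds, it can help to check which paths under the data directory are still owned by a user other than 65534. The following is a rough sketch using only the Python standard library; `find_foreign_owned` is a hypothetical helper, and the default UID 65534 is taken from the securityContext quoted in this thread.

```python
import os


def find_foreign_owned(root, uid=65534):
    """Return (path, owner_uid, owner_gid) for entries under root not owned by uid."""
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself plus the files inside it;
        # subdirectories are checked when os.walk descends into them.
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            st = os.lstat(path)  # lstat: inspect symlinks themselves, do not follow them
            if st.st_uid != uid:
                mismatches.append((path, st.st_uid, st.st_gid))
    return mismatches
```

Calling `find_foreign_owned("/vm-data")` from a shell inside the pod (assuming Python is available there, which is not the case in the default images) would list the entries that still need a chown. Note that `os.walk` silently skips directories it cannot read, so an empty result from an unprivileged user is not conclusive.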
Describe the bug
At startup, VMStorage crashes with the following message (full log below):
To Reproduce
I don't know. It happens on pod restart after upgrading to the latest operator helm chart. VMStorage stays out of service even if I roll back to previous versions by forcing the value of the spec.vmstorage.image.tag field in the VMCluster CRD.
Version
vmstorage-20230902-002932-tags-v1.93.3-cluster-0-gf78d8b994d
Logs
Screenshots
No response
Used command-line flags
Additional information
No response