Closed: milexjaro closed this issue 7 months ago
Hi @milexjaro, thanks for reporting this. It might be an issue related to the specific storage driver on GKE, or a persistent-volume configuration that does not support file locking.
Hi @milexjaro, can you share your Helm values.yaml settings as well?
Thanks for the prompt response, @asdf2014.
I assume the issue lies within the persistent volume of the MiddleManager (MM), no? The peon process is spawned by the MM.
Is there any possibility the issue comes from using Google Cloud Storage as the deep storage as well?
Here's the full values.yaml for more context:
```yaml
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Default values for druid.

image:
  repository: apache/druid
  tag: 29.0.1
  pullPolicy: IfNotPresent
  pullSecrets: []

configMap:
  ## If false, configMap will not be applied
  ##
  enabled: true

# Required if using kubernetes extensions which modify resources like 'druid-kubernetes-extensions' or 'druid-kubernetes-overlord-extensions'
rbac:
  create: true

## Define the key value pairs in the configmap
configVars:
  ## DRUID env vars. ref: https://github.com/apache/druid/blob/master/distribution/docker/druid.sh#L29
  # DRUID_LOG_LEVEL: "warn"
  DRUID_LOG4J: <?xml version="1.0" encoding="UTF-8" ?><Configuration status="WARN"><Appenders><Console name="Console" target="SYSTEM_OUT"><PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/></Console></Appenders><Loggers><Root level="info"><AppenderRef ref="Console"/></Root><Logger name="org.apache.druid.jetty.RequestLog" additivity="false" level="DEBUG"><AppenderRef ref="Console"/></Logger></Loggers></Configuration>
  DRUID_USE_CONTAINER_IP: "true"
  DRUID_LOG_LEVEL: "debug"
  DRUID_SERVICE_LOG4J: <?xml version="1.0" encoding="UTF-8" ?><Configuration status="WARN"><Appenders><Console name="Console" target="SYSTEM_OUT"><PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/></Console></Appenders><Loggers><Root level="info"><AppenderRef ref="Console"/></Root><Logger name="org.apache.druid.jetty.RequestLog" additivity="false" level="DEBUG"><AppenderRef ref="Console"/></Logger></Loggers></Configuration>
  # DRUID_SERVICE_LOG_LEVEL: "debug"

  ## Druid Common Configurations. ref: https://druid.apache.org/docs/latest/configuration/index.html#common-configurations
  druid_extensions_loadList: '["druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "postgresql-metadata-storage", "druid-google-extensions", "druid-avro-extensions", "druid-parquet-extensions", "druid-basic-security"]'
  druid_metadata_storage_type: postgresql
  druid_storage_type: google
  druid_google_prefix: "druid/deep-storage"
  druid_indexer_logs_type: google
  druid_indexer_logs_prefix: "druid/indexer-logs"

  ## Druid Emitting Metrics. ref: https://druid.apache.org/docs/latest/configuration/index.html#emitting-metrics
  druid_emitter: logging
  druid_emitter_logging_logLevel: debug
  druid_emitter_http_recipientBaseUrl: http://druid_exporter_url:druid_exporter_port/druid
  druid_metadata_storage_connector_connectURI: jdbc:postgresql://XXXXX:5432/druid
  druid_google_bucket: "XXXXXXX"
  druid_indexer_logs_bucket: "XXXXXXX"
  druid_metadata_storage_connector_user: "druid"
  druid_metadata_storage_connector_password: "XXXXXXX"

  # Druid basic security
  druid_auth_authenticatorChain: '["MyBasicMetadataAuthenticator"]'
  druid_auth_authenticator_MyBasicMetadataAuthenticator_type: basic
  # Default password for 'admin' user, should be changed for production.
  druid_auth_authenticator_MyBasicMetadataAuthenticator_initialAdminPassword: XXXXX
  # Default password for internal 'druid_system' user, should be changed for production.
  druid_auth_authenticator_MyBasicMetadataAuthenticator_initialInternalClientPassword: XXXXX
  # Uses the metadata store for storing users.
  # You can use the authentication API to create new users and grant permissions
  druid_auth_authenticator_MyBasicMetadataAuthenticator_credentialsValidator_type: metadata
  # If true and if the request credential doesn't exist in this credentials store,
  # the request will proceed to the next Authenticator in the chain.
  druid_auth_authenticator_MyBasicMetadataAuthenticator_skipOnFailure: 'false'
  druid_auth_authenticator_MyBasicMetadataAuthenticator_authorizerName: MyBasicMetadataAuthorizer
  # Escalator
  druid_escalator_type: basic
  druid_escalator_internalClientUsername: druid_system
  druid_escalator_internalClientPassword: XXXXX
  druid_escalator_authorizerName: MyBasicMetadataAuthorizer
  druid_auth_authorizers: '["MyBasicMetadataAuthorizer"]'
  druid_auth_authorizer_MyBasicMetadataAuthorizer_type: basic

gCloudStorage:
  enabled: true
  secretName: druid-gcs-sa-key

broker:
  ## If false, broker will not be installed
  ##
  enabled: true
  name: broker
  replicaCount: 1
  port: 8082
  serviceType: ClusterIP

  config:
    DRUID_XMX: 512m
    DRUID_XMS: 512m
    DRUID_MAXDIRECTMEMORYSIZE: 400m
    druid_processing_buffer_sizeBytes: '50000000'
    druid_processing_numMergeBuffers: 2
    druid_processing_numThreads: 1
    # druid_monitoring_monitors: '["org.apache.druid.client.cache.CacheMonitor", "org.apache.druid.server.metrics.QueryCountStatsMonitor"]'

  ingress:
    enabled: false
    annotations: {}
      # kubernetes.io/ingress.class: nginx
      # kubernetes.io/tls-acme: "true"
    path: /
    hosts:
      - chart-example.local
    tls: []
    #  - secretName: chart-example-tls
    #    hosts:
    #      - chart-example.local

  resources: {}
    # limits:
    #   cpu: 1
    #   memory: 1Gi
    # requests:
    #   cpu: 250m
    #   memory: 512Mi

  serviceAccount:
    # -- Create a service account for the broker
    create: true
    # -- Service Account name
    name:
    # -- Annotations applied to created service account
    annotations: {}
    # -- Labels applied to created service account
    labels: {}
    # -- Automount API credentials for the service account
    automountServiceAccountToken: true

  nodeSelector:
    cloud.google.com/gke-nodepool: XXXXX
  tolerations:
    - key: "XXXXX"
      operator: "Equal"
      value: "false"
      effect: "NoExecute"
  affinity: {}
  podAnnotations: {}

# https://github.com/apache/druid/issues/11118
coordinator:
  ## If false, coordinator will not be installed
  ##
  enabled: true
  name: coordinator
  replicaCount: 1
  port: 8081
  serviceType: ClusterIP

  config:
    DRUID_XMX: 256m
    DRUID_XMS: 256m
    # druid_coordinator_asOverlord_enabled: 'false'
    # druid_coordinator_asOverlord_overlordService: 'druid/overlord'
    # druid_indexer_runner_type: 'remote'
    # druid_indexer_storage_type: 'metadata'
    # druid_coordinator_balancer_strategy: 'cachingCost'
    # druid_monitoring_monitors: '["org.apache.druid.server.metrics.TaskCountStatsMonitor"]'

  ingress:
    enabled: false
    annotations: {}
      # kubernetes.io/ingress.class: nginx
      # kubernetes.io/tls-acme: "true"
    path: /
    hosts:
      - chart-example.local
    tls: []
    #  - secretName: chart-example-tls
    #    hosts:
    #      - chart-example.local

  resources: {}
    # limits:
    #   cpu: 500m
    #   memory: 1Gi
    # requests:
    #   cpu: 250m
    #   memory: 512Mi

  serviceAccount:
    # -- Create a service account for the coordinator
    create: true
    # -- Service Account name
    name:
    # -- Annotations applied to created service account
    annotations: {}
    # -- Labels applied to created service account
    labels: {}
    # -- Automount API credentials for the service account
    automountServiceAccountToken: true

  nodeSelector:
    cloud.google.com/gke-nodepool: XXXXX
  tolerations:
    - key: "XXXXX"
      operator: "Equal"
      value: "false"
      effect: "NoExecute"
  affinity: {}
  podAnnotations: {}

overlord:
  ## If true, the separate overlord will be installed
  ##
  enabled: false
  name: overlord
  replicaCount: 1
  port: 8081
  serviceType: ClusterIP

  config:
    druid_indexer_tasklock_forceTimeChunkLock: 'false'
    # druid_indexer_runner_type: 'httpRemote'
    # druid_indexer_storage_type: 'metadata'
  javaOpts: "-Xms1G -Xmx1G"

  ingress:
    enabled: false
    annotations: {}
      # kubernetes.io/ingress.class: nginx
      # kubernetes.io/tls-acme: "true"
    path: /
    hosts:
      - chart-example.local
    tls: []
    #  - secretName: chart-example-tls
    #    hosts:
    #      - chart-example.local

  resources: {}

  serviceAccount:
    # -- Create a service account for the overlord
    create: true
    # -- Service Account name
    name:
    # -- Annotations applied to created service account
    annotations: {}
    # -- Labels applied to created service account
    labels: {}
    # -- Automount API credentials for the service account
    automountServiceAccountToken: true

  nodeSelector:
    cloud.google.com/gke-nodepool: XXXXX
  tolerations:
    - key: "XXXXX"
      operator: "Equal"
      value: "false"
      effect: "NoExecute"
  affinity: {}
  podAnnotations: {}

historical:
  ## If false, historical will not be installed
  ##
  enabled: true
  name: historical
  replicaCount: 1
  port: 8083
  serviceType: ClusterIP

  config:
    DRUID_XMX: 512m
    DRUID_XMS: 512m
    DRUID_MAXDIRECTMEMORYSIZE: 400m
    druid_processing_buffer_sizeBytes: '50000000'
    druid_processing_numMergeBuffers: 2
    druid_processing_numThreads: 1
    # druid_monitoring_monitors: '["org.apache.druid.client.cache.CacheMonitor", "org.apache.druid.server.metrics.HistoricalMetricsMonitor", "org.apache.druid.server.metrics.QueryCountStatsMonitor"]'
    # druid_segmentCache_locations: '[{"path":"/opt/druid/var/druid/segment-cache","maxSize":300000000000}]'

  ingress:
    enabled: false
    annotations: {}
      # kubernetes.io/ingress.class: nginx
      # kubernetes.io/tls-acme: "true"
    path: /
    hosts:
      - chart-example.local
    tls: []
    #  - secretName: chart-example-tls
    #    hosts:
    #      - chart-example.local

  persistence:
    enabled: true
    accessMode: ReadWriteOnce
    size: "4Gi"
    # storageClass: "ssd"

  antiAffinity: "soft"
  nodeAffinity: {}
  nodeSelector:
    cloud.google.com/gke-nodepool: XXXXX
  tolerations:
    - key: "XXXXX"
      operator: "Equal"
      value: "false"
      effect: "NoExecute"
  securityContext:
    fsGroup: 1000

  resources: {}
    # limits:
    #   cpu: 2
    #   memory: 2Gi
    # requests:
    #   cpu: 500m
    #   memory: 512Mi

  serviceAccount:
    # -- Create a service account for the historical
    create: true
    # -- Service Account name
    name:
    # -- Annotations applied to created service account
    annotations: {}
    # -- Labels applied to created service account
    labels: {}
    # -- Automount API credentials for the service account
    automountServiceAccountToken: true

  livenessProbeInitialDelaySeconds: 60
  readinessProbeInitialDelaySeconds: 60

  ## (dict) If specified, apply these annotations to each master Pod
  podAnnotations: {}

  podDisruptionBudget:
    enabled: false
    # minAvailable: 2
    maxUnavailable: 1

  updateStrategy:
    type: RollingUpdate

middleManager:
  ## If false, middleManager will not be installed
  ##
  enabled: true
  name: middle-manager
  replicaCount: 1
  port: 8091
  serviceType: ClusterIP

  config:
    DRUID_XMX: 4096m
    DRUID_XMS: 256m
    druid_indexer_runner_javaOptsArray: '["-server", "-Xms256m", "-Xmx4096m", "-XX:MaxDirectMemorySize=4g", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8", "-XX:+ExitOnOutOfMemoryError", "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]'
    druid_indexer_fork_property_druid_processing_buffer_sizeBytes: '25000000'

  autoscaling:
    enabled: false
    minReplicas: 2
    maxReplicas: 5
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 60
      - type: Resource
        resource:
          name: memory
          target:
            type: Utilization
            averageUtilization: 60

  ingress:
    enabled: false
    annotations: {}
      # kubernetes.io/ingress.class: nginx
      # kubernetes.io/tls-acme: "true"
    path: /
    hosts:
      - chart-example.local
    tls: []
    #  - secretName: chart-example-tls
    #    hosts:
    #      - chart-example.local

  persistence:
    enabled: true
    accessMode: ReadWriteOnce
    size: "4Gi"
    # storageClass: "ssd"

  antiAffinity: "soft"
  nodeAffinity: {}
  nodeSelector:
    cloud.google.com/gke-nodepool: XXXXX
  tolerations:
    - key: "XXXXX"
      operator: "Equal"
      value: "false"
      effect: "NoExecute"
  securityContext:
    fsGroup: 1000

  resources:
    limits:
      cpu: 1000m
      memory: 5Gi
    requests:
      cpu: 250m
      memory: 256Mi

  serviceAccount:
    # -- Create a service account for the middleManager
    create: true
    # -- Service Account name
    name:
    # -- Annotations applied to created service account
    annotations: {}
    # -- Labels applied to created service account
    labels: {}
    # -- Automount API credentials for the service account
    automountServiceAccountToken: true

  ## (dict) If specified, apply these annotations to each master Pod
  podAnnotations: {}

  podDisruptionBudget:
    enabled: false
    # minAvailable: 2
    maxUnavailable: 1

  updateStrategy:
    type: RollingUpdate

router:
  ## If false, router will not be installed
  ##
  enabled: true
  name: router
  replicaCount: 1
  port: 8888
  serviceType: ClusterIP

  config:
    DRUID_XMX: 128m
    DRUID_XMS: 128m
    DRUID_MAXDIRECTMEMORYSIZE: 128m

  ingress:
    enabled: false
    annotations: {}
      # kubernetes.io/ingress.class: nginx
      # kubernetes.io/tls-acme: "true"
    path: /
    hosts:
      - chart-example.local
    tls: []
    #  - secretName: chart-example-tls
    #    hosts:
    #      - chart-example.local

  resources: {}
    # limits:
    #   cpu: 250m
    #   memory: 256Mi
    # requests:
    #   cpu: 100m
    #   memory: 128Mi

  serviceAccount:
    # -- Create a service account for the router
    create: true
    # -- Service Account name
    name:
    # -- Annotations applied to created service account
    annotations: {}
    # -- Labels applied to created service account
    labels: {}
    # -- Automount API credentials for the service account
    automountServiceAccountToken: true

  nodeSelector:
    cloud.google.com/gke-nodepool: XXXXX
  tolerations:
    - key: "XXXXX"
      operator: "Equal"
      value: "false"
      effect: "NoExecute"
  affinity: {}
  podAnnotations: {}

# ------------------------------------------------------------------------------
# ZooKeeper:
# ------------------------------------------------------------------------------
# If using a ZooKeeper installed outside of this chart you must uncomment and set this line
# zkHosts: druid-zookeeper-headless:2181

zookeeper:
  enabled: true
  ## Environmental variables to set in ZooKeeper
  ##
  env:
    ## The JVM heap size to allocate to ZooKeeper
    ZK_HEAP_SIZE: "512M"
  ## Configure ZooKeeper headless
  headless:
    publishNotReadyAddresses: true
  nodeSelector:
    cloud.google.com/gke-nodepool: XXXXX
  tolerations:
    - key: "XXXXX"
      operator: "Equal"
      value: "false"
      effect: "NoExecute"

# ------------------------------------------------------------------------------
# MySQL:
# ------------------------------------------------------------------------------
mysql:
  enabled: false

# ------------------------------------------------------------------------------
# PostgreSQL:
# ------------------------------------------------------------------------------
postgresql:
  enabled: false

# Secrets
prometheus:
  enabled: false
  # pick any port you want
  port: 9090
  annotation:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"

google:
  gcsAPIKey: XXXXXXXXXXXX
```
Hi @milexjaro, any possibility that your GCS is full? https://stackoverflow.com/questions/27969511/gsutil-no-locks-available
Hi @fectrain, I tried to replicate the steps in your reference, and I can still run gsutil ls on the bucket + prefix. Also, the MiddleManager file system still looks healthy. Kinda curious, though: if we use GCS as the deep storage, is it also expected to be mounted in the Druid components (i.e. the middleManager)?
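For anyone debugging a similar "No locks available" failure: the error is what fcntl/flock raises (errno ENOLCK) when the volume's lock service is unreachable, which is typical of a misbehaving NFS mount. A minimal probe like the sketch below can tell a broken volume apart from a GCS problem; the path used here is an assumption, so substitute the MiddleManager's actual task/persistence directory.

```python
import fcntl
import os
import tempfile


def supports_file_locking(directory: str) -> bool:
    """Return True if an exclusive flock can be taken on a file in `directory`.

    On an NFS mount whose lock daemon is down, flock() raises OSError
    with errno ENOLCK ("No locks available").
    """
    fd, path = tempfile.mkstemp(dir=directory, prefix=".lock-probe-")
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # "/tmp" is a stand-in; inside the pod you would point this at the
    # MiddleManager volume (e.g. the chart's persistence mount).
    print(supports_file_locking("/tmp"))
```

Run it inside the MiddleManager pod against the persistent-volume mount: a healthy local disk returns True, while an NFS-backed volume with a dead lock daemon returns False with ENOLCK.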
> if we use the GCS as the deep storage, is it also expected to be mounted in the Druid components (i.e. middleManager)?
No need to mount it. BTW, are you using NFS as the GKE persistent storage?
Yes @fectrain, thanks for the big hint! I noticed there were similar issues reported elsewhere; one of the discussions suggests restarting the NFS server, and that worked like a charm! Reference: https://serverfault.com/a/1101161
thanks for your support guys 🙌
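After restarting the NFS server, a quick way to confirm that POSIX locks work again on the volume is util-linux `flock`. This is a generic sketch, not taken from the thread; `MOUNT` defaults to `/tmp`, so override it with the MiddleManager's actual mount path.

```shell
# Probe file locking on a directory (override with MOUNT=/path/to/volume).
# On an NFS volume whose lock daemon is down, flock fails instead of
# exiting 0, matching the task's "No locks available" (ENOLCK) error.
MOUNT="${MOUNT:-/tmp}"
if flock --nonblock "$MOUNT/.lock-probe" -c 'true'; then
    echo "locks available on $MOUNT"
else
    echo "No locks available on $MOUNT" >&2
fi
```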
Hi, I just rolled out a fresh Helm deployment of Druid 29.0.1 on Google Kubernetes Engine (GKE), using an external PostgreSQL for metadata and Google Cloud Storage (GCS) for deep storage and indexer logs. When I tried to load Parquet files from GCS, the task failed when attempting to lock a file because no locks were available (Caused by: java.lang.RuntimeException: java.io.IOException: No locks available). Do you have any idea why this happened? Thank you.
Below are the payload and the logs.
payload:
logs: