druid-io / druid-operator

Druid Kubernetes Operator

Druid cluster with deep storage "local" (PVC) #277

Open kamolhasan opened 2 years ago

kamolhasan commented 2 years ago

Question:

Issue: I have created a Druid cluster following the example at https://github.com/druid-io/druid-operator/blob/master/examples/tiny-cluster.yaml.

The cluster is up and running:

$ kubectl get all,pvc -n druid 
NAME                                    READY   STATUS    RESTARTS   AGE
pod/druid-tiny-cluster-brokers-0        1/1     Running   0          112m
pod/druid-tiny-cluster-coordinators-0   1/1     Running   0          112m
pod/druid-tiny-cluster-historicals-0    1/1     Running   0          112m
pod/druid-tiny-cluster-routers-0        1/1     Running   0          112m
pod/zk-0                                1/1     Running   0          47h

NAME                                      TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                      AGE
service/druid-tiny-cluster-brokers        ClusterIP   None         <none>        8088/TCP                     112m
service/druid-tiny-cluster-coordinators   ClusterIP   None         <none>        8088/TCP                     112m
service/druid-tiny-cluster-historicals    ClusterIP   None         <none>        8088/TCP                     112m
service/druid-tiny-cluster-routers        ClusterIP   None         <none>        8088/TCP                     112m
service/zk                                ClusterIP   None         <none>        2181/TCP,2888/TCP,3888/TCP   47h

NAME                                               READY   AGE
statefulset.apps/druid-tiny-cluster-brokers        1/1     112m
statefulset.apps/druid-tiny-cluster-coordinators   1/1     112m
statefulset.apps/druid-tiny-cluster-historicals    1/1     112m
statefulset.apps/druid-tiny-cluster-routers        1/1     112m
statefulset.apps/zk                                1/1     47h

NAME                                                                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
persistentvolumeclaim/deepstorage-volume-druid-tiny-cluster-brokers-0        Bound    pvc-43539464-2e76-4360-9db8-7ed944bae513   4Gi        RWO            do-block-storage   112m
persistentvolumeclaim/deepstorage-volume-druid-tiny-cluster-coordinators-0   Bound    pvc-0767d00f-e7ed-4324-9ec8-1ea019a6ddac   4Gi        RWO            do-block-storage   112m
persistentvolumeclaim/deepstorage-volume-druid-tiny-cluster-historicals-0    Bound    pvc-9063ed91-4080-4364-a7cc-0c0262b4b97c   4Gi        RWO            do-block-storage   112m
persistentvolumeclaim/deepstorage-volume-druid-tiny-cluster-routers-0        Bound    pvc-f5f23cf1-5a6c-4189-b97a-9b1a6a7c1401   4Gi        RWO            do-block-storage   112m
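
For context, the deep-storage-related parts of that example look roughly like the sketch below. This is a reconstruction from the PVC names and log paths above, not a copy of the file; see the linked tiny-cluster.yaml for the authoritative spec. Note that the claim is declared per node group, so each StatefulSet pod binds its own ReadWriteOnce volume:

# Sketch only -- reconstructed and abridged, not copied from tiny-cluster.yaml
apiVersion: druid.apache.org/v1alpha1
kind: Druid
metadata:
  name: tiny-cluster
spec:
  common.runtime.properties: |
    # "local" deep storage: segments are pushed to a plain directory
    druid.storage.type=local
    druid.storage.storageDirectory=/druid/deepstorage
  nodes:
    historicals:
      # every node group declares an equivalent claim (brokers, coordinators, routers abridged)
      volumeClaimTemplates:
        - metadata:
            name: deepstorage-volume
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 4Gi
            storageClassName: do-block-storage
      volumeMounts:
        - name: deepstorage-volume
          mountPath: /druid/deepstorage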

But we ran into a problem after running some ingestion tasks. Logs from pod/druid-tiny-cluster-historicals-0:

2022-03-09T16:48:54,161 INFO [ZkCoordinator] org.apache.druid.server.coordination.ZkCoordinator - zNode[/druid/loadQueue/10.244.2.158:8088/subscription-analytics.1.2.invoice_line_item_2020-06-14T00:00:00.000Z_2020-06-15T00:00:00.000Z_2022-03-09T16:42:55.363Z] was removed
2022-03-09T16:48:54,161 INFO [ZKCoordinator--0] org.apache.druid.server.coordination.ZkCoordinator - Completed request [LOAD: subscription-analytics.1.2.invoice_line_item_2020-06-14T00:00:00.000Z_2020-06-15T00:00:00.000Z_2022-03-09T16:42:55.363Z]
2022-03-09T16:48:54,161 INFO [ZKCoordinator--0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment subscription-analytics.1.2.invoice_2020-06-14T00:00:00.000Z_2020-06-15T00:00:00.000Z_2022-03-09T16:42:34.582Z
2022-03-09T16:48:54,162 WARN [ZKCoordinator--0] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - No path to unannounce segment[subscription-analytics.1.2.invoice_2020-06-14T00:00:00.000Z_2020-06-15T00:00:00.000Z_2022-03-09T16:42:34.582Z]
2022-03-09T16:48:54,162 INFO [ZKCoordinator--0] org.apache.druid.server.SegmentManager - Told to delete a queryable for a dataSource[subscription-analytics.1.2.invoice] that doesn't exist.
2022-03-09T16:48:54,162 WARN [ZKCoordinator--0] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - [/druid/data/segments/subscription-analytics.1.2.invoice/2020-06-14T00:00:00.000Z_2020-06-15T00:00:00.000Z/2022-03-09T16:42:34.582Z/0] may be damaged. Delete all the segment files and pull from DeepStorage again.
2022-03-09T16:48:54,162 INFO [ZKCoordinator--0] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[/druid/data/segments/subscription-analytics.1.2.invoice/2020-06-14T00:00:00.000Z_2020-06-15T00:00:00.000Z/2022-03-09T16:42:34.582Z/0]
2022-03-09T16:48:54,162 INFO [ZKCoordinator--0] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[/druid/data/segments/subscription-analytics.1.2.invoice/2020-06-14T00:00:00.000Z_2020-06-15T00:00:00.000Z/2022-03-09T16:42:34.582Z]
2022-03-09T16:48:54,162 INFO [ZKCoordinator--0] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[/druid/data/segments/subscription-analytics.1.2.invoice/2020-06-14T00:00:00.000Z_2020-06-15T00:00:00.000Z]
2022-03-09T16:48:54,162 INFO [ZKCoordinator--0] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[/druid/data/segments/subscription-analytics.1.2.invoice]
2022-03-09T16:48:54,162 WARN [ZKCoordinator--0] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Asked to cleanup something[subscription-analytics.1.2.invoice_2020-06-14T00:00:00.000Z_2020-06-15T00:00:00.000Z_2022-03-09T16:42:34.582Z] that didn't exist.  Skipping.
2022-03-09T16:48:54,162 WARN [ZKCoordinator--0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Unable to delete segmentInfoCacheFile[/druid/data/segments/info_dir/subscription-analytics.1.2.invoice_2020-06-14T00:00:00.000Z_2020-06-15T00:00:00.000Z_2022-03-09T16:42:34.582Z]
2022-03-09T16:48:54,163 ERROR [ZKCoordinator--0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Failed to load segment for dataSource: {class=org.apache.druid.server.coordination.SegmentLoadDropHandler, exceptionType=class org.apache.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[subscription-analytics.1.2.invoice_2020-06-14T00:00:00.000Z_2020-06-15T00:00:00.000Z_2022-03-09T16:42:34.582Z], segment=DataSegment{binaryVersion=9, id=subscription-analytics.1.2.invoice_2020-06-14T00:00:00.000Z_2020-06-15T00:00:00.000Z_2022-03-09T16:42:34.582Z, loadSpec={type=>local, path=>/druid/deepstorage/subscription-analytics.1.2.invoice/2020-06-14T00:00:00.000Z_2020-06-15T00:00:00.000Z/2022-03-09T16:42:34.582Z/0/index.zip}, dimensions=[id, customer_id, due_date], metrics=[], shardSpec=NumberedShardSpec{partitionNum=0, partitions=1}, lastCompactionState=null, size=1480}}
org.apache.druid.segment.loading.SegmentLoadingException: Exception loading segment[subscription-analytics.1.2.invoice_2020-06-14T00:00:00.000Z_2020-06-15T00:00:00.000Z_2022-03-09T16:42:34.582Z]
    at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:276) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.server.coordination.SegmentLoadDropHandler.addSegment(SegmentLoadDropHandler.java:320) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:61) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.server.coordination.ZkCoordinator.lambda$childAdded$2(ZkCoordinator.java:150) ~[druid-server-0.21.1.jar:0.21.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_275]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_275]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]
Caused by: java.lang.IllegalArgumentException: Cannot construct instance of `org.apache.druid.segment.loading.LocalLoadSpec`, problem: [/druid/deepstorage/subscription-analytics.1.2.invoice/2020-06-14T00:00:00.000Z_2020-06-15T00:00:00.000Z/2022-03-09T16:42:34.582Z/0/index.zip] does not exist
 at [Source: UNKNOWN; line: -1, column: -1]
    at com.fasterxml.jackson.databind.ObjectMapper._convert(ObjectMapper.java:3938) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.ObjectMapper.convertValue(ObjectMapper.java:3869) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocation(SegmentLoaderLocalCacheManager.java:303) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocationWithStartMarker(SegmentLoaderLocalCacheManager.java:292) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadSegmentWithRetry(SegmentLoaderLocalCacheManager.java:253) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:225) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:186) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.server.SegmentManager.getAdapter(SegmentManager.java:278) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.server.SegmentManager.loadSegment(SegmentManager.java:224) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:272) ~[druid-server-0.21.1.jar:0.21.1]
    ... 8 more
Caused by: com.fasterxml.jackson.databind.exc.ValueInstantiationException: Cannot construct instance of `org.apache.druid.segment.loading.LocalLoadSpec`, problem: [/druid/deepstorage/subscription-analytics.1.2.invoice/2020-06-14T00:00:00.000Z_2020-06-15T00:00:00.000Z/2022-03-09T16:42:34.582Z/0/index.zip] does not exist
 at [Source: UNKNOWN; line: -1, column: -1]
    at com.fasterxml.jackson.databind.exc.ValueInstantiationException.from(ValueInstantiationException.java:47) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.DeserializationContext.instantiationException(DeserializationContext.java:1735) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.wrapAsJsonMappingException(StdValueInstantiator.java:491) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.rewrapCtorProblem(StdValueInstantiator.java:514) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:285) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.ValueInstantiator.createFromObjectWith(ValueInstantiator.java:229) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build(PropertyBasedCreator.java:198) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:488) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1292) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:326) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:194) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:161) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:130) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:97) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:254) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:68) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.ObjectMapper._convert(ObjectMapper.java:3933) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.ObjectMapper.convertValue(ObjectMapper.java:3869) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocation(SegmentLoaderLocalCacheManager.java:303) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocationWithStartMarker(SegmentLoaderLocalCacheManager.java:292) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadSegmentWithRetry(SegmentLoaderLocalCacheManager.java:253) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:225) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:186) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.server.SegmentManager.getAdapter(SegmentManager.java:278) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.server.SegmentManager.loadSegment(SegmentManager.java:224) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:272) ~[druid-server-0.21.1.jar:0.21.1]
    ... 8 more
Caused by: java.lang.IllegalArgumentException: [/druid/deepstorage/subscription-analytics.1.2.invoice/2020-06-14T00:00:00.000Z_2020-06-15T00:00:00.000Z/2022-03-09T16:42:34.582Z/0/index.zip] does not exist
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:148) ~[guava-16.0.1.jar:?]
    at org.apache.druid.segment.loading.LocalLoadSpec.<init>(LocalLoadSpec.java:51) ~[druid-server-0.21.1.jar:0.21.1]
    at sun.reflect.GeneratedConstructorAccessor55.newInstance(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_275]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_275]
    at com.fasterxml.jackson.databind.introspect.AnnotatedConstructor.call(AnnotatedConstructor.java:124) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:283) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.ValueInstantiator.createFromObjectWith(ValueInstantiator.java:229) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build(PropertyBasedCreator.java:198) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:488) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1292) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:326) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:194) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:161) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:130) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:97) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:254) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:68) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.ObjectMapper._convert(ObjectMapper.java:3933) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at com.fasterxml.jackson.databind.ObjectMapper.convertValue(ObjectMapper.java:3869) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocation(SegmentLoaderLocalCacheManager.java:303) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadInLocationWithStartMarker(SegmentLoaderLocalCacheManager.java:292) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.loadSegmentWithRetry(SegmentLoaderLocalCacheManager.java:253) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:225) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:186) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.server.SegmentManager.getAdapter(SegmentManager.java:278) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.server.SegmentManager.loadSegment(SegmentManager.java:224) ~[druid-server-0.21.1.jar:0.21.1]
    at org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:272) ~[druid-server-0.21.1.jar:0.21.1]
    ... 8 more

While debugging the error, I found people recommending S3 or HDFS instead of local deep storage. Some claimed that local deep storage doesn't work in clustered mode.

What am I doing wrong? Isn't https://github.com/druid-io/druid-operator/blob/master/examples/tiny-cluster.yaml a valid example?
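
For reference, the S3 route people point to usually comes down to replacing the local deep-storage properties in common.runtime.properties with something like the fragment below. The bucket, keys, and extension list are placeholders rather than values from this cluster, and in practice the credentials would come from a Secret or an IAM role instead of being inlined:

  common.runtime.properties: |
    # load the S3 extension in addition to whatever the cluster already loads
    druid.extensions.loadList=["druid-s3-extensions"]
    druid.storage.type=s3
    druid.storage.bucket=my-druid-deepstorage
    druid.storage.baseKey=druid/segments
    druid.s3.accessKey=<access-key>
    druid.s3.secretKey=<secret-key>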

joehacksalot commented 2 years ago

We are seeing the same issue. It looks like it's been a known problem with the Helm charts for some time, with no real fix in the works.

https://github.com/apache/druid/issues/10523

yangyu66 commented 1 year ago

I have the same question. Any updates?

yvesblt commented 1 year ago

We are seeing the same issue. It looks like it's been a known problem with the Helm charts for some time, with no real fix in the works.

apache/druid#10523

Hi, thank you for this thread. Does it mean that it is not possible to use druid_storage_type: local with the Druid Helm chart? Even for a single-node installation?

Thank you,
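
For a single-node installation, or any setup where every Druid process sees the same filesystem, local deep storage can work; the failures above come from each StatefulSet pod binding its own ReadWriteOnce claim, so the directory a task pushes segments into is not the one the historical tries to read from. One way to keep local deep storage on Kubernetes is to mount a single ReadWriteMany claim on all node groups instead of per-node claims, roughly as sketched below. The claim name, storage class, and the cluster-level volumes/volumeMounts fields are assumptions to verify against the operator's CRD and your CSI driver:

# Pre-created shared claim; needs a storage class that supports ReadWriteMany
# (do-block-storage from the output above is block storage and only offers RWO)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: druid-deepstorage            # hypothetical name
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
  storageClassName: nfs-client       # placeholder RWX-capable class
---
# Druid CR fragment: mount the shared claim on every node group in place of the
# per-node volumeClaimTemplates (cluster-level volumes/volumeMounts assumed
# supported by the operator CRD; verify for your operator version)
apiVersion: druid.apache.org/v1alpha1
kind: Druid
metadata:
  name: tiny-cluster
spec:
  volumeMounts:
    - name: deepstorage-volume
      mountPath: /druid/deepstorage
  volumes:
    - name: deepstorage-volume
      persistentVolumeClaim:
        claimName: druid-deepstorage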