asdf2014 / druid-helm

Helm Chart for Apache Druid
https://druid.apache.org

Failed to connect to S3 service endpoint #2

Closed ruiluis closed 7 months ago

ruiluis commented 8 months ago

I am trying to use S3 for storage and the indexer instead of local storage. My extensions list is:

druid_extensions_loadList: '["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-histogram", "druid-datasketches", "druid-s3-extensions", "druid-lookups-cached-global", "postgresql-metadata-storage"]'

For storage, I set:

druid_storage_type
druid_storage_bucket
druid_storage_baseKey
druid_s3_accessKey
druid_s3_secretKey
druid_s3_endpoint_signingRegion

For indexer logs, I set:

druid_indexer_logs_type
druid_indexer_logs_s3Bucket
druid_indexer_logs_s3Prefix
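As a sketch, this is roughly how I fill them in (every value below is a placeholder, not my real bucket or credentials):

druid_storage_type: s3
druid_storage_bucket: my-bucket                  # placeholder
druid_storage_baseKey: druid/segments            # placeholder
druid_s3_accessKey: AKIAXXXXXXXX                 # placeholder
druid_s3_secretKey: xxxxxxxx                     # placeholder
druid_s3_endpoint_signingRegion: eu-central-1
druid_indexer_logs_type: s3
druid_indexer_logs_s3Bucket: my-bucket           # placeholder
druid_indexer_logs_s3Prefix: druid/indexing-logs # placeholder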

However, with this configuration the MiddleManager simply crashes, and I cannot see any logs explaining why. If I run a micro standalone example, it works fine.

asdf2014 commented 8 months ago

@ruiluis Thank you for reporting this. First, we can rule out a missing druid-s3-extensions, as we can see that the druid_extensions_loadList configuration is correct. It would be better to get the logs: even if the MiddleManager pod crashes, there should be some logs available that provide clues to what's happening. You can try to access the logs using Kubernetes commands. FYI,

$ kubectl logs mm_pod -n your_ns --previous

It can fetch the logs from the pod's last terminated container, which can be helpful when the container is crashing and being restarted by Kubernetes.

ruiluis commented 8 months ago

Hi, I started debugging this and some other things:

1. I was able to turn on logging; I needed to set the config vars DRUID_LOG_LEVEL: "debug" and DRUID_LOG4J.

2. When using PostgreSQL, the metadata storage vars are overridden by the PostgreSQL-related ones, but this caused an error connecting to the DB. I had to change the template/configmap to:

druid_metadata_storage_connector_connectURI: jdbc:postgresql://{{ .Release.Name }}-postgresql.dev.svc.cluster.local

3. Related to the issue above, the error is:

2024-02-23T10:30:24,241 WARN [main] com.amazonaws.util.EC2MetadataUtils - Unable to retrieve the requested metadata (/latest/dynamic/instance-identity/document). Failed to connect to service endpoint: com.amazonaws.SdkClientException: Failed to connect to service endpoint:
...
com.google.inject.ProvisionException: Unable to provision, see the following errors:

Error in custom provider, com.amazonaws.SdkClientException: Unable to find a region via the region provider chain. Must provide an explicit region in the builder or setup environment to supply a region.
at org.apache.druid.storage.s3.S3StorageDruidModule.getAmazonS3Client(S3StorageDruidModule.java:170) (via modules: com.google.inject.util.Modules$OverrideModule -> org.apache.druid.storage.s3.S3StorageDruidModule)

However, I did configure the region; the startup log shows:

2024-02-23T10:30:15,454 INFO [main] org.apache.druid.cli.CliMiddleManager - druid.s3.accessKey: DADDAA
2024-02-23T10:30:15,454 INFO [main] org.apache.druid.cli.CliMiddleManager - druid.s3.endpoint.signingRegion: eu-central-1
2024-02-23T10:30:15,454 INFO [main] org.apache.druid.cli.CliMiddleManager - * druid.s3.secretKey: masked

4. Even so, I removed S3 and continued with local storage, but then another error appeared. I was able to connect to the Kafka stream and see the data in Druid while defining the ingestion task; however, when I try to submit it:

Failed to submit task: Please make sure to load all the necessary extensions and jars with type 'kafka' on 'druid/coordinator' service. Could not resolve type id 'kafka' as a subtype of org.apache.druid.indexing.common.task.Task

I was able to check the status of the Coordinator, and Kafka is listed there:

common","version":"24.0.0"},{"name":"org.apache.druid.storage.hdfs.HdfsStorageDruidModule","artifact":"druid-hdfs-storage","version":"24.0.0"},{"name":"org.apache.druid.indexing.kafka.KafkaIndexTaskModule","artifact":"druid-kafka-indexing-service","version":"24.0.0"},

ruiluis commented 8 months ago

Update on this: I was not able to insert the ingestion task using the web interface's JSON insertion tab, but it now works if I do it using the ingestion flow sequence in the web page. However, I still cannot use S3.

asdf2014 commented 8 months ago

@ruiluis Thank you for providing this information. I am glad to hear that you have resolved the issues other than the one with S3. Regarding S3, I noticed the log you provided, com.amazonaws.SdkClientException: Failed to connect to service endpoint, which looks like a network issue with the S3 endpoint. I suggest you check whether the network environment is healthy, whether a firewall is blocking traffic, whether the endpoint configuration is correct, etc. Looking forward to your good news.

ruiluis commented 8 months ago

Hi, sorry, but that is not the case. I tried another pod on the node and it has connectivity to the outside world, so the issue is not a lack of outbound connectivity; it must be something else. Yes, there is this error at the beginning:

2024-02-26T15:49:23,788 WARN [main] org.apache.druid.indexing.common.config.TaskConfig - Batch processing mode argument value is null or not valid:[null], defaulting to[CLOSED_SEGMENTS]
2024-02-26T15:49:32,676 WARN [main] com.amazonaws.util.EC2MetadataUtils - Unable to retrieve the requested metadata (/latest/dynamic/instance-identity/document). Failed to connect to service endpoint: com.amazonaws.SdkClientException: Failed to connect to service endpoint:

But there is also this:

Caused by: com.amazonaws.SdkClientException: Unable to find a region via the region provider chain. Must provide an explicit region in the builder or setup environment to supply a region.
at com.amazonaws.client.builder.AwsClientBuilder.setRegion(AwsClientBuilder.java:462) ~[aws-java-sdk-core-1.12.638.jar:?]

Could this be caused by some missing configuration on the MiddleManager, or some package that is not installed? Or, as I already suspected, Java issues (OpenJDK vs. JRE)? This is my conf, and it works on the same machine outside the cluster:

druid_storage_type: s3
druid_storage_bucket:
druid_storage_baseKey:
druid_s3_accessKey:
druid_s3_secretKey:
druid_s3_endpoint_signingRegion: eu-central-1

asdf2014 commented 8 months ago

@ruiluis Thank you for helping to rule out the network issue. Based on the new log you provided, Caused by: com.amazonaws.SdkClientException: Unable to find a region via the region provider chain. Must provide an explicit region in the builder or setup environment to supply a region., it seems likely that the issue is caused by a misconfiguration related to S3. I noticed that you only configured the druid_s3_endpoint_signingRegion parameter and did not set the druid_s3_endpoint_url parameter, which could be the reason for this error.
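As a sketch, the pair could look like this (the endpoint URL below is a placeholder assumption for an AWS regional endpoint; adjust it to your actual S3 or S3-compatible service):

druid_s3_endpoint_url: s3.eu-central-1.amazonaws.com   # placeholder endpoint
druid_s3_endpoint_signingRegion: eu-central-1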

ruiluis commented 8 months ago

Hi, I finally was able to pinpoint the problem. It is already fixed for ingestion tasks, but not yet for indexing. The issue is that the AWS SDK is buggy when picking up its Java options, so aws.region has to be passed as a Java option when the process is launched. For index tasks the fix is easy: just add ,"-Daws.region=eu-central-1" to druid_indexer_runner_javaOptsArray for the MiddleManager. However, for indexing I need to add this to the start command of the MiddleManager and Coordinator themselves. How can I do that?
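For example, the final array could look like this (the -server and heap flags are only illustrative surroundings; the -Daws.region entry is the actual fix):

druid_indexer_runner_javaOptsArray: '["-server", "-Xmx1g", "-Daws.region=eu-central-1"]'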

asdf2014 commented 8 months ago

@ruiluis That is great news! We are finally seeing the finish line. If you want to add this javaOptsArray parameter, you can do so by modifying the ConfigMap so that the parameter takes effect globally.

ruiluis commented 8 months ago

How can I do that? What I actually need is to add a line (-Daws.region=eu-central-1) to the jvm.config file of the MiddleManager and Broker.

asdf2014 commented 8 months ago

@ruiluis Here is the command to modify the ConfigMap:

$ kubectl edit configmap your_configmap_name -n your_ns

And then you can add the druid_indexer_runner_javaOptsArray option there.
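As a rough sketch, the edited ConfigMap data could then contain an entry like this (the surrounding keys in your ConfigMap will differ; only the new entry is shown):

data:
  druid_indexer_runner_javaOptsArray: '["-Daws.region=eu-central-1"]'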

ruiluis commented 8 months ago

I already fixed the indexer part; that is not the issue. What I need is to add the same line to the jvm.config file, so that when the MiddleManager starts it passes that option to the java executable. Basically, I need something like DRUID_XMX that lets me add a line to jvm.config.
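A chart-agnostic sketch of what I am after (assuming nothing about this chart's entrypoint): the standard JAVA_TOOL_OPTIONS environment variable is read by the JVM itself, so setting it on the container would reach every Java process without editing jvm.config:

env:
  - name: JAVA_TOOL_OPTIONS          # picked up by any JVM in the container
    value: "-Daws.region=eu-central-1"

The JVM prints a "Picked up JAVA_TOOL_OPTIONS" notice on startup when this is set.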

asdf2014 commented 8 months ago

@ruiluis I would recommend that you try MoK mode, without MiddleManagers at all, since you are deploying the Druid cluster on Kubernetes with the Helm Chart :sweat_smile:
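For reference, a rough sketch of the Druid properties involved, per the druid-kubernetes-overlord-extensions documentation (treat the exact env var spellings as assumptions about how the chart forwards properties):

# Append the extension to the existing druid_extensions_loadList shown above.
druid_extensions_loadList: '[..., "druid-kubernetes-overlord-extensions"]'
druid_indexer_runner_type: k8s   # the Overlord launches task pods directly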

asdf2014 commented 7 months ago

@ruiluis I assume you have resolved this problem since you have not responded for a long time. Closing this now; you are welcome to reopen it if needed.