aws / amazon-redshift-jdbc-driver

Redshift JDBC Driver. It supports the JDBC 4.2 specification.
Apache License 2.0

Including SessionToken in url with v2 driver causes SQLException: IAM error retrieving temp credentials #96

Open Y-Asahi-dev opened 1 year ago

Y-Asahi-dev commented 1 year ago

Driver version

2.1.0.9

Redshift version

PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.54052

Client Operating System

Amazon EMR ver6.9

OS info:

```
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
```

JAVA/JVM version

```
openjdk version "1.8.0_382"
OpenJDK Runtime Environment Corretto-8.382.05.1 (build 1.8.0_382-b05)
OpenJDK 64-Bit Server VM Corretto-8.382.05.1 (build 25.382-b05, mixed mode)
```

Table schema

Problem description

I am using amazon-redshift-jdbc-driver v2 with PySpark (Spark version 3.3.2). I get a SQLException when I run the code below, and the error seems to occur only when the URL has a SessionToken parameter. After replacing the JDBC driver with v1, ret.count() returns results successfully without any errors.

Has the behavior changed between v1 and v2 when there is a SessionToken?

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext
import boto3

redshift_cluster_id = 'sample_cluster'
redshift_dbname = 'sample_db'
bucket_name = 'sample_bucket'

sc = SparkContext.getOrCreate()
credentials = boto3.Session().get_credentials()
region = boto3.Session().region_name
sc._jsc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", credentials.access_key)
sc._jsc.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", credentials.secret_key)

url = f'jdbc:redshift:iam://{redshift_cluster_id}:{region}/{redshift_dbname}?DbUser=test_uesr&DbGroups=test_users_group&AutoCreate=true&AccessKeyID={credentials.access_key}&SecretAccessKey={credentials.secret_key}&user=&password='
token = credentials.token
sc._jsc.hadoopConfiguration().set("fs.s3.awsSessionToken", token)
url = f'{url}&SessionToken={credentials.token}'

sql_context = SQLContext(sc)
dfr = sql_context.read \
    .format('io.github.spark_redshift_community.spark.redshift') \
    .option('url', url) \
    .option('tempdir', f's3a://{bucket_name}/redshift_temp_dir') \
    .option('forward_spark_s3_credentials', 'true') \
    .option('fetchsize', 10000)

query = 'select * from sample_table'
ret = dfr.option('query', query).load(schema=None)
ret.count()
```

  1. Expected behaviour: the connection succeeds and ret.count() returns a row count, as it does with the v1 driver.
  2. Actual behaviour: the driver throws a SQLException while retrieving temporary IAM credentials.
  3. Error message/stack trace:

```
java.sql.SQLException: IAM error retrieving temp credentials: The security token included in the request is invalid (Service: AmazonRedshift; Status Code: 403; Error Code: InvalidClientTokenId; Request ID: 78fc4d63-34c5-45a1-af4f-454c122d93e5; Proxy: null)
    at com.amazon.redshift.util.RedshiftException.getSQLException(RedshiftException.java:56)
    at com.amazon.redshift.Driver.connect(Driver.java:339)
    at org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.connect(DriverWrapper.scala:46)
    at io.github.spark_redshift_community.spark.redshift.JDBCWrapper.getConnector(RedshiftJDBCWrapper.scala:270)
    at io.github.spark_redshift_community.spark.redshift.RedshiftRelation.$anonfun$schema$1(RedshiftRelation.scala:69)
    at scala.Option.getOrElse(Option.scala:189)
    at io.github.spark_redshift_community.spark.redshift.RedshiftRelation.schema$lzycompute(RedshiftRelation.scala:66)
    at io.github.spark_redshift_community.spark.redshift.RedshiftRelation.schema(RedshiftRelation.scala:65)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:440)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.lang.Thread.run(Thread.java:750)
Caused by: com.amazonaws.services.redshift.model.AmazonRedshiftException: The security token included in the request is invalid (Service: AmazonRedshift; Status Code: 403; Error Code: InvalidClientTokenId; Request ID: 78fc4d63-34c5-45a1-af4f-454c122d93e5; Proxy: null)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1879)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1418)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1387)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
    at com.amazonaws.services.redshift.AmazonRedshiftClient.doInvoke(AmazonRedshiftClient.java:9234)
    at com.amazonaws.services.redshift.AmazonRedshiftClient.invoke(AmazonRedshiftClient.java:9201)
    at com.amazonaws.services.redshift.AmazonRedshiftClient.invoke(AmazonRedshiftClient.java:9190)
    at com.amazonaws.services.redshift.AmazonRedshiftClient.executeDescribeClusters(AmazonRedshiftClient.java:4581)
    at com.amazonaws.services.redshift.AmazonRedshiftClient.describeClusters(AmazonRedshiftClient.java:4549)
    at com.amazon.redshift.core.IamHelper.setClusterCredentials(IamHelper.java:546)
    at com.amazon.redshift.core.IamHelper.setIAMCredentials(IamHelper.java:518)
    at com.amazon.redshift.core.IamHelper.setIAMProperties(IamHelper.java:279)
    at com.amazon.redshift.jdbc.RedshiftConnectionImpl.<init>(RedshiftConnectionImpl.java:260)
    at com.amazon.redshift.Driver.makeConnection(Driver.java:502)
    at com.amazon.redshift.Driver.connect(Driver.java:315)
    ... 24 more
```
  4. Any other details that can be helpful:

JDBC trace logs

Reproduction code

bhvkshah commented 1 year ago

Hi @Y-Asahi-dev , thanks for reporting this issue. Can you try:

1/ with the latest driver, version 2.1.0.18
2/ passing the session token as part of a Property object, instead of as part of the URL?
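For 2/, something along these lines could be tried from the same PySpark session via py4j (an untested sketch reusing the variables from the reproduction code above; AccessKeyID, SecretAccessKey, and SessionToken are the driver's IAM connection options):

```python
# Untested sketch: exercise the JDBC driver directly via py4j, passing the
# IAM credentials in a java.util.Properties object instead of the URL string.
# Assumes the Redshift JDBC jar is visible to the Spark driver's JVM.
sc._jvm.java.lang.Class.forName('com.amazon.redshift.Driver')

props = sc._jvm.java.util.Properties()
props.setProperty('DbUser', 'test_uesr')
props.setProperty('AccessKeyID', credentials.access_key)
props.setProperty('SecretAccessKey', credentials.secret_key)
props.setProperty('SessionToken', credentials.token)

# The URL itself no longer carries any credential parameters.
url = f'jdbc:redshift:iam://{redshift_cluster_id}:{region}/{redshift_dbname}'
conn = sc._jvm.java.sql.DriverManager.getConnection(url, props)
conn.close()
```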

Y-Asahi-dev commented 1 year ago

@bhvkshah

1/ with the latest driver, version 2.1.0.18

There was no change in the results.

2/ passing the session token as part of a Property object, instead of as part of the URL?

https://docs.aws.amazon.com/ja_jp/redshift/latest/mgmt/spark-redshift-connector-other-config.html

Following the configuration described at the link above, I removed the IAM parameters from the URL and instead set

temporary_aws_access_key_id
temporary_aws_secret_access_key
temporary_aws_session_token

as connector options, and the process completed successfully. However, the process also completes normally when these option values are empty, so they are probably not what is actually being used. It seems that once the IAM parameters are removed from the URL, the driver simply picks up credentials for a valid IAM user automatically.
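For reference, the working configuration looks roughly like this (a sketch based on my code above; the temporary_aws_* options replace forward_spark_s3_credentials, since the connector appears to accept only one credential mechanism at a time):

```python
# IAM credential parameters removed from the JDBC URL; only the DbUser,
# DbGroups, and AutoCreate parameters remain.
url = (f'jdbc:redshift:iam://{redshift_cluster_id}:{region}/{redshift_dbname}'
       '?DbUser=test_uesr&DbGroups=test_users_group&AutoCreate=true')

ret = (sql_context.read
       .format('io.github.spark_redshift_community.spark.redshift')
       .option('url', url)
       .option('tempdir', f's3a://{bucket_name}/redshift_temp_dir')
       .option('temporary_aws_access_key_id', credentials.access_key)
       .option('temporary_aws_secret_access_key', credentials.secret_key)
       .option('temporary_aws_session_token', credentials.token)
       .option('query', 'select * from sample_table')
       .load())
ret.count()
```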

thanks.

ravi-Mode commented 2 months ago

Hi @bhvkshah, I am also trying to use IAM role-based authentication with the AssumeRole API, on JDBC driver version 2.1.0.30, and I am observing strange behavior. I am able to generate an access key, secret access key, and session token using the AssumeRole API.

But when I pass these as URL string parameters, the connection fails with: Caused by: com.amazonaws.services.redshift.model.AmazonRedshiftException: The security token included in the request is invalid

However, when I pass them via a properties file, the connection gets past that stage but now fails with: java.sql.SQLException: FATAL: user "IAM:" does not exist.

The same access key, secret access key, and session token work fine with the AWS CLI and the Redshift Data API: aws redshift-data execute-statement --sql "select 1" --database dev --cluster-identifier <clusterId>
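For reference, this is roughly how the credentials are generated (the role ARN and session name below are placeholders):

```python
import boto3

# Generate temporary credentials with the STS AssumeRole API
# (role ARN and session name are hypothetical placeholders).
sts = boto3.client('sts')
creds = sts.assume_role(
    RoleArn='arn:aws:iam::123456789012:role/redshift-access',
    RoleSessionName='jdbc-test',
)['Credentials']

# The three values handed to the JDBC driver; note the driver's spelling
# of the corresponding property names:
#   AccessKeyID     <- creds['AccessKeyId']
#   SecretAccessKey <- creds['SecretAccessKey']
#   SessionToken    <- creds['SessionToken']
```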

ravi-Mode commented 2 months ago

Adding to the above: I have granted S3 access to this role, and when I use the S3 AWS SDK API with the same credentials it works, which confirms that the generated credentials are correct; something seems to be missing in the JDBC driver.
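The S3 check that succeeds looks roughly like this (a minimal sketch, reusing creds from the AssumeRole sketch above):

```python
import boto3

# Same temporary credentials as above; if this call succeeds, the
# credentials themselves are valid and the failure is specific to the
# JDBC driver path.
s3 = boto3.client(
    's3',
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken'],
)
s3.list_buckets()
```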