Y-Asahi-dev opened this issue 1 year ago
Hi @Y-Asahi-dev, thanks for reporting this issue. Can you try:
1/ with the latest driver, version 2.1.0.18
2/ passing the session token as part of a Property object, instead of as part of the URL?
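For illustration only (not from the original reply), a minimal PySpark sketch of what suggestion 2/ could look like, assuming Spark's built-in `jdbc` data source, which forwards reader options it does not recognize to the JDBC driver as connection properties; cluster, region, database, and table names are placeholders:

```python
# Hedged sketch: pass the IAM credentials as driver connection properties
# instead of embedding them in the JDBC URL. Placeholder names throughout.
import boto3
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
credentials = boto3.Session().get_credentials()

# No AccessKeyID/SecretAccessKey/SessionToken parameters in the URL itself.
url = 'jdbc:redshift:iam://sample_cluster:us-east-1/sample_db?DbUser=test_user&DbGroups=test_users_group&AutoCreate=true'

df = (spark.read.format('jdbc')
      .option('url', url)
      .option('dbtable', 'sample_table')
      .option('AccessKeyID', credentials.access_key)      # passed to the driver
      .option('SecretAccessKey', credentials.secret_key)  # as connection properties
      .option('SessionToken', credentials.token)
      .load())
```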
@bhvkshah
1/ with the latest driver, version 2.1.0.18
There was no change in the results.
2/ passing the session token as part of a Property object, instead of as part of the URL?
https://docs.aws.amazon.com/ja_jp/redshift/latest/mgmt/spark-redshift-connector-other-config.html
When I removed the IAM parameters from the URL and instead specified
temporary_aws_access_key_id, temporary_aws_secret_access_key, and temporary_aws_session_token
as config values, the process completed successfully. However, the job also completes when these config values are left empty, so the token itself is probably not being used; once the IAM parameters are removed from the URL, Redshift appears to simply pick up a valid IAM user automatically.
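For reference, a minimal sketch of the configuration described above, assuming the temporary_aws_* option names from the linked page; cluster, bucket, and table names are placeholders:

```python
# Sketch: IAM parameters removed from the JDBC URL, temporary credentials supplied
# through the connector's temporary_aws_* options instead. Placeholder names throughout.
import boto3
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
credentials = boto3.Session().get_credentials()
region = boto3.Session().region_name

# No AccessKeyID/SecretAccessKey/SessionToken in the URL.
url = (f'jdbc:redshift:iam://sample_cluster:{region}/sample_db'
       '?DbUser=test_user&DbGroups=test_users_group&AutoCreate=true')

df = (spark.read
      .format('io.github.spark_redshift_community.spark.redshift')
      .option('url', url)
      .option('tempdir', 's3a://sample_bucket/redshift_temp_dir')
      .option('temporary_aws_access_key_id', credentials.access_key)
      .option('temporary_aws_secret_access_key', credentials.secret_key)
      .option('temporary_aws_session_token', credentials.token)
      .option('query', 'select * from sample_table')
      .load())
```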
thanks.
Hi @bhvkshah, I am also trying to use IAM role based authentication with the AssumeRole API, using JDBC driver version 2.1.0.30, and I am seeing strange behavior. I am able to generate an accessKey, secretAccessKey, and session token with the AssumeRole API.
When I pass these as URL string parameters, it fails with: Caused by: com.amazonaws.services.redshift.model.AmazonRedshiftException: The security token included in the request is invalid
When I pass them via a properties file instead, it gets past that point but then fails with: java.sql.SQLException: FATAL: user "IAM:" does not exist.
The same accessKey, secretAccessKey, and session token work fine with the AWS CLI and the Redshift Data API:
aws redshift-data execute-statement --sql "select 1" --database dev --cluster-identifier <clusterId>
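For context, a minimal sketch of the credential flow described above, assuming boto3 and a placeholder role ARN:

```python
# Sketch: obtain temporary credentials via STS AssumeRole (placeholder ARN and session name),
# then pass them to the JDBC driver (URL or properties) or export them for the AWS CLI.
import boto3

sts = boto3.client('sts')
response = sts.assume_role(
    RoleArn='arn:aws:iam::123456789012:role/redshift-access-role',  # placeholder
    RoleSessionName='redshift-jdbc-session',
)
creds = response['Credentials']
access_key = creds['AccessKeyId']
secret_key = creds['SecretAccessKey']
session_token = creds['SessionToken']
```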
Adding to the above: I have granted S3 access as part of this role, and the same credentials work with the S3 AWS SDK API, which confirms that the generated credentials are correct; something seems to be missing in the JDBC driver.
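A minimal sketch of that S3 sanity check, assuming the same AssumeRole flow as above and a placeholder bucket name:

```python
# Sketch: verify the temporary credentials by listing a bucket with the S3 SDK.
# Role ARN and bucket name are placeholders; if this call succeeds, the credentials are valid.
import boto3

creds = boto3.client('sts').assume_role(
    RoleArn='arn:aws:iam::123456789012:role/redshift-access-role',  # placeholder
    RoleSessionName='s3-sanity-check',
)['Credentials']

s3 = boto3.client(
    's3',
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken'],
)
response = s3.list_objects_v2(Bucket='sample-bucket', MaxKeys=5)  # placeholder bucket
print([obj['Key'] for obj in response.get('Contents', [])])
```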
Driver version
2.1.0.9
Redshift version
PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.54052
Client Operating System
Amazon EMR ver6.9
Note (OS info): NAME="Amazon Linux" VERSION="2" ID="amzn" ID_LIKE="centos rhel fedora" VERSION_ID="2" PRETTY_NAME="Amazon Linux 2" ANSI_COLOR="0;33" CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2" HOME_URL="https://amazonlinux.com/"
JAVA/JVM version
openjdk version "1.8.0_382"
OpenJDK Runtime Environment Corretto-8.382.05.1 (build 1.8.0_382-b05)
OpenJDK 64-Bit Server VM Corretto-8.382.05.1 (build 25.382-b05, mixed mode)
Table schema
Problem description
I am using amazon-redshift-jdbc-driver v2 with Pyspark (Spark version 3.3.2). I get a SQLException when I run the code below. It seems that an error occurs if the URL has a SessionToken parameter. After replacing the jdbc-driver with v1 I get ret.count() results successfully without any errors.
Has the behavior changed between v1 and v2 when there is a SessionToken?
//-----------------------------------------------------------
from pyspark import SparkContext
from pyspark.sql import SQLContext
import boto3

redshift_cluster_id = 'sample_cluster'
redshift_dbname = 'sample_db'
bucket_name = 'sample_bucket'

sc = SparkContext.getOrCreate()
credentials = boto3.Session().get_credentials()
region = boto3.Session().region_name
sc._jsc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", credentials.access_key)
sc._jsc.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", credentials.secret_key)

url = f'jdbc:redshift:iam://{redshift_cluster_id}:{region}/{redshift_dbname}?DbUser=test_uesr&DbGroups=test_users_group&AutoCreate=true&AccessKeyID={credentials.access_key}&SecretAccessKey={credentials.secret_key}&user=&password='
token = credentials.token
sc._jsc.hadoopConfiguration().set("fs.s3.awsSessionToken", token)
url = f'{url}&SessionToken={credentials.token}'

sql_context = SQLContext(sc)
dfr = sql_context.read \
    .format('io.github.spark_redshift_community.spark.redshift') \
    .option('url', url) \
    .option('tempdir', f's3a://{bucket_name}/redshift_temp_dir') \
    .option('forward_spark_s3_credentials', 'true') \
    .option('fetchsize', 10000)

query = f'select * from sample_table'
ret = dfr.option('query', query).load(schema=None)
ret.count()
-----------------------------------------------------------//
JDBC trace logs
Reproduction code