Open · sl-nicolasmoteley opened 1 month ago
Hi @sl-nicolasmoteley , thanks for asking a question on the Cloud SQL Java Connector!
Yes, our library uses Application Default Credentials (ADC) to source credentials from the environment.
There are several different ways that ADC credentials can be set. Since you mention AWS Secrets Manager, I wonder if workload identity federation would be a good candidate for your use case:
https://cloud.google.com/iam/docs/workload-identity-federation-with-other-clouds
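For example, once you have generated a federation credential configuration file (e.g. with `gcloud iam workload-identity-pools create-cred-config`), the auth library can load it like any other ADC source. A minimal sketch, with the file path as a placeholder:

```scala
// Sketch: load workload identity federation credentials from a config
// file; no service-account key is involved. The path is an assumption.
import java.io.FileInputStream
import com.google.auth.oauth2.GoogleCredentials

val wifCredentials: GoogleCredentials = GoogleCredentials
  .fromStream(new FileInputStream("/etc/gcp/aws-wif-config.json"))
  .createScoped("https://www.googleapis.com/auth/sqlservice.admin")
```

If GOOGLE_APPLICATION_CREDENTIALS points at that config file instead, `GoogleCredentials.getApplicationDefault()` (and therefore the connector) picks it up with no code changes.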
@hessjcg may be more familiar with Spark and may have suggestions for you.
Hi @sl-nicolasmoteley,
It seems likely that this bit of code from your sample:

```scala
System.setProperty(
  CredentialFactory.CREDENTIAL_FACTORY_PROPERTY,
  "com.aviv.data.spark.datalake.tasks.ma_www.CustomCredentialFactory"
)
```

is only setting the system property on the driver (the task master), not on the worker instances: system properties set in one JVM do not propagate to the separate executor JVMs. Thus the master uses your custom credential factory but the workers do not.
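As an aside, one possible workaround for the propagation itself (a sketch under the assumption that your executors start after the session is built; not verified for your setup) is to hand the property to every executor JVM through Spark's configuration:

```scala
// Executors are separate JVMs, so a System.setProperty on the driver
// never reaches them; spark.executor.extraJavaOptions sets the property
// at executor JVM startup instead.
import com.google.cloud.sql.CredentialFactory
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config(
    "spark.executor.extraJavaOptions",
    s"-D${CredentialFactory.CREDENTIAL_FACTORY_PROPERTY}=com.aviv.data.spark.datalake.tasks.ma_www.CustomCredentialFactory")
  .getOrCreate()
```

That said, registering a named connector (below) avoids global state entirely.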
Here is a more reliable way to install a custom credential factory: configure your connector programmatically instead. See the documentation section "Registering a named connector".
You would update your sample to something like this:

```scala
import java.util.Properties
import java.util.function.Supplier

import com.google.auth.oauth2.GoogleCredentials
import com.google.cloud.sql.{ConnectorConfig, ConnectorRegistry}
// ...

class CloudSqlProxyTask(config: SerializableConfiguration) extends SparkTask {
  override def run(): Unit = {
    implicit val formats: DefaultFormats.type = org.json4s.DefaultFormats

    // Supply credentials in code instead of via a system property that
    // only the driver JVM would see.
    val credentialSupplier: Supplier[GoogleCredentials] = () => {
      // your custom credential code goes here
      GoogleCredentials.newBuilder().build()
    }

    val cc = new ConnectorConfig.Builder()
      .withGoogleCredentialsSupplier(credentialSupplier)
      .build()
    ConnectorRegistry.register("spark-pg", cc)

    val dbName = "postgres"
    val jdbcUrl = s"jdbc:postgresql:///$dbName"

    val connProps = new Properties()
    // Point the JDBC connection at the named connector registered above
    connProps.setProperty("cloudSqlNamedConnector", "spark-pg")
    connProps.setProperty("driver", "org.postgresql.Driver")
    connProps.setProperty("user", "XXXXXX")
    connProps.setProperty("sslmode", "disable")
    connProps.setProperty("socketFactory", "com.google.cloud.sql.postgres.SocketFactory")
    connProps.setProperty("cloudSqlInstance", "XXXXX")
    connProps.setProperty("enableIamAuth", "true")

    val query = "(SELECT * FROM information_schema.tables) tables"
    val df = session.read.jdbc(jdbcUrl, query, connProps)
    df.show(false)
  }
}
```
Hi @hessjcg, thanks for your answer. I've just added the named connector using your code sample, but it's not found:
```
Caused by: java.lang.IllegalArgumentException: Named connection spark-pg does not exist.
    at com.shaded.com.google.cloud.sql.core.InternalConnectorRegistry.getNamedConnector(InternalConnectorRegistry.java:367) ~[spark-boot-app.jar:?]
    at com.shaded.com.google.cloud.sql.core.InternalConnectorRegistry.connect(InternalConnectorRegistry.java:169) ~[spark-boot-app.jar:?]
    at com.shaded.com.google.cloud.sql.postgres.SocketFactory.createSocket(SocketFactory.java:81) ~[spark-boot-app.jar:?]
    at org.postgresql.core.PGStream.createSocket(PGStream.java:223) ~[spark-boot-app.jar:?]
    at org.postgresql.core.PGStream.<init>(PGStream.java:95) ~[spark-boot-app.jar:?]
    at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:98) ~[spark-boot-app.jar:?]
    at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:213) ~[spark-boot-app.jar:?]
    at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:51) ~[spark-boot-app.jar:?]
    at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:223) ~[spark-boot-app.jar:?]
    at org.postgresql.Driver.makeConnection(Driver.java:465) ~[spark-boot-app.jar:?]
    at org.postgresql.Driver.connect(Driver.java:264) ~[spark-boot-app.jar:?]
    at org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(BasicConnectionProvider.scala:49) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProviderBase.create(ConnectionProvider.scala:102) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.sql.jdbc.JdbcDialect.$anonfun$createConnectionFactory$1(JdbcDialects.scala:122) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.sql.jdbc.JdbcDialect.$anonfun$createConnectionFactory$1$adapted(JdbcDialects.scala:118) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:273) ~[spark-sql_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.scheduler.Task.run(Task.scala:138) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1516) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551) ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_412]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_412]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_412]
```
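Note for readers hitting the same trace: the failing frames are inside `Executor$TaskRunner`, i.e. on a worker JVM. A plausible cause (an assumption, not confirmed in this thread) is that `ConnectorRegistry.register("spark-pg", ...)` only ran in the driver process, so the executors that actually open the sockets have an empty registry. A minimal sketch of idempotent, per-JVM registration:

```scala
// Sketch only: ConnectorBootstrap is a hypothetical helper, not code
// from this issue. It registers the named connector at most once per
// JVM, so any JVM that calls it can resolve "spark-pg".
import java.util.function.Supplier

import com.google.auth.oauth2.GoogleCredentials
import com.google.cloud.sql.{ConnectorConfig, ConnectorRegistry}

object ConnectorBootstrap {
  private var registered = false

  def ensureRegistered(): Unit = synchronized {
    if (!registered) {
      val credentialSupplier: Supplier[GoogleCredentials] = () => {
        // your custom credential code goes here
        GoogleCredentials.newBuilder().build()
      }
      val cc = new ConnectorConfig.Builder()
        .withGoogleCredentialsSupplier(credentialSupplier)
        .build()
      ConnectorRegistry.register("spark-pg", cc)
      registered = true
    }
  }
}
```

Because `session.read.jdbc` opens its connections inside executor tasks, `ConnectorBootstrap.ensureRegistered()` would have to be invoked by code that runs on each executor before the first connection attempt; whether your job layout allows that is an open question.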
Question
Hi there!
This has been giving us headaches for weeks...
We don't want to use the GOOGLE_APPLICATION_CREDENTIALS variable with a JSON key file; instead we want to build GoogleCredentials from AWS Secrets Manager.
So we implemented a custom CredentialFactory and use it in our task to read a PostgreSQL database through Spark JDBC.
But we get the error message above. It looks like the workers don't get the credentials (the master does)...
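For context, a factory along these lines might look like the sketch below. The class name and secret id are placeholders, and it assumes the AWS SDK v2 Secrets Manager client, a connector version whose `CredentialFactory` exposes `getCredentials()`, and a secret holding a service-account key JSON:

```scala
// Hypothetical sketch, not the factory from this issue: fetch a
// service-account key JSON from AWS Secrets Manager and turn it into
// GoogleCredentials.
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets

import com.google.auth.oauth2.GoogleCredentials
import com.google.cloud.sql.CredentialFactory
import software.amazon.awssdk.services.secretsmanager.SecretsManagerClient
import software.amazon.awssdk.services.secretsmanager.model.GetSecretValueRequest

class AwsSecretsCredentialFactory extends CredentialFactory {
  override def getCredentials(): GoogleCredentials = {
    val client = SecretsManagerClient.create()
    try {
      val secret = client.getSecretValue(
        GetSecretValueRequest.builder().secretId("gcp/service-account-key").build())
      GoogleCredentials.fromStream(
        new ByteArrayInputStream(
          secret.secretString().getBytes(StandardCharsets.UTF_8)))
    } finally {
      client.close()
    }
  }
}
```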
Code
Additional Details
No response