awslabs / aws-glue-libs

AWS Glue Libraries are additions and enhancements to Spark for ETL operations.

SSL certificates not set up for JDBC connections with enforceSSL? #139

Open kherrera-ebsco opened 2 years ago

kherrera-ebsco commented 2 years ago

Are certificates missing or outdated for the image? I am receiving the following error when using a Glue JDBC connection that has enforceSSL enabled.

22/06/08 20:29:33 INFO JDBCWrapper$: enforceSSL = true, from connection properties, will only attempt SSL with CN matching
22/06/08 20:29:33 INFO JDBCWrapper$: INFO: using ssl properties: Map(sslrootcert -> , loginTimeout -> 10, sslmode -> verify-full)
Traceback (most recent call last):
  File "/home/glue_user/workspace/src/job.py", line 29, in <module>
    transformation_ctx=f'{args["SOURCE_DATABASE"]}_{args["SOURCE_TABLE"]}_pull',
  File "/home/glue_user/aws-glue-libs/PyGlue.zip/awsglue/dynamicframe.py", line 625, in from_catalog
  File "/home/glue_user/aws-glue-libs/PyGlue.zip/awsglue/context.py", line 179, in create_dynamic_frame_from_catalog
  File "/home/glue_user/aws-glue-libs/PyGlue.zip/awsglue/data_source.py", line 36, in getFrame
  File "/home/glue_user/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
  File "/home/glue_user/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
  File "/home/glue_user/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o46.getDynamicFrame.
: org.postgresql.util.PSQLException: Could not open SSL root certificate file .
        at org.postgresql.Driver$ConnectThread.getResult(Driver.java:357)
        at org.postgresql.Driver.connect(Driver.java:281)
        at com.amazonaws.services.glue.util.JDBCWrapper.getRawConnection(JDBCUtils.scala:688)
        at com.amazonaws.services.glue.JDBCDataSource.getPrimaryKeys(DataSource.scala:892)
        at com.amazonaws.services.glue.JDBCDataSource.$anonfun$getJdbcJobBookmark$1(DataSource.scala:785)
        at scala.collection.MapLike.getOrElse(MapLike.scala:131)
        at scala.collection.MapLike.getOrElse$(MapLike.scala:129)
        at scala.collection.AbstractMap.getOrElse(Map.scala:63)
        at com.amazonaws.services.glue.JDBCDataSource.getJdbcJobBookmark(DataSource.scala:785)
        at com.amazonaws.services.glue.JDBCDataSource.getDynamicFrame(DataSource.scala:860)
        at com.amazonaws.services.glue.DataSource.getDynamicFrame(DataSource.scala:99)
        at com.amazonaws.services.glue.DataSource.getDynamicFrame$(DataSource.scala:99)
        at com.amazonaws.services.glue.SparkSQLDataSource.getDynamicFrame(DataSource.scala:689)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.FileNotFoundException:  (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at org.postgresql.ssl.jdbc4.LibPQFactory.<init>(LibPQFactory.java:120)
        at org.postgresql.ssl.MakeSSL.convert(MakeSSL.java:42)
        at org.postgresql.core.v3.ConnectionFactoryImpl.enableSSL(ConnectionFactoryImpl.java:351)
        at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:137)
        at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:67)
        at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:216)
        at org.postgresql.Driver.makeConnection(Driver.java:406)
        at org.postgresql.Driver.access$100(Driver.java:54)
        at org.postgresql.Driver$ConnectThread.run(Driver.java:316)
        ... 1 more
ejbp commented 1 year ago

Did you find any solution for this?

redlumxn commented 2 months ago

I ran into the same issue running a Jupyter notebook locally (using the AWS Glue version 4.0 image: amazon/aws-glue-libs:glue_libs_4.0.0_image_01).

The cell in question is:

dyf = glueContext.create_dynamic_frame.from_catalog(
    database='new_db',
    table_name='my_table',
)

Some background:

The only way I've managed to get it to work is by passing additional_options to override the connection's enforceSSL value and disable SSL enforcement, like this:

dyf = glueContext.create_dynamic_frame.from_catalog(
    database='new_db',
    table_name='my_table',
    additional_options={"enforceSSL": "false"},
)

Lastly, I tried creating a Spark DataFrame over JDBC directly, as shown below. Note that I'm setting sslmode to verify-ca so the driver verifies the server certificate against a CA.

This works ONLY if I place the RDS global-bundle certificate at /home/glue_user/.postgresql/root.crt inside the container, which is the JDBC driver's default location.

jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:postgresql://blablabla.us-east-1.rds.amazonaws.com:5432/database?sslmode=verify-ca") \
    .option("dbtable", "myschema.mytable") \
    .option("user", "test") \
    .option("password", "test") \
    .load()
jdbcDF.printSchema()
jdbcDF.count()
jdbcDF.show(10)
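
For the plain Spark JDBC read above, the CA bundle does not have to sit in the driver's default location: the PostgreSQL JDBC driver also accepts an sslrootcert connection parameter in the URL. The following is a minimal sketch of building such a URL. The helper name build_pg_jdbc_url and the example host and paths are hypothetical, and this has not been verified against Glue's from_catalog, which builds its own SSL properties.

```python
from pathlib import Path


def build_pg_jdbc_url(host, port, database, sslmode="verify-ca", sslrootcert=None):
    """Hypothetical helper: build a PostgreSQL JDBC URL with explicit SSL settings.

    sslmode and sslrootcert are connection parameters of the PostgreSQL JDBC
    driver. When sslrootcert is omitted, the driver falls back to its default
    location (.postgresql/root.crt under the user's home directory).
    """
    params = [("sslmode", sslmode)]
    if sslrootcert is not None:
        cert_path = Path(sslrootcert)
        # Fail early with a readable message instead of the driver's
        # "Could not open SSL root certificate file" error.
        if not cert_path.is_file():
            raise FileNotFoundError(f"CA bundle not found: {cert_path}")
        params.append(("sslrootcert", str(cert_path)))
    query = "&".join(f"{key}={value}" for key, value in params)
    return f"jdbc:postgresql://{host}:{port}/{database}?{query}"


# Usage sketch (assumes an existing SparkSession named `spark` and a CA bundle
# already copied into the container at the hypothetical path below):
#
# url = build_pg_jdbc_url(
#     "blablabla.us-east-1.rds.amazonaws.com", 5432, "database",
#     sslrootcert="/home/glue_user/certs/global-bundle.pem",
# )
# jdbcDF = spark.read.format("jdbc").option("url", url) \
#     .option("dbtable", "myschema.mytable") \
#     .option("user", "test").option("password", "test").load()
```

Checking that the file exists before connecting gives a clearer failure than the empty-path error in the original stack trace.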

I guess what would solve the issue for Glue's from_catalog is being able to set the location of the CA certificate. Sadly, I haven't been able to find the right configuration parameters/settings to do that.
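
In the meantime, the default-location workaround described above can be scripted so it runs once at notebook start. This is only a sketch: install_root_cert is a hypothetical helper, it assumes the bundle has already been downloaded into the container, and it assumes the JVM resolves the same home directory as Python (/home/glue_user in this image). It is not confirmed to help with from_catalog, only with the direct JDBC read.

```python
import shutil
from pathlib import Path


def install_root_cert(bundle_path, home=None):
    """Hypothetical helper: copy a CA bundle to <home>/.postgresql/root.crt.

    That path is where the PostgreSQL JDBC driver looks for the root
    certificate when no sslrootcert parameter is given.
    """
    source = Path(bundle_path)
    if not source.is_file():
        raise FileNotFoundError(f"CA bundle not found: {source}")
    home_dir = Path(home) if home is not None else Path.home()
    target = home_dir / ".postgresql" / "root.crt"
    # Create the .postgresql directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    return target


# Usage sketch (the source path is an assumption):
# install_root_cert("/home/glue_user/workspace/global-bundle.pem")
```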

Does anybody have any pointers?