awslabs / emr-dynamodb-connector

Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB
Apache License 2.0

Unable to test with Embedded DynamoDB #126

Open taylorcressy opened 4 years ago

taylorcressy commented 4 years ago

Hi there,

We are building out a spark structured streaming job and are attempting to run build-level integration tests with an embedded DynamoDB instance.

The problem we are running into is that the connector appears to connect to DynamoDB only over HTTPS, while the embedded DynamoDB only accepts HTTP.

Is there a way to toggle off SSL for the connector?

Example code for setting up embedded DynamoDB:

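import com.amazonaws.client.builder.AwsClientBuilder
import com.amazonaws.services.dynamodbv2.{AmazonDynamoDB, AmazonDynamoDBClientBuilder}
import com.amazonaws.services.dynamodbv2.local.main.ServerRunner
import com.amazonaws.services.dynamodbv2.local.server.DynamoDBProxyServer

// Start DynamoDB Local in-process: in-memory storage, one shared database
val localArgs = Array("-inMemory", "-sharedDb")
var server: DynamoDBProxyServer = null
var dynamoDb: AmazonDynamoDB = null
try {
  server = ServerRunner.createServerFromCommandLineArgs(localArgs)
  server.start()
  val builder = AmazonDynamoDBClientBuilder.standard
    .withEndpointConfiguration( // we can use any region here
      new AwsClientBuilder.EndpointConfiguration("https://localhost:8000", "test"))

  builder.setCredentials(new AWSCredentialsProviderTest) // test credentials provider, defined elsewhere
  dynamoDb = builder.build()
}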
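The setup snippet starts the server but doesn't show teardown. One way to keep an embedded server from leaking between test suites is the loan pattern: start it, hand it to the test body, and stop it in a `finally`. The sketch below is an illustration under assumptions: `LocalServer`, `Fixtures.withServer` and `RecordingServer` are hypothetical names, and `LocalServer` merely stands in for something like `DynamoDBProxyServer`, which (as far as we know) exposes `start()` and `stop()`.

```scala
// Hypothetical interface standing in for an embedded server such as
// DynamoDBProxyServer, which exposes start() and stop().
trait LocalServer {
  def start(): Unit
  def stop(): Unit
}

object Fixtures {
  // Loan pattern: start the server, lend it to the test body, and always
  // stop it afterwards, even when the body throws.
  def withServer[A](server: LocalServer)(body: LocalServer => A): A = {
    server.start()
    try body(server)
    finally server.stop()
  }
}

// Example stand-in that records lifecycle calls instead of opening a port.
final class RecordingServer extends LocalServer {
  var events: List[String] = Nil
  def start(): Unit = events = events :+ "start"
  def stop(): Unit = events = events :+ "stop"
}
```

To use it with the real embedded server you would wrap `DynamoDBProxyServer` in a small adapter implementing `LocalServer`; the recording stand-in just makes the start/stop ordering easy to check.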

Our job conf is:

val ddbConf = new JobConf(hadoopConfiguration)
ddbConf.set("dynamodb.output.tableName", tableName)
ddbConf.set("dynamodb.throughput.write.percent", "1.0")
ddbConf.set("mapred.input.format.class", "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat")
ddbConf.set("mapred.output.format.class", "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat")
ddbConf.set("dynamodb.endpoint", "https://localhost:8000")
ddbConf.set("dynamodb.servicename", "dynamodb")
ddbConf.set("dynamodb.regionid", "test")
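Which protocol a client actually uses depends on how its endpoint string is resolved. As we understand the AWS SDK for Java v1, an endpoint with an explicit scheme is used as given, while one without a scheme gets the client configuration's protocol, which defaults to HTTPS. Whether the connector passes `dynamodb.endpoint` through to the SDK unchanged is not settled in this thread, so treat the following as a sketch of that resolution rule only; `EndpointResolution`, `resolveEndpoint` and `isPlaintext` are hypothetical names.

```scala
import java.net.URI

object EndpointResolution {
  // Scheme assumed when the endpoint has none; HTTPS mirrors the
  // AWS SDK for Java v1 ClientConfiguration default (an assumption here).
  val DefaultScheme = "https"

  // Hypothetical helper: the URI a client would connect to for `endpoint`.
  // The "://" check matters: URI.create("localhost:8000") would otherwise
  // parse "localhost" as the scheme.
  def resolveEndpoint(endpoint: String, defaultScheme: String = DefaultScheme): URI =
    if (endpoint.contains("://")) URI.create(endpoint)
    else URI.create(s"$defaultScheme://$endpoint")

  // True when the resolved endpoint would use plain HTTP, which is what
  // DynamoDB Local serves by default.
  def isPlaintext(endpoint: String): Boolean =
    resolveEndpoint(endpoint).getScheme.equalsIgnoreCase("http")
}
```

Under this rule, `"https://localhost:8000"` and a bare `"localhost:8000"` both resolve to HTTPS, which would produce exactly the TLS-to-plaintext mismatch reported below.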

This leads to the following exception:

Exception encountered when invoking run on a nested suite - Unable to execute HTTP request: Unrecognized SSL message, plaintext connection?
com.amazonaws.SdkClientException: Unable to execute HTTP request: Unrecognized SSL message, plaintext connection?

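"Unrecognized SSL message, plaintext connection?" is the message Java's TLS implementation (JSSE) raises when the bytes it reads back don't begin with a valid TLS record header, which typically means the server on that port answered in plain HTTP. A TLS record starts with a content-type byte (20 to 23: change_cipher_spec, alert, handshake, application_data) followed by a two-byte protocol version whose major byte is 3; a plaintext reply such as `HTTP/1.1 400` starts with `H` (0x48) instead. The sketch below applies only that simplified header check and is not what JSSE literally does; `TlsSniff` and `looksLikeTlsRecord` are hypothetical names.

```scala
object TlsSniff {
  // TLS record content types: change_cipher_spec(20), alert(21),
  // handshake(22), application_data(23).
  private val ContentTypes: Set[Int] = Set(20, 21, 22, 23)

  // Simplified check of the 5-byte TLS record header: content type, then
  // protocol version (major byte 3 for SSL 3.0 and every TLS version),
  // then a 2-byte length that is not validated here.
  def looksLikeTlsRecord(bytes: Array[Byte]): Boolean =
    bytes.length >= 5 &&
      ContentTypes.contains(bytes(0) & 0xff) &&
      (bytes(1) & 0xff) == 3
}
```

Fed the start of a plain HTTP response, the check fails on the first byte, which is roughly the situation the client hits when it speaks TLS to DynamoDB Local's HTTP port.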
christopherhudy commented 4 years ago

I am running into the same problem. It looks like the DynamoDB client inside the connector is private, so there doesn't seem to be a way to pass our own config to it.

Also, with DynamoDB Local there's no easy way to use SSL. One workaround is to set up a reverse proxy in front of it that terminates SSL.

Has anyone found a workaround or solution yet?

taylorcressy commented 4 years ago

@christopherhudy I originally did what you said and set up a proxy. This proved to be way too much effort.

So instead I rolled my own implementations of the input and output format classes set here:

ddbConf.set("mapred.input.format.class", "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat")
ddbConf.set("mapred.output.format.class", "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat")

That did the trick.

christopherhudy commented 4 years ago

Yeah, I ended up ditching DynamoDB Local and using mocks for unit testing, then created actual integration tests against DynamoDB. Thanks @taylorcressy
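The thread doesn't show how the mocking was done. One common approach is to put the job's DynamoDB access behind a small interface and use an in-memory fake (rather than a mocking library) in unit tests, keeping real DynamoDB for integration tests. This is a sketch under that assumption: `ItemStore` and `InMemoryItemStore` are hypothetical names, and items are simplified to string maps keyed by a single string key.

```scala
import scala.collection.mutable

// Hypothetical abstraction the job writes through, so unit tests can swap
// in a fake instead of talking to DynamoDB (local or real).
trait ItemStore {
  def put(table: String, key: String, item: Map[String, String]): Unit
  def get(table: String, key: String): Option[Map[String, String]]
}

// In-memory fake for unit tests: table name -> (key -> item).
final class InMemoryItemStore extends ItemStore {
  private val tables = mutable.Map.empty[String, mutable.Map[String, Map[String, String]]]

  def put(table: String, key: String, item: Map[String, String]): Unit = {
    val rows = tables.getOrElseUpdate(table, mutable.Map.empty[String, Map[String, String]])
    rows(key) = item // overwrite, like a DynamoDB PutItem on the same key
  }

  def get(table: String, key: String): Option[Map[String, String]] =
    tables.get(table).flatMap(_.get(key))
}
```

The production implementation of the same interface would wrap the real client, so the unit tests never need an HTTP endpoint at all.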