knightazura opened this issue 2 years ago
Do you still see the same issue without format_options and schema? If so, can you paste the original full stacktrace instead of Glue Exception Analysis?
Meanwhile, immediate workaround will be to use the traditional DynamoDB connector instead of the new DynamoDB export connector.
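For reference, here's a minimal sketch of what that would look like (reusing the glueContext from the test script below; the table name and read settings are placeholders, options per the Glue docs):

# Sketch: traditional scan-based DynamoDB connector. There is no
# "dynamodb.export" key, so Glue scans the table directly instead of
# exporting it to S3 first.
dyf = glueContext.create_dynamic_frame_from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.input.tableName": "test_table",   # placeholder table name
        "dynamodb.throughput.read.percent": "0.5",  # cap consumed read capacity
        "dynamodb.splits": "1",                     # number of parallel scan segments
    },
)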
Hello, thanks for your reply.
> Do you still see the same issue without format_options and schema? If so, can you paste the original full stacktrace instead of Glue Exception Analysis?
Yes, I still get the issue without those parameters. Unfortunately, the log has already been removed because it passed its retention period.
> Meanwhile, immediate workaround will be to use the traditional DynamoDB connector instead of the new DynamoDB export connector.
Actually, the reason we chose the new DynamoDB export connector is to avoid the traditional connector, which is more expensive for us.
I have tested this but was not able to reproduce the issue. Can you compare the following with your job run? At a minimum, we need to reproduce the issue to identify the cause.
Test data in DynamoDB table:
{
    "user_id": {
        "S": "674321"
    },
    "map_example": {
        "M": {
            "150": {
                "S": "One hundred fifty"
            }
        }
    },
    "post_id": {
        "N": "23456"
    }
}
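For anyone reproducing this, the item above can be loaded with boto3's put_item, which accepts DynamoDB-JSON typed values directly (a sketch; the table name and region are assumptions):

import boto3

# Sketch: insert the test item using the low-level DynamoDB client.
dynamodb = boto3.client("dynamodb", region_name="us-east-1")
dynamodb.put_item(
    TableName="test_table",
    Item={
        "user_id": {"S": "674321"},
        "map_example": {"M": {"150": {"S": "One hundred fifty"}}},
        "post_id": {"N": "23456"},
    },
)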
Test Glue script:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read the table via the DynamoDB export connector (export to S3, then parse).
dyf = glueContext.create_dynamic_frame_from_options(
    connection_type="dynamodb",
    connection_options={
        "dynamodb.export": "ddb",
        "dynamodb.tableArn": "arn:aws:dynamodb:us-east-1:123456789101:table/test_table",
        # Unnest the exported DynamoDB JSON into top-level columns.
        "dynamodb.unnestDDBJson": True,
        "dynamodb.s3.bucket": "bucket_name",
        "dynamodb.s3.prefix": "ddbexport/",
    },
    format="json",
    transformation_ctx="itemsDF"
)

dyf.toDF().show()
Output:
+-------+--------------------+-----------------+-------+
|user_id| map_example| 150|post_id|
+-------+--------------------+-----------------+-------+
| 674321|{{One hundred fif...|One hundred fifty| 23456|
+-------+--------------------+-----------------+-------+
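As a side note, the flattening can also be confirmed from the schema; with dynamodb.unnestDDBJson enabled, the numeric map key shows up as its own top-level column:

# Print the inferred schema; the map key "150" appears as a top-level
# column because "dynamodb.unnestDDBJson" is enabled.
dyf.printSchema()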
I have tested again using your script and still got the issue.
Test data:
{
    "user_id": {
        "N": "222222"
    },
    "post_id": {
        "N": "222222"
    },
    "map_example": {
        "M": {
            "150": {
                "S": "One hundred fifty"
            }
        }
    }
}
May I know what version of Spark you are using?
Anyway, here's the full log.
I tested it on Glue 3.0 / Spark 3.1.1. Where are you testing? Glue job? Local lib? Docker container? Which version?
Tested from a Glue job. The version is the same as yours, Glue 3.0 / Spark 3.1.
Hello
I've created a Glue job that simply reads data from DynamoDB using the AWS Glue DynamoDB export connector as the source, does some transformation, and then writes the result into S3. But it failed with a ParseException error like this. After tinkering with the error, I found that if a record has a field of map type, specifically one whose key is a number, it causes the error. Here's an example record in the form of DynamoDB JSON.
I have two questions:
1. Is it possible to skip or exclude a specific field when creating a DynamicFrame? Incidentally, the problematic field is not necessary in my case, so I thought maybe it could be skipped. I have tried using schema in format_options but it didn't work.
2. If 1 is not possible, how can I resolve this error?
Thanks
Snippet to create DynamicFrame
ETL Job details
Glue Job Type: Spark ETL
Language: Python
Glue Version: 3.0
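On question 1, for reference: once a DynamicFrame exists, a field can be dropped with Glue's DropFields transform. A minimal sketch (note this runs after the frame is created, so it only helps if the read itself succeeds; "map_example" is the problematic field from the example record):

from awsglue.transforms import DropFields

# Sketch: drop the problematic map-typed field from an existing DynamicFrame.
trimmed = DropFields.apply(frame=dyf, paths=["map_example"])
trimmed.toDF().show()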