Hello!
Thank you for using the library.
Currently the solution would be to specify the schema manually in Python as documented here: https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#programmatically-specifying-the-schema
Example:
from pyspark.sql.types import StructType, StructField, StringType

fields = [StructField("someField", StringType(), True),
          StructField("someOtherField", StringType(), True)]
schema = StructType(fields)

dynamoDf = (spark.read
    .option("tableName", "SomeTableName")
    .schema(schema)  # <-- Here we specify the schema manually
    .format("dynamodb")
    .load())
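As a quick sanity check (not part of the original snippet, just a sketch using the same dynamoDf and field names as above), you can confirm the manual schema was picked up before running a full job:

dynamoDf.printSchema()  # should list someField and someOtherField as string columns
dynamoDf.show(5)        # triggers an actual read (a Scan) against the table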
Does this solve your problem?
Thanks, Jacob
Thank you Jacob, this works for me. Also, is there a way to specify the GSI when querying DynamoDB, or is it intelligent enough to figure it out on its own based on the predicate provided?
Hi rbabu7
To read from a GSI you can use the following option:
dynamoDf = (spark.read
    .option("tableName", "SomeTableName")
    .option("indexName", "YourIndexName")  # <-- Here we specify the GSI
    .schema(schema)
    .format("dynamodb")
    .load())
Can we close the issue?
Thanks, Jacob
Unfortunately it is not yet intelligent enough to use Query over Scan, even for a global secondary index. See #62
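To make that concrete, here is a minimal sketch (someField and the literal value are just the hypothetical names from the schema example above): any predicate you apply to the GSI-backed DataFrame is still served by a Scan of the index rather than a DynamoDB Query, per the note above.

from pyspark.sql.functions import col

# Filter on an index attribute; with the current behavior this is still
# executed as a Scan of the index, not a DynamoDB Query (see #62).
filteredDf = dynamoDf.filter(col("someField") == "someValue")
filteredDf.show()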
Thank you for your support
I don't see an option to provide a schema in PySpark, while the same option exists in Scala. Please let me know how to provide a class when reading data using PySpark.