FengJiang2018 closed this issue 1 year ago
@FengJiang2018 I think this is probably because Databricks Spark uses different internal APIs for data sources than open-source Apache Spark. I just tested the Sedona GeoParquet reader using our Docker image (https://hub.docker.com/r/apache/sedona) and it works fine. Could you let me know which Databricks runtime version and which Sedona version you are using?
Do you mind contacting me by email (jiayu@apache.org)?
Disabling the Photon acceleration option on the cluster solved the read/write problem. Looking forward to seeing it supported in the future, since Photon brings significant performance gains.
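(For reference: Photon is toggled per cluster, either by unchecking "Use Photon Acceleration" in the cluster configuration UI or, when creating clusters through the Clusters API, by setting `"runtime_engine": "STANDARD"` in the cluster spec. That is my reading of the Databricks options, not something confirmed in this thread.)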
Expected behavior
GeoParquet files should be written with geo metadata for the geometry column and should not raise an error when read back with Sedona's GeoParquet reader.
Actual behavior
The GeoParquet file was created without geo metadata, and reading it back with the GeoParquet reader raised an error.
Steps to reproduce the problem
It seems the issue is that when I use df.write to write a GeoParquet file, the geo metadata is not created for the Sedona geometry column. I am not sure if I missed anything.
1. I am using the Overture public dataset as input for the DataFrame, with a Sedona geometry column, along the lines of the sketch below.
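A minimal sketch of this step, assuming the Overture geometries arrive as WKB; the storage path is a placeholder, not the real one:

```python
from sedona.spark import SedonaContext
from pyspark.sql.functions import expr

# Register Sedona's geometry type and SQL functions on the existing session.
sedona = SedonaContext.create(spark)

# Load the Overture parquet files and decode the WKB geometry column
# into a Sedona geometry column. The path below is a placeholder.
df = (
    sedona.read.format("parquet")
    .load("wasbs://<container>@<account>.blob.core.windows.net/overture/theme=buildings")
    .withColumn("geometry", expr("ST_GeomFromWKB(geometry)"))
)
```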
2. Yes, I am using a DataFrame with a Sedona geometry-type column to write a GeoParquet file on Databricks, along the lines of the sketch below.
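A sketch of the write; the output path is again a placeholder:

```python
# Write the DataFrame through Sedona's GeoParquet data source.
df.write.format("geoparquet").mode("overwrite").save("/mnt/tmp/buildings_geoparquet")
```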
Here is what I saw from printSchema(): the column shows as geometry type, and nullable is true, which seems expected. Correct me if this is wrong.
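For reference, the check and an illustrative shape of the output (not copied from the actual run, which has more Overture columns):

```python
df.printSchema()
# Illustrative output:
# root
#  |-- id: string (nullable = true)
#  |-- geometry: geometry (nullable = true)
```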
3. I got an error when reading the GeoParquet file from step 2 in the following way (sketched below).
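A sketch of the failing read, using the same placeholder path:

```python
# Read the file back through Sedona's GeoParquet reader; with Photon
# enabled on the cluster, this is the call that raised the error.
df2 = spark.read.format("geoparquet").load("/mnt/tmp/buildings_geoparquet")
```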
Here are the error details:
However, there is no read error if I use the following code instead, but then no geo metadata can be found in the DataFrame schema.
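A sketch of that workaround read, same placeholder path:

```python
# Reading as plain parquet succeeds, but the schema carries no geo
# metadata, so the geometry column is not recognized as a Sedona type.
df3 = spark.read.format("parquet").load("/mnt/tmp/buildings_geoparquet")
df3.printSchema()
```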
Settings
Sedona version = 1.5.0
Apache Spark version = 3.4.0
Apache Flink version = N/A
API type = Python
Scala version = 2.12
JRE version = 1.8
Python version = 3.10
Environment = Azure Databricks, notebook