apache / sedona

A cluster computing framework for processing large-scale geospatial data
https://sedona.apache.org/
Apache License 2.0

[SEDONA-670] Fix GeoJSON reader for DBR #1662

Closed Kontinuation closed 3 weeks ago

Kontinuation commented 3 weeks ago

Did you read the Contributor Guide?

Is this PR related to a JIRA ticket?

What changes were proposed in this PR?

This PR works around an incompatibility in an internal method between open-source Apache Spark and DBR (Databricks Runtime). In open-source Apache Spark, the readFile method is defined as:

def readFile(
  conf: Configuration,
  file: PartitionedFile,
  parser: JacksonParser,
  schema: StructType): Iterator[InternalRow]

On DBR, the same method takes an extra Option[_] parameter:

def readFile(
  conf: Configuration,
  file: PartitionedFile,
  parser: JacksonParser,
  schema: StructType,
  badRecordsWriter: Option[BadRecordsWriter]): Iterator[InternalRow]

We work around this problem by using reflection to detect the number of parameters of the readFile method and then passing the appropriate arguments to it.
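The arity-based dispatch described above could look roughly like the following. This is a minimal sketch, not the code from this PR: OssReader, DbrReader, and ReadFileCompat are hypothetical stand-ins, String placeholders replace the Spark types (Configuration, PartitionedFile, JacksonParser, StructType), and passing None for the extra parameter is an assumption.

```scala
import java.lang.reflect.Method

// Hypothetical stand-in for the open-source Spark shape (4 parameters).
object OssReader {
  def readFile(conf: String, file: String, parser: String, schema: String): Iterator[String] =
    Iterator(s"oss:$file")
}

// Hypothetical stand-in for the DBR shape (extra Option[_] parameter).
object DbrReader {
  def readFile(conf: String, file: String, parser: String, schema: String,
      badRecordsWriter: Option[_]): Iterator[String] =
    Iterator(s"dbr:$file")
}

object ReadFileCompat {
  // Looks up readFile on the target, inspects its parameter count,
  // and invokes it with the matching argument list.
  def readFile(target: AnyRef, conf: String, file: String,
      parser: String, schema: String): Iterator[String] = {
    val method: Method = target.getClass.getMethods
      .find(_.getName == "readFile")
      .getOrElse(throw new NoSuchMethodException("readFile"))

    val args: Seq[AnyRef] = method.getParameterCount match {
      case 4 => Seq(conf, file, parser, schema)
      // Assumption: None is an acceptable value for the extra parameter.
      case 5 => Seq(conf, file, parser, schema, None)
      case n => throw new UnsupportedOperationException(s"Unexpected readFile arity: $n")
    }

    method.invoke(target, args: _*).asInstanceOf[Iterator[String]]
  }
}
```

Calling ReadFileCompat.readFile(OssReader, ...) takes the 4-argument branch and ReadFileCompat.readFile(DbrReader, ...) takes the 5-argument branch, so the caller does not need to know which runtime it is on. Failing loudly on any other arity keeps a future signature change from being silently mishandled.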

How was this patch tested?

Existing tests pass, and the change was manually tested on DBR 15.4 LTS.

Did this PR include necessary documentation updates?