delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[Spark] Detect opaque URIs and throw proper exception #3870

Closed cstavr closed 1 week ago

cstavr commented 1 week ago

Which Delta project/connector is this regarding?

Description

An opaque URI is an absolute URI whose scheme-specific part does not begin with a slash character ('/'). Opaque URIs are not parsed further by the java.net.URI library (see https://docs.oracle.com/javase/7/docs/api/java/net/URI.html), so they have no path component:

import java.net.URI

val uri = new URI("http:example.com")
uri.isOpaque    // true
uri.isAbsolute  // true
uri.getPath     // null

This causes issues when we try to call path-related methods on such URIs, e.g.:

import org.apache.hadoop.fs.Path

val filePath = new Path(uri)
filePath.toString    // "http:"
filePath.isAbsolute  // throws NullPointerException

This commit fixes the issue by detecting opaque URIs in Delta file actions and throwing a proper, descriptive exception instead of letting a NullPointerException surface later.
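
For illustration only, the kind of check described above could look roughly like the following sketch. It is not the code from this PR: the `OpaqueUriCheck` object, the `parseHierarchical` helper, and the use of `IllegalArgumentException` are all hypothetical names and choices made for this example. It relies only on `java.net.URI.isOpaque` from the JDK.

```scala
import java.net.URI

// Hypothetical helper for illustration; not Delta's actual implementation.
object OpaqueUriCheck {

  /** Parses `raw` and rejects opaque URIs, whose path component is null. */
  def parseHierarchical(raw: String): URI = {
    val uri = new URI(raw) // throws URISyntaxException on malformed input
    if (uri.isOpaque) {
      // Fail early with a clear message instead of a later NullPointerException.
      throw new IllegalArgumentException(
        s"Opaque URI cannot be used as a file path: $raw")
    }
    uri
  }

  def main(args: Array[String]): Unit = {
    // A hierarchical URI passes the check and has a usable path.
    println(parseHierarchical("http://example.com/data/part-0.parquet").getPath)

    // A relative path has no scheme, so it can never be opaque.
    println(parseHierarchical("data/part-0.parquet").getPath)

    // The opaque URI from the description is rejected up front.
    val rejected =
      try { parseHierarchical("http:example.com"); false }
      catch { case _: IllegalArgumentException => true }
    println(s"opaque URI rejected: $rejected")
  }
}
```

Checking `isOpaque` before constructing a `Path` keeps the failure at the point where the bad value enters, so the error message can include the offending string.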

How was this patch tested?

Added a new unit test.

Does this PR introduce any user-facing changes?

No