delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[SPARK] Add new config to disable implicit not null invariants #3752

Open: Kimahriman opened this pull request 1 month ago

Kimahriman commented 1 month ago

Which Delta project/connector is this regarding? Spark

Description

Resolves #860

Adds a new config that disables the implicit not-null constraints Delta adds for non-nullable fields. This works around the fact that Delta does not properly respect Spark's struct nullability semantics: in Spark, a non-nullable field inside a nullable struct is only guaranteed to be non-null when the enclosing struct itself is non-null. Issue #860 discusses why Delta gets this wrong. Because there is currently no workaround, the new config at least lets users opt in, saying "I know what I'm doing", and avoid the unwanted default behavior.
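The mismatch can be illustrated with a small pure-Python model. This is a sketch, not Delta's implementation: `Field`, `violates_naive`, `violates_spark_semantics`, and `write_allowed` are hypothetical names, and it assumes each implicit invariant amounts to a plain `IS NOT NULL` check on the full nested path (so a null parent struct makes the child path null). The `enforce_implicit_not_null` parameter is a hypothetical stand-in for the new config, whose real name is not given here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Field:
    """Hypothetical, minimal stand-in for a Spark StructField."""
    name: str
    nullable: bool
    children: list[Field] = field(default_factory=list)  # non-empty => struct type


def get_path(row: dict, path: tuple) -> object:
    """Resolve a nested path the way SQL field access does: null in, null out."""
    value = row
    for part in path:
        if value is None:
            return None
        value = value.get(part)
    return value


def implicit_not_null_paths(fields: list[Field], prefix: tuple = ()) -> list[tuple]:
    """Collect every non-nullable field path, including nested ones."""
    paths = []
    for f in fields:
        path = prefix + (f.name,)
        if not f.nullable:
            paths.append(path)
        paths.extend(implicit_not_null_paths(f.children, path))
    return paths


def violates_naive(row: dict, fields: list[Field]) -> bool:
    """Assumed naive check: any non-nullable path resolving to null is a violation,
    regardless of whether an enclosing struct is itself null."""
    return any(get_path(row, p) is None for p in implicit_not_null_paths(fields))


def violates_spark_semantics(row: dict, fields: list[Field]) -> bool:
    """Spark semantics: a non-nullable field must be non-null only when its
    enclosing struct is present."""
    for f in fields:
        value = row.get(f.name)
        if value is None:
            if not f.nullable:
                return True
            continue  # a null nullable struct: its children are allowed to be null
        if f.children and violates_spark_semantics(value, f.children):
            return True
    return False


def write_allowed(row: dict, fields: list[Field], enforce_implicit_not_null: bool = True) -> bool:
    """Hypothetical model of the opt-out: when disabled, no implicit check runs."""
    return not (enforce_implicit_not_null and violates_naive(row, fields))


# Nullable struct `a` containing a non-nullable field `b`.
schema = [Field("a", nullable=True, children=[Field("b", nullable=False)])]
row = {"a": None}  # the struct itself is null

print(violates_spark_semantics(row, schema))  # False: valid under Spark semantics
print(violates_naive(row, schema))            # True: rejected by a nested IS NOT NULL check
print(write_allowed(row, schema))             # False
print(write_allowed(row, schema, enforce_implicit_not_null=False))  # True
```

In this model, a row whose nullable struct is null passes Spark's semantics but fails the naive nested check; turning the hypothetical flag off skips the implicit check entirely, which is the opt-in trade-off the description refers to.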

An earlier attempt to address this issue was https://github.com/delta-io/delta/pull/1296. This PR is an alternative, in the hope that something can be done about an issue that more users are reporting and that has frustrated my team for a long time.

How was this patch tested?

Added a new unit test.

Does this PR introduce any user-facing changes?

Yes. It allows users to opt in to skipping potentially erroneous not-null constraints.