This looks like legacy syntax to me. Using Delta 3.2.0, I couldn't find a way to create or alter a table so that it has a delta.invariants field in the metadata.
If I add an "invariant" like this:
CREATE TABLE default USING DELTA LOCATION '/tmp/delta/default';
ALTER TABLE default ADD COLUMN val int;
ALTER TABLE default ADD CONSTRAINT valPos CHECK (val > 0);
As we can see, the "constraint" I added is essentially a combination of two features: "checkConstraints" & "invariants".
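To make this kind of inspection repeatable, here is a minimal sketch that reads one Delta commit file (newline-delimited JSON actions) and lists the writer features and any delta.constraints.* entries in the table configuration. The sample commit is hypothetical and only illustrates the layout; it is not the 2.json or 3.json from this report, and the key name delta.constraints.valpos is an assumption about how the constraint would be stored.

```python
import json

# Hypothetical commit contents in the newline-delimited JSON layout of a Delta
# log file. Illustrative values only, not the files produced in this report.
SAMPLE_COMMIT = "\n".join([
    json.dumps({"protocol": {"minReaderVersion": 3, "minWriterVersion": 7,
                             "readerFeatures": [],
                             "writerFeatures": ["checkConstraints", "invariants"]}}),
    json.dumps({"metaData": {"id": "hypothetical-id",
                             "configuration": {"delta.constraints.valpos": "val > 0"}}}),
])

CONSTRAINT_PREFIX = "delta.constraints."


def summarize_commit(text):
    """Return (sorted writer features, {constraint name: expression}) for one commit."""
    writer_features = []
    check_constraints = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between actions
        action = json.loads(line)
        if "protocol" in action:
            writer_features = sorted(action["protocol"].get("writerFeatures", []))
        if "metaData" in action:
            config = action["metaData"].get("configuration", {})
            check_constraints = {
                key[len(CONSTRAINT_PREFIX):]: expr
                for key, expr in config.items()
                if key.startswith(CONSTRAINT_PREFIX)
            }
    return writer_features, check_constraints


features, constraints = summarize_commit(SAMPLE_COMMIT)
print(features)
print(constraints)
```

Running the same function over a real file from the table's _delta_log directory would show whether a constraint landed in the table configuration or somewhere else.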
So far, the only scenario I have found in which "invariants" appears alone is adding a "NOT NULL" constraint.
My experiments lead me to the following belief:
1) "Invariants" is a legacy feature that used to represent both "check constraint" and "not null constraint".
2) Since version (reader 3, writer 7), if feature "invariants" exists but not "checkConstraints", then the table supports "NOT NULL" constraint but not "check constraint" (e.g. expression x > 3).
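The two points above can be written down as a small function, which may make the proposed reading easier to confirm or reject. This is only a sketch of my interpretation, not behavior confirmed by the PROTOCOL doc; the function name and the returned keys are hypothetical.

```python
def constraints_supported(writer_features):
    """Constraint kinds a (reader 3, writer 7) table would support under the
    interpretation in points 1) and 2). An assumption, not confirmed semantics."""
    features = set(writer_features)
    return {
        # "invariants" alone is read as covering NOT NULL only
        "not_null": "invariants" in features,
        # arbitrary expressions such as x > 3 would need "checkConstraints"
        "check": "checkConstraints" in features,
    }


print(constraints_supported(["invariants"]))
print(constraints_supported(["invariants", "checkConstraints"]))
```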
I propose that we fix/amend the PROTOCOL doc. What remains unclear to me is whether it is legal for a table on version (reader 3, writer 7) to support only "invariants" but not "checkConstraints", AND to have "delta.invariants" in column metadata.
A clarification is much appreciated.
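For concreteness, the combination I am asking about can be detected with a sketch like the one below. It assumes the column metadata value is a JSON string of the form {"expression": {"expression": "<SQL>"}}, as I understand the PROTOCOL doc to describe it, and it only looks at top-level fields. The protocol and schema values are hypothetical, and the function names are mine.

```python
import json


def columns_with_invariants(schema_string):
    """Return {column name: SQL expression} for top-level fields that carry
    a delta.invariants entry in their metadata."""
    schema = json.loads(schema_string)
    found = {}
    for field in schema.get("fields", []):
        raw = field.get("metadata", {}).get("delta.invariants")
        if raw is not None:
            # Assumed layout: {"expression": {"expression": "<SQL>"}}
            found[field["name"]] = json.loads(raw)["expression"]["expression"]
    return found


def is_questioned_state(protocol, schema_string):
    """True when the table is on writer version 7, lists "invariants" without
    "checkConstraints", and still has delta.invariants in column metadata."""
    features = set(protocol.get("writerFeatures", []))
    return (
        protocol.get("minWriterVersion") == 7
        and "invariants" in features
        and "checkConstraints" not in features
        and bool(columns_with_invariants(schema_string))
    )


# Hypothetical protocol action and schemaString, for illustration only.
PROTOCOL = {"minReaderVersion": 3, "minWriterVersion": 7,
            "readerFeatures": [], "writerFeatures": ["invariants"]}
SCHEMA = json.dumps({"type": "struct", "fields": [
    {"name": "x", "type": "integer", "nullable": True,
     "metadata": {"delta.invariants":
                  json.dumps({"expression": {"expression": "x > 3"}})}},
]})

print(columns_with_invariants(SCHEMA))
print(is_questioned_state(PROTOCOL, SCHEMA))
```

Whether a writer should accept, reject, or enforce such a table is exactly the point I would like the doc to settle.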
References:
This comment suggests "invariants is being deprecated".
This comment by @wjones127 also suggests invariants should be narrowed to be only for "NOT NULL".
Environment information
Delta Lake version: 3.2.0
Spark version: 2.12
Scala version:
Willingness to contribute
The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?
[ ] Yes. I can contribute a fix for this bug independently.
[x] Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
[ ] No. I cannot contribute a bug fix at this time.
DELTA protocol on Column Invariants Feature seems unclear/misleading:
The example provided in the doc was:
This looks like legacy syntax to me. By using Delta 3.2.0, I couldn't find a way to create/alter a table to have delta.invariants field in the metadata.
If I add an "invariant" like this:
then 2.json is:
we can see a new config delta.constraints rather than delta.invariants.
I then altered the table to be reader version 3 & writer version 7:
then 3.json is:
As we can see, the "constraint" I added is essentially a combination of two features: "checkConstraints" & "invariants".