Open emrowirshi opened 1 year ago
@xupefei Can you take a look at this?
Publish log with non-nullable column, and without invariants
How did you do this? Protocol(1, 1) does not support non-null columns. You must have at least Protocol(1, 2).
Understood - that's likely a bug in my implementation, though it seems like it could also be orthogonal to the problem here. I'll test on my end with a writer version of 2, but since the issue is on other engines reading our logs (and not writing to the table), it seems possible there's still a bug here with non-nulls being treated as column invariants. Is there any chance you can follow up on the code pointers above?
Non-null is a column invariant. Unlike other invariant rules, not-null is not stored in the delta.invariants metadata but as a separate key, "nullable":false.
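For readers following along, here is a minimal sketch of where the two kinds of constraint live in a table's schema JSON. The schema literal and the helper `describe_constraints` are hypothetical and only illustrate the layout described in this comment. Nested fields are ignored for brevity.

```python
import json

# Hypothetical schemaString, shaped like the one carried by a metaData action.
schema_string = json.dumps({
    "type": "struct",
    "fields": [
        # Not-null is expressed only through the "nullable" key.
        {"name": "a", "type": "integer", "nullable": False, "metadata": {}},
        # An expression invariant lives in the field metadata as a JSON string.
        {
            "name": "b",
            "type": "integer",
            "nullable": True,
            "metadata": {
                "delta.invariants": json.dumps(
                    {"expression": {"expression": "b > 3"}}
                )
            },
        },
    ],
})


def describe_constraints(schema_json):
    """Map each top-level column name to the kinds of constraint it carries."""
    schema = json.loads(schema_json)
    result = {}
    for field in schema["fields"]:
        kinds = []
        if not field.get("nullable", True):
            kinds.append("not-null")
        if "delta.invariants" in field.get("metadata", {}):
            kinds.append("expression invariant")
        result[field["name"]] = kinds
    return result


constraints = describe_constraints(schema_string)
```

In this sketch column `a` carries only a not-null constraint and column `b` only an expression invariant, which is the distinction the thread turns on.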
In Delta we set the default protocol to (1, 2), which is supported by almost all readers. I would suggest that you also use (1, 2) in your implementation.
Got it - thanks for the clarification. Would it be worth it in this case to update the spec, since it's not entirely clear that "nullable": false requires column invariants to be listed in the protocol's features in order to be used?
Another concern is that readers are now enforcing writerFeatures - isn't that too harsh an assert to throw if Spark is acting solely as a reader and not as a writer?
Bug
Which Delta project/connector is this regarding?
Describe the problem
We have a table whose schema includes a non-nullable column (example below), and does not explicitly specify any invariants. This table is readable via Spark up until we set the reader version to 3 and the writer version to 7. At that point, reads fail because invariants are not listed in the protocol's writerFeatures, despite the fact that the table doesn't explicitly specify invariants (error code/stack below). Looking at the code, it appears that non-null constraints in the schema are being conflated with invariants, and are then failing a downstream check that if invariants are used, they must also be specified in writerFeatures.
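The behaviour described above can be sketched roughly as follows. This is not Delta's actual code. The function names, the `count_not_null` switch, and the sample protocol are assumptions, used only to show how treating a non-nullable column as an invariant changes the result of a writerFeatures check.

```python
def uses_invariants(fields, count_not_null):
    """Return True if any top-level field carries an invariant.

    When count_not_null is True, a non-nullable column is treated as an
    invariant too, which is the conflation described in this report.
    """
    for field in fields:
        if "delta.invariants" in field.get("metadata", {}):
            return True
        if count_not_null and not field.get("nullable", True):
            return True
    return False


def missing_writer_features(protocol, fields, count_not_null):
    """List writer features the table appears to use but does not declare."""
    # Explicit feature lists only apply from writer version 7 onwards;
    # earlier writer versions imply their features from the version number.
    if protocol["minWriterVersion"] < 7:
        return []
    declared = set(protocol.get("writerFeatures", []))
    missing = []
    if uses_invariants(fields, count_not_null) and "invariants" not in declared:
        missing.append("invariants")
    return missing


# Hypothetical table: one non-nullable column, no explicit invariants.
fields = [{"name": "a", "type": "integer", "nullable": False, "metadata": {}}]
protocol = {
    "minReaderVersion": 3,
    "minWriterVersion": 7,
    "readerFeatures": [],
    "writerFeatures": [],
}

strict = missing_writer_features(protocol, fields, count_not_null=True)
lenient = missing_writer_features(protocol, fields, count_not_null=False)
```

With `count_not_null=True` the sketch reports `invariants` as missing for this table. With `count_not_null=False` it reports nothing, which is the behaviour we expected when only reading.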
Steps to reproduce