Closed paul-rogers closed 10 months ago
Once we merge this, we should also rebase https://github.com/apache/drill/pull/2515 on the current master and merge that as well.
Thanks @cgivre for the comments and review. @luocooong, I'll commit this. When convenient, please see if this addresses the issue you raised long ago. Otherwise, these capabilities are available for the next person who is seduced into trying out the UNION-based types.
I think we can regard this as a bug fix to framework code already in use in 1.21 and therefore backport it.
Backporting should be safe: as safe as having the change in master itself.
I successfully squashed the commits, and provided a proper commit message, while preserving the later commit. Did a force push to master to rewrite history.
You should update your own master to pick up the revised history.
@cgivre, the .asf.yaml
file you mentioned has lots of metadata, but does not actually prevent a force push. Perhaps we are missing something? It would generally be a good idea to forbid such things to prevent catastrophic mistakes.
I successfully squashed the commits, and provided a proper commit message, while preserving the later commit. Did a force push to master to rewrite history.
You should update your own master to pick up the revised history.
You are a braver man than I.
@cgivre, the
.asf.yaml
file you mentioned has lots of metadata, but does not actually prevent a force push. Perhaps we are missing something? It would generally be a good idea to forbid such things to prevent catastrophic mistakes.
Thanks for flagging... I'm not sure how to do that, but I'll investigate.
@cgivre, the
.asf.yaml
file you mentioned has lots of metadata, but does not actually prevent a force push. Perhaps we are missing something? It would generally be a good idea to forbid such things to prevent catastrophic mistakes.
Oh that's interesting. Something's changed since I last went through this with @vvysotskyi to do something that could only be done on master, perhaps testing of the automatic snapshot artifact publishing which requires access to GitHub Actions secrets.
Support for non-projected complex vectors
Description
The EVF mechanism provides scan-time projection for many vector types. The reader code is simple: it deserializes all columns for formats such as JSON, CSV, etc., and writes them to the
ColumnWriter
objects. Internally, EVF simply ignores the data for unprojected columns. This solution simplifies the readers: it is not necessary for each reader to include the complex code to handle projection. This solution is also performant: projection is done at scan time rather than the other approach, which is to read all data into vectors, then allow a PROJECT operator to drop the unprojected columns.Present EVF projection support handles most scalar and "well-structured" columns (repeated types AKA arrays, maps, etc.) However it does not handle the more esoteric types UNION, LIST (AKA repeated UNION), REPEATED LIST (AKA repeated, repeated UNION). This PR provides more support, though holes remain. In particular, both the UNION and LIST types now provide scan-time non-projection support.
Documentation
This is an internal feature: no user-visible documentation is required.
Testing
Extended existing EVF-related unit tests. There is an entire new test file to exercise the conditions of interest.