zeroshade closed this 3 months ago
@Fokko @nastra This should be ready for review now, though there's a discrepancy in the number of data files created for one of the integration-test tables on CI vs. when I run the docker compose provisioning locally. I don't know enough about spark-iceberg internals to know whether that's a quirk, expected behavior, or something I should change the tests for. Any ideas?
I've added a comment in `scanner_test.go` referencing the discrepancy. You can also look at the failed CI runs for examples.
@nastra Any further comments?
Thanks for the patience here @zeroshade. I'll do a full review in the next 2-3 days.
@zeroshade could you please rebase this one now that all the other PRs are merged?
@nastra All rebased already :smile:
Very rough initial implementation of metrics evaluation and a simple scanner for Tables that produces the list of `FileScanTask`s to perform a scan, along with positional delete files and so on.

This also includes a framework and setup for integration testing, adapted from the approach used in pyiceberg: docker images for provisioning, plus a file of tests that only run when the `integration` build tag is set, which a new CI workflow does.

This provides an end-to-end case of taking a table and a row-filter expression, performing manifest and metrics evaluation, and producing the plan for a scan. The next step would be actually fetching the data!