delta-incubator / delta-kernel-rs

A native Delta implementation for integration with any query engine
Apache License 2.0
137 stars 38 forks source link

Filter pushdown issue #233

Closed samansmink closed 4 months ago

samansmink commented 4 months ago

Using duckdb_delta I'm seeing some weird behaviour with filter pushdown.

When running the following query on one of the DAT tables:

SELECT letter, number
FROM delta_scan('${DAT_PATH}/out/reader_tests/generated/basic_append/delta')
WHERE number < 2

This results in DuckDB trying to push down the Predicate number < 2 AND number is not null.

Currently in the main branch of DuckDB delta, we will not push down filters that are not completely supported by kernel. So this filter is not pushed down at all.

However, to me it seems like that even though IS NOT NULL is not yet supported, we should still be able to do file skipping here because kernel could handle the children of the conjunction AND that is does support while ignoring the children that are not yet supported.

In the code from the DuckDB delta demo, this code was already there so I kindof assumed it to be working, but maybe thats not the case.

However when I try this do this, i get incorrect results. So what I try is call ffi::visit_expression_and with an iterator that will iterate over the childeren. When DuckDB is then called by kernel to handle the IS NOT NULL filter, we return ~0 to the kernel to indicate that this filter type is not yet supported.

However when I do that, the above query actually returns an incorrect result: All files are now skipped, which is clearly wrong because:

FROM delta_scan('${DAT_PATH}/out/reader_tests/generated/basic_append/delta')

returns

4
5
1
2
3

I'm not really sure if I'm misunderstanding the expression visitor API or there's something actually wrong here. Like returning ~0 seems to be the convention to indicate an unsupported filter, but it seems to not always be working.

TLDR; not sure if there's a bug in kernel, but I think we need some clarity on how to use the expression vistors when parts of an AND filter are unsupported @nicklan

nicklan commented 4 months ago

This is fixed by https://github.com/duckdb/duckdb_delta/pull/27

Closing this, as it's not actually an issue in kernel.

I have opened #234 to see if we can be more efficient here.

nicklan commented 4 months ago

Also, just to answer your question, returning ~0 is the right thing to do when an expression isn't supported.