dongxiao1198 opened this issue 1 month ago
@nastra Could you please take a look at this?
FYI @scottsand-db
Hi @wgtmac -- can you please tell me a bit more about your use case for file stats and for getChanges?
We allow you to include a filter when building a scan with the ScanBuilder -- what more would you need the file stats for?
Could you also please look at this internal (not public) API for getChanges in Kernel and see if that fits your use case? We can consider making it public.
Thanks for the reply from @scottsand-db and help from @nastra!
We use Delta Kernel as a metadata client in our proprietary lakehouse to read Delta Lake tables. To efficiently create splits at any snapshot and cache the file lists, we need to get the following metadata from the API, which is available in Delta Standalone:
update() to incrementally sync to the latest version, which the standalone library supports. Hopefully my explanation makes sense.
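To make the caching use case described above more concrete, here is a minimal sketch of how a cached file list could be kept in sync by replaying the add/remove file actions of newer commits, roughly what `update()` plus `getChanges` allow in delta-standalone. `CachedFileList`, `FileAction`, and `VersionChanges` are hypothetical stand-ins, not classes from delta-kernel or delta-standalone. The stats string is assumed to be the per-file JSON stats that the Delta log stores on add actions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: a cached file list for one Delta table, kept in sync by
 * replaying the add/remove actions of commits newer than the cached version.
 * None of these types come from delta-kernel or delta-standalone, and the
 * sketch ignores deletion vectors, metadata/protocol actions and checkpoints.
 */
final class CachedFileList {

    /** Hypothetical stand-in for an add or remove file action in a commit. */
    static final class FileAction {
        final boolean isAdd;
        final String path;
        final String statsJson; // per-file stats JSON on adds; null on removes

        private FileAction(boolean isAdd, String path, String statsJson) {
            this.isAdd = isAdd;
            this.path = path;
            this.statsJson = statsJson;
        }

        static FileAction add(String path, String statsJson) {
            return new FileAction(true, path, statsJson);
        }

        static FileAction remove(String path) {
            return new FileAction(false, path, null);
        }
    }

    /** Hypothetical stand-in for the file actions committed in one table version. */
    static final class VersionChanges {
        final long version;
        final List<FileAction> actions;

        VersionChanges(long version, List<FileAction> actions) {
            this.version = version;
            this.actions = actions;
        }
    }

    private final Map<String, String> statsByPath = new HashMap<>();
    private long version;

    /** Seeds the cache from a full file listing taken at {@code version}. */
    CachedFileList(long version, Map<String, String> statsByPath) {
        this.version = version;
        this.statsByPath.putAll(statsByPath);
    }

    /**
     * Applies commits in ascending version order. Commits at or below the cached
     * version are skipped; a gap in versions throws, so the cache never
     * silently misses a commit.
     */
    void applyChanges(List<VersionChanges> changes) {
        for (VersionChanges commit : changes) {
            if (commit.version <= version) {
                continue; // already reflected in the cache
            }
            if (commit.version != version + 1) {
                throw new IllegalStateException(
                        "expected version " + (version + 1) + " but got " + commit.version);
            }
            for (FileAction action : commit.actions) {
                if (action.isAdd) {
                    statsByPath.put(action.path, action.statsJson);
                } else {
                    statsByPath.remove(action.path);
                }
            }
            version = commit.version;
        }
    }

    long version() {
        return version;
    }

    /** Read-only view of path -> stats JSON at the cached version. */
    Map<String, String> files() {
        return Collections.unmodifiableMap(statsByPath);
    }
}
```

Rejecting version gaps is a deliberate choice here: if a commit is missing, the safer fallback is a full re-listing rather than a silently stale cache. A production version would also have to handle deletion vectors, metadata and protocol changes, and checkpoints.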
Feature request
Overview
Since delta-standalone has been deprecated, we are migrating our project from delta-standalone to delta-kernel. However, we found that delta-kernel cannot return file stats when scanning the file list.
In delta-standalone, we can get file stats from this class:
We can also get the change logs using "Iterator getChanges" in io.delta.standalone.DeltaLog, which is not available in delta-kernel either.