duckdb / duckdb_iceberg

MIT License

feat: AWS glue catalog support for iceberg_scan() #51

Status: Open. rustyconover opened this pull request 5 months ago.

rustyconover commented 5 months ago

Adds support for accessing Iceberg tables registered in the AWS Glue catalog.

Example SQL call:

select * from iceberg_scan('{ "catalog_type": "glue", "region": "us-east-1", "database_name": "test_iceberg", "table_name": "users"}');

This also adds a framework for supporting additional external Iceberg catalogs.

The catalog configuration is passed as a JSON object of this format:

{
  "catalog_type": "glue",
  "catalog": "1234567890",          // optional - the catalog to use
  "region": "us-east-1",            // required - change to the right region
  "database_name": "test_iceberg",  // required - change to your database
  "table_name": "table_name"        // required - change for each table.
}
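To illustrate the field rules above, here is a minimal Python sketch of a validator for this object. It is hypothetical: the extension itself is written in C++ and its parser may behave differently. The sketch assumes strict JSON, so the `//` annotations shown above would have to be removed. Rejecting unknown keys is a design choice of the sketch, not something the PR states.

```python
import json

# Hypothetical helper (not part of duckdb_iceberg): checks a catalog config
# object against the required/optional field list described in this PR.
REQUIRED_GLUE_FIELDS = ("region", "database_name", "table_name")
OPTIONAL_GLUE_FIELDS = ("catalog",)


def validate_glue_config(text: str) -> dict:
    """Parse a JSON catalog config, check the Glue fields, and return the dict."""
    config = json.loads(text)  # strict JSON: // comments are not allowed
    if not isinstance(config, dict):
        raise ValueError("catalog config must be a JSON object")
    if config.get("catalog_type") != "glue":
        raise ValueError(f"unsupported catalog_type: {config.get('catalog_type')!r}")
    # Required fields must be present and non-empty.
    missing = [k for k in REQUIRED_GLUE_FIELDS if not config.get(k)]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    # Sketch-only design choice: reject keys the PR does not list.
    known = {"catalog_type", *REQUIRED_GLUE_FIELDS, *OPTIONAL_GLUE_FIELDS}
    unknown = sorted(set(config) - known)
    if unknown:
        raise ValueError(f"unknown field(s): {', '.join(unknown)}")
    return config


example = ('{"catalog_type": "glue", "region": "us-east-1", '
           '"database_name": "test_iceberg", "table_name": "users"}')
print(validate_glue_config(example)["table_name"])  # prints "users"
```

Because `catalog` is optional, a config that omits it passes, while one missing `database_name` or `table_name` raises a `ValueError` naming the missing fields.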

samansmink commented 5 months ago

Hey @rustyconover! Thanks a lot for the PRs!

To review this I will need to set up some AWS Glue tables myself to test it out; I will try to find some time tomorrow to do this.

One small comment I already have: I'm not sure a JSON string is the neatest way to pass the configuration to the Iceberg scan function. Maybe we can instead add each option as a named parameter of the iceberg table function. Many of these options will probably be shared across catalog types anyway, and with named parameters the parser can give meaningful error messages and syntax highlighting of the SQL strings works better.

rustyconover commented 5 months ago

Hi @samansmink,

I'll look at changing to named parameters and post a revised PR.

Rusty

rustyconover commented 5 months ago

Hi @samansmink,

I've switched to named parameters and added support so that the iceberg_metadata() function can use the same configuration.

Rusty

rustyconover commented 5 months ago

You can now run queries that look like this:

select * from iceberg_scan('users', catalog_type="glue", region="us-east-1", database_name="test_iceberg");

select * from iceberg_metadata('users', catalog_type="glue", region="us-east-1", database_name="test_iceberg");
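The named-parameter form can be sketched in Python as a small generator for these calls, including a mapping from the earlier JSON config. These helpers are hypothetical and not part of duckdb_iceberg. The mapping assumes each JSON key becomes a named parameter of the same name, which the examples above confirm for `catalog_type`, `region` and `database_name` but not for the optional `catalog` field. The examples in this thread use double quotes; the sketch emits single-quoted literals, DuckDB's standard string-literal syntax, and escapes embedded quotes by doubling them.

```python
# Hypothetical helpers (not part of duckdb_iceberg) that render the
# named-parameter call form shown above as a SQL string.


def sql_string(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def iceberg_call(function: str, table: str, **params: str) -> str:
    """Build `select * from <function>('<table>', name='value', ...);`."""
    if function not in ("iceberg_scan", "iceberg_metadata"):
        raise ValueError(f"unexpected function: {function}")
    args = [sql_string(table)]
    for name, value in params.items():  # keyword order is preserved
        # Parameter names are emitted unquoted, so only allow plain identifiers.
        if not name.isidentifier():
            raise ValueError(f"invalid parameter name: {name!r}")
        args.append(f"{name}={sql_string(value)}")
    return f"select * from {function}({', '.join(args)});"


def from_json_config(function: str, config: dict) -> str:
    """Map the earlier JSON config: table_name becomes the positional argument,
    every other key becomes a same-named parameter (an assumption)."""
    params = dict(config)  # copy so the caller's dict is not modified
    table = params.pop("table_name")
    return iceberg_call(function, table, **params)


print(iceberg_call("iceberg_scan", "users",
                   catalog_type="glue", region="us-east-1",
                   database_name="test_iceberg"))
# prints: select * from iceberg_scan('users', catalog_type='glue', region='us-east-1', database_name='test_iceberg');
```

Keeping `table_name` positional mirrors the examples above, where the table is the first argument and the catalog settings follow as named parameters.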

harel-e commented 5 months ago

@rustyconover, thank you for this PR and #50. I have access to Iceberg tables on AWS Glue and can help test this feature. Is it possible to provide a binary or Docker image for this PR? I'm having issues building DuckDB locally. If the binary includes #50, I can test that one as well.

rustyconover commented 5 months ago

Hi @harel-e,

Thank you for your kind words.

Unfortunately I can't help you build the extension or package it as a Docker container. You might want to try asking on the DuckDB Discord for help building DuckDB.

I'm building it on Mac OS X. I had to make some changes to vcpkg to work around the fallout from the xz package becoming unavailable, which affected Boost.

Rusty

samansmink commented 5 months ago

As far as I know, vcpkg should have recovered from the xz debacle! Check out https://github.com/duckdb/extension-template for instructions on setting up vcpkg for extension builds.

harel-e commented 4 months ago

I tested this branch on AWS with several Iceberg tables.

This query pattern works fine: select * from iceberg_scan('users', catalog_type="glue", region="us-east-1", database_name="test_iceberg");

Hoping to see it in the upcoming 0.10.3 release.

Thank you @rustyconover for this wonderful addition. DuckDB is now one step closer to working seamlessly in AWS.

samansmink commented 4 months ago

Sorry for the absence here; I've been really busy.

There are still some CI failures on Windows and Linux amd64; those would need to be fixed for this to be merged before 0.10.3.

rustyconover commented 4 months ago

I'll take a look at the Linux build failures, but I can't investigate the Windows ones because I don't have access to that platform.

arnabneogi86 commented 3 months ago

@rustyconover: Does this support the Nessie catalog for Iceberg?

janosszendivarga commented 1 month ago

Any chance of getting this PR merged?

szalai1 commented 1 month ago

@rustyconover, are you still working on this? Would it make sense for someone else to pick this up?

rustyconover commented 1 month ago

I'm not actively working on this PR; feel free to finish it up.