deephaven / deephaven-docs-community

Source code for Community docs on the deephaven.io website.
Apache License 2.0

feat: expose Iceberg features to python users #241

Open deephaven-internal opened 2 weeks ago


This issue was auto-generated

PR: https://github.com/deephaven/deephaven-core/pull/5590
Author: lbooker42

Original PR Body

Exposes Iceberg table support and adapter creation through Python.

Will close #5574

Example Usage:

Local (MinIO + REST Catalog):

from deephaven.experimental import s3, iceberg

local_adapter = iceberg.adapter_s3_rest(
    name="minio-iceberg",
    catalog_uri="http://rest:8181",
    warehouse_location="s3a://warehouse/wh",
    region_name="us-east-1",
    access_key_id="admin",
    secret_access_key="password",
    end_point_override="http://minio:9000",
)

t_ns = local_adapter.namespaces()
t_tables = local_adapter.tables("sales")
t_snapshots = local_adapter.snapshots("sales.sales_multi")

#################################################

s3_instructions = s3.S3Instructions(
    region_name="us-east-1",
    access_key_id="admin",
    secret_access_key="password",
    endpoint_override="http://minio:9000",
)

iceberg_instructions = iceberg.IcebergInstructions(data_instructions=s3_instructions)

data_table = local_adapter.read_table(table_identifier="sample.all_types", instructions=iceberg_instructions)

sales_table = local_adapter.read_table(table_identifier="sales.sales_single", instructions=iceberg_instructions)

sales_restricted = sales_table.select(["Region", "Item_Type", "Unit_Price", "Order_Date"])

sales_pt = local_adapter.read_table(table_identifier="sales.sales_partitioned", instructions=iceberg_instructions)

#################################################

custom_instructions = iceberg.IcebergInstructions(
    data_instructions=s3_instructions,
    column_renames={
        "Region": "Area",
        "Item_Type": "Category",
    },
)

sales_custom = local_adapter.read_table(table_identifier="sales.sales_single", instructions=custom_instructions)

#################################################

from deephaven import dtypes

custom_instructions = iceberg.IcebergInstructions(
    data_instructions=s3_instructions,
    column_renames={
        "Region": "Area",
        "Item_Type": "Category",
    },
    table_definition={
        "Area": dtypes.string,
        "Category": dtypes.string,
        "Unit_Price": dtypes.double,
    },
)

sales_custom_td = local_adapter.read_table(table_identifier="sales.sales_single", instructions=custom_instructions)
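
The last two examples combine `column_renames` with `table_definition`. The sketch below is plain Python, not Deephaven code, and it only illustrates one plausible reading of those examples: renames are applied to the source column names first, and the table definition then selects columns by their renamed names. That ordering is an assumption inferred from the example, and `resolve_columns` is a hypothetical helper, not part of the `iceberg` module. The source column names are taken from the `select` call above.

```python
def resolve_columns(source_columns, column_renames=None, table_definition=None):
    """Return the column names a read would expose under the assumed semantics.

    Hypothetical illustration only: renames first, then restrict to the
    columns named in the table definition (if one is given).
    """
    renames = column_renames or {}
    # Apply renames; columns without an entry keep their original name.
    renamed = [renames.get(name, name) for name in source_columns]
    if table_definition is None:
        return renamed
    # The definition refers to renamed columns, so check against those.
    missing = [name for name in table_definition if name not in renamed]
    if missing:
        raise KeyError(f"columns not found after renaming: {missing}")
    return list(table_definition)


source = ["Region", "Item_Type", "Unit_Price", "Order_Date"]
renames = {"Region": "Area", "Item_Type": "Category"}
# Types are shown as strings here; the real example uses deephaven.dtypes.
definition = {"Area": "string", "Category": "string", "Unit_Price": "double"}

print(resolve_columns(source, renames))
print(resolve_columns(source, renames, definition))
```

Under this reading, a definition keyed by the original name (`"Region"` instead of `"Area"`) would fail, which is why the example's `table_definition` uses the renamed columns.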

AWS Glue:

NOTE: the region and credentials are specified locally in the ~/.aws/config and ~/.aws/credentials files.
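
Since the Glue adapter call below passes no region or credentials, it can help to confirm that those two files contain what is expected before starting. The following is a minimal sketch using only the standard library; `check_aws_files` is a hypothetical helper, the sample values are placeholders, and it only checks for the usual `region`, `aws_access_key_id`, and `aws_secret_access_key` keys rather than validating them.

```python
import configparser


def parse_ini(text):
    """Parse INI-formatted text such as the contents of ~/.aws/config."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def check_aws_files(config_text, credentials_text, profile="default"):
    """Report whether a region and a key pair are present for a profile."""
    config = parse_ini(config_text)
    credentials = parse_ini(credentials_text)
    # In the config file, non-default profiles use a "profile <name>" section.
    config_section = profile if profile == "default" else f"profile {profile}"
    return {
        "region": config.has_option(config_section, "region"),
        "credentials": (
            credentials.has_option(profile, "aws_access_key_id")
            and credentials.has_option(profile, "aws_secret_access_key")
        ),
    }


# Placeholder contents standing in for the two files.
SAMPLE_CONFIG = """[default]
region = us-east-1
"""
SAMPLE_CREDENTIALS = """[default]
aws_access_key_id = EXAMPLE_KEY_ID
aws_secret_access_key = EXAMPLE_SECRET
"""

print(check_aws_files(SAMPLE_CONFIG, SAMPLE_CREDENTIALS))
```

To check the real files, pass `pathlib.Path.home().joinpath(".aws", "config").read_text()` and the equivalent for `credentials` instead of the sample strings.
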

from deephaven.experimental import s3, iceberg

cloud_adapter = iceberg.adapter_aws_glue(
    name="aws-iceberg",
    catalog_uri="s3://lab-warehouse/sales",
    warehouse_location="s3://lab-warehouse/sales",
)

t_ns = cloud_adapter.namespaces()
t_tables = cloud_adapter.tables("sales")
t_snapshots = cloud_adapter.snapshots("sales.sales_single")

#################################################

sales_table = cloud_adapter.read_table(table_identifier="sales.sales_single")

#################################################

custom_instructions = iceberg.IcebergInstructions(
    column_renames={
        "region": "Area",
        "item_type": "Category",
    },
)

sales_custom = cloud_adapter.read_table(table_identifier="sales.sales_single", instructions=custom_instructions)

#################################################

from deephaven import dtypes

custom_instructions = iceberg.IcebergInstructions(
    column_renames={
        "region": "Area",
        "item_type": "Category",
        "unit_price": "Price",
    },
    table_definition={
        "Area": dtypes.string,
        "Category": dtypes.string,
        "Price": dtypes.double,
    },
)

sales_custom_td = cloud_adapter.read_table(table_identifier="sales.sales_single", instructions=custom_instructions)
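
In both examples, `tables(...)` is given a bare namespace (`"sales"`), while `snapshots(...)` and `read_table(...)` are given a dotted identifier (`"sales.sales_single"`). A small helper can derive one from the other. This is a sketch only: `split_table_identifier` is a hypothetical name, not part of the `iceberg` module, and it assumes the table name is the last dot-separated component.

```python
def split_table_identifier(identifier):
    """Split 'namespace.table' into (namespace, table).

    Assumes the final dot-separated component is the table name, so a
    multi-level namespace such as 'a.b' is kept intact.
    """
    namespace, dot, table = identifier.rpartition(".")
    if not dot or not namespace or not table:
        raise ValueError(f"expected 'namespace.table', got {identifier!r}")
    return namespace, table


namespace, table = split_table_identifier("sales.sales_single")
print(namespace, table)
```

With an adapter in hand, the namespace part could then be passed to `tables(namespace)` and the full identifier to `read_table(...)`, as in the examples above.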