deephaven / deephaven-core

Deephaven Community Core

iceberg, unable to read empty table #5873

Open devinrsmith opened 2 months ago

devinrsmith commented 2 months ago

Reading a table that was created through the catalog but has no snapshots throws a NullPointerException instead of returning an empty table with the appropriate schema:

java.lang.NullPointerException: Cannot invoke "org.apache.iceberg.Snapshot.schemaId()" because "snapshot" is null
    at io.deephaven.iceberg.util.IcebergCatalogAdapter.readTableInternal(IcebergCatalogAdapter.java:524)
    at io.deephaven.iceberg.util.IcebergCatalogAdapter.readTable(IcebergCatalogAdapter.java:405)
    at io.deephaven.iceberg.util.IcebergCatalogAdapter.readTable(IcebergCatalogAdapter.java:419)
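
The behavior being asked for can be sketched as a null check before the snapshot is dereferenced. This is only an illustration of the expected outcome, not the actual IcebergCatalogAdapter code: `TableInfo`, `SnapshotInfo`, `ReadResult`, and `read` are hypothetical stand-ins, not Iceberg or Deephaven classes.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration only; NOT the Iceberg or Deephaven classes.
final class EmptyTableSketch {

    /** Stand-in for a snapshot: records only the schema id it was written with. */
    static final class SnapshotInfo {
        final int schemaId;

        SnapshotInfo(int schemaId) {
            this.schemaId = schemaId;
        }
    }

    /** Stand-in for table metadata: a current schema id and a possibly null current snapshot. */
    static final class TableInfo {
        final int currentSchemaId;
        final SnapshotInfo currentSnapshot; // null when the table has never been written to

        TableInfo(int currentSchemaId, SnapshotInfo currentSnapshot) {
            this.currentSchemaId = currentSchemaId;
            this.currentSnapshot = currentSnapshot;
        }
    }

    /** Stand-in for the result of a read: the schema id used and the rows returned. */
    static final class ReadResult {
        final int schemaId;
        final List<String> rows;

        ReadResult(int schemaId, List<String> rows) {
            this.schemaId = schemaId;
            this.rows = rows;
        }
    }

    static ReadResult read(TableInfo table) {
        SnapshotInfo snapshot = table.currentSnapshot;
        if (snapshot == null) {
            // No snapshot means no data: fall back to the table's current schema
            // and return zero rows instead of dereferencing null.
            return new ReadResult(table.currentSchemaId, Collections.<String>emptyList());
        }
        // Placeholder for a real read of the snapshot's data files.
        return new ReadResult(snapshot.schemaId, Collections.singletonList("row from snapshot"));
    }

    public static void main(String[] args) {
        ReadResult result = read(new TableInfo(0, null));
        System.out.println("schemaId=" + result.schemaId + " rows=" + result.rows.size());
    }
}
```

In the real Iceberg API, `Table.currentSnapshot()` returns null for a table with no snapshots and `Table.schema()` still returns the table's schema, so a similar fallback looks feasible there.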

devinrsmith commented 2 months ago

The code that creates the empty table is based on the Iceberg Java API quickstart: https://iceberg.apache.org/docs/1.6.0/java-api-quickstart/#using-a-hadoop-catalog

import org.apache.hadoop.conf.Configuration
import org.apache.iceberg.PartitionSpec
import org.apache.iceberg.Schema
import org.apache.iceberg.Table
import org.apache.iceberg.catalog.TableIdentifier
import org.apache.iceberg.hadoop.HadoopCatalog
import org.apache.iceberg.types.Types

// Adapted from https://iceberg.apache.org/docs/1.6.0/java-api-quickstart/#using-a-hadoop-catalog

Configuration conf = new Configuration()
String warehousePath = "file:///tmp/my_warehouse"
HadoopCatalog catalog = new HadoopCatalog(conf, warehousePath)

Schema schema = new Schema(
    Types.NestedField.required(1, "level", Types.StringType.get()),
    Types.NestedField.required(2, "event_time", Types.TimestampType.withZone()),
    Types.NestedField.required(3, "message", Types.StringType.get())
    // DH doesn't support LIST yet.
    // Types.NestedField.optional(4, "call_stack", Types.ListType.ofRequired(5, Types.StringType.get()))
)

PartitionSpec spec = PartitionSpec.builderFor(schema)
    .hour("event_time")
    .identity("level")
    .build()

TableIdentifier name = TableIdentifier.of("logging", "logs")
Table table = catalog.createTable(name, schema, spec)

This produces the following files:

$ find /tmp/my_warehouse -type f
/tmp/my_warehouse/logging/logs/metadata/v1.metadata.json
/tmp/my_warehouse/logging/logs/metadata/.v1.metadata.json.crc
/tmp/my_warehouse/logging/logs/metadata/version-hint.text
/tmp/my_warehouse/logging/logs/metadata/.version-hint.text.crc
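
The listing contains only table metadata and the version hint, with no snapshot files. A rough way to confirm that for a table directory is sketched below. It assumes Iceberg's usual naming, where manifest lists are written as `snap-*.avro`; the class `SnapshotFileCheck` and the method `hasManifestList` are hypothetical helpers, not part of Iceberg or Deephaven.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of Iceberg or Deephaven.
final class SnapshotFileCheck {

    /**
     * Returns true if any regular file under tableDir is named like a manifest list.
     * Assumption: manifest lists follow the "snap-*.avro" naming convention.
     */
    static boolean hasManifestList(Path tableDir) throws IOException {
        try (Stream<Path> files = Files.walk(tableDir)) {
            return files
                    .filter(Files::isRegularFile)
                    .map(p -> p.getFileName().toString())
                    .anyMatch(n -> n.startsWith("snap-") && n.endsWith(".avro"));
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the table directory from the listing above; throws if it does not exist.
        Path tableDir = Paths.get(args.length > 0 ? args[0] : "/tmp/my_warehouse/logging/logs");
        System.out.println("manifest list present: " + hasManifestList(tableDir));
    }
}
```

For the four files listed above, none matches that pattern, which is consistent with the table having no snapshots.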