apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0
6.15k stars 2.14k forks source link

java.lang.IllegalArgumentException: Table identifier not set #5175

Closed saimigo closed 3 days ago

saimigo commented 2 years ago

I use Flink Table SDK to select iceberg table. Set hivecatalog to usercatalog, but looks like the default_iceberg catalog is still used.

The error message is as flollows:

10:42:41,886 INFO  org.apache.hadoop.metrics2.impl.MetricsSystemImpl            [] - s3a-file-system metrics system started
10:42:44,392 INFO  org.apache.iceberg.BaseMetastoreCatalog                      [] - Table loaded by catalog: default_iceberg.s3a_flink.icebergtbcloudtrackingtest
10:42:44,397 INFO  org.apache.iceberg.mr.hive.HiveIcebergSerDe                  [] - Using schema from existing table {"type":"struct","schema-id":0,"fields":[{"id":1,"name":"vin","required":true,"type":"string"},{"id":2,"name":"name","required":true,"type":"string"},{"id":3,"name":"uuid","required":false,"type":"string"},{"id":4,"name":"channel","required":true,"type":"string"},{"id":5,"name":"run_scene","required":true,"type":"string"},{"id":6,"name":"timestamp","required":true,"type":"timestamp"},{"id":7,"name":"rcv_timestamp","required":true,"type":"timestamp"},{"id":8,"name":"raw","required":true,"type":"string"}]}
10:42:44,832 INFO  org.apache.iceberg.BaseMetastoreTableOperations              [] - Refreshing table metadata from new version: s3a://warehouse/s3a_flink.db/icebergTBCloudTrackingTest/metadata/00011-8d1ef9f1-8172-49fd-b0de-58196642b662.metadata.json
10:42:44,866 INFO  org.apache.iceberg.BaseMetastoreCatalog                      [] - Table loaded by catalog: default_iceberg.s3a_flink.icebergtbcloudtrackingtest
10:42:44,867 INFO  org.apache.iceberg.mr.hive.HiveIcebergSerDe                  [] - Using schema from existing table {"type":"struct","schema-id":0,"fields":[{"id":1,"name":"vin","required":true,"type":"string"},{"id":2,"name":"name","required":true,"type":"string"},{"id":3,"name":"uuid","required":false,"type":"string"},{"id":4,"name":"channel","required":true,"type":"string"},{"id":5,"name":"run_scene","required":true,"type":"string"},{"id":6,"name":"timestamp","required":true,"type":"timestamp"},{"id":7,"name":"rcv_timestamp","required":true,"type":"timestamp"},{"id":8,"name":"raw","required":true,"type":"string"}]}
10:42:48,079 INFO  org.apache.hadoop.hive.metastore.HiveMetaStoreClient         [] - Trying to connect to metastore with URI thrift://hiveserver:9083
10:42:48,079 INFO  org.apache.hadoop.hive.metastore.HiveMetaStoreClient         [] - Opened a connection to metastore, current connections: 3
10:42:48,081 INFO  org.apache.hadoop.hive.metastore.HiveMetaStoreClient         [] - Connected to metastore.
10:42:48,081 INFO  org.apache.hadoop.hive.metastore.RetryingMetaStoreClient     [] - RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=root (auth:SIMPLE) retries=1 delay=1 lifetime=0
10:42:48,132 INFO  org.apache.hadoop.hive.metastore.HiveMetaStoreClient         [] - Closed a connection to metastore, current connections: 2
10:42:48,308 INFO  org.apache.flink.connectors.hive.HiveParallelismInference    [] - Hive source(s3a_flink.icebergTBCloudTrackingTest}) getNumFiles use time: 171 ms, result: 2
Exception in thread "main" java.lang.IllegalArgumentException: Table identifier not set
    at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:142)
    at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:114)
    at org.apache.iceberg.mr.Catalogs.loadTable(Catalogs.java:89)
    at org.apache.iceberg.mr.mapreduce.IcebergInputFormat.lambda$getSplits$0(IcebergInputFormat.java:102)
    at java.util.Optional.orElseGet(Optional.java:267)
    at org.apache.iceberg.mr.mapreduce.IcebergInputFormat.getSplits(IcebergInputFormat.java:102)
    at org.apache.iceberg.mr.mapred.MapredIcebergInputFormat.getSplits(MapredIcebergInputFormat.java:69)
    at org.apache.iceberg.mr.hive.HiveIcebergInputFormat.getSplits(HiveIcebergInputFormat.java:98)
    at org.apache.flink.connectors.hive.HiveSourceFileEnumerator.createMRSplits(HiveSourceFileEnumerator.java:107)
    at org.apache.flink.connectors.hive.HiveSourceFileEnumerator.createInputSplits(HiveSourceFileEnumerator.java:71)
    at org.apache.flink.connectors.hive.HiveTableSource.lambda$getDataStream$1(HiveTableSource.java:149)
    at org.apache.flink.connectors.hive.HiveParallelismInference.logRunningTime(HiveParallelismInference.java:107)
    at org.apache.flink.connectors.hive.HiveParallelismInference.infer(HiveParallelismInference.java:95)
    at org.apache.flink.connectors.hive.HiveTableSource.getDataStream(HiveTableSource.java:144)
    at org.apache.flink.connectors.hive.HiveTableSource$1.produceDataStream(HiveTableSource.java:114)
    at org.apache.flink.table.planner.plan.nodes.exec.common.CommonExecTableSourceScan.translateToPlanInternal(CommonExecTableSourceScan.java:106)
    at org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecTableSourceScan.translateToPlanInternal(BatchExecTableSourceScan.java:49)
    at org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:134)
    at org.apache.flink.table.planner.plan.nodes.exec.ExecEdge.translateToPlan(ExecEdge.java:250)
    at org.apache.flink.table.planner.plan.nodes.exec.batch.BatchExecSink.translateToPlanInternal(BatchExecSink.java:58)
    at org.apache.flink.table.planner.plan.nodes.exec.ExecNodeBase.translateToPlan(ExecNodeBase.java:134)
    at org.apache.flink.table.planner.delegation.BatchPlanner.$anonfun$translateToPlan$1(BatchPlanner.scala:82)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
    at scala.collection.Iterator.foreach(Iterator.scala:937)
    at scala.collection.Iterator.foreach$(Iterator.scala:937)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
    at scala.collection.IterableLike.foreach(IterableLike.scala:70)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike.map(TraversableLike.scala:233)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104) 

code is :

        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .inBatchMode()
                .build();
        TableEnvironment tableEnv = TableEnvironment.create(settings);

        String catalogName = "s3IcebergCatalog";
        String defaultDatabase = "s3a_flink";
        String hiveConfDir = "flink-cloud/src/main/resources";
        HiveCatalog hive = new HiveCatalog(catalogName, defaultDatabase, hiveConfDir);

        tableEnv.registerCatalog(catalogName, hive);
        tableEnv.useCatalog(catalogName);
        tableEnv.useDatabase(defaultDatabase);
        System.out.println(tableEnv.getCurrentCatalog());
        String tableName = "icebergTBCloudTrackingTest";
        tableEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT);

        String sql = "select uuid from " + tableName;
        System.out.println(sql);
        tableEnv.executeSql(sql).print();

The output of tableEnv.getCurrentCatalog() is s3IcebergCatalog. But it reports 10:42:44,866 INFO  org.apache.iceberg.BaseMetastoreCatalog    [] - Table loaded by catalog: default_iceberg.s3a_flink.icebergtbcloudtrackingtest,  and shows java.lang.IllegalArgumentException: Table identifier not set.

Does anyone know the reason please?

luoyuxia commented 1 year ago

The table is with iceberg format which requires special parameters, but you try to read it as a normal hive table with Flink which miss the required parameters. You can try to set the property iceberg.mr.table.identifier, iceberg.mr.table.location in hive-site.xml, and I think it should work.

github-actions[bot] commented 4 weeks ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] commented 3 days ago

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'