Open meyergin opened 2 years ago
@meyergin This looks like an issue with your query. The path to your file needs to be in backticks.
IE:
SELECT *
FROM dfs.`/db/test`
You can also define workspaces which are shortcuts to file paths. (https://drill.apache.org/docs/workspaces/)
This looks like an issue with your query. The path to your file needs to be in backticks.
The gray block in his comment makes me think that GitHub has interpreted the backticks he did use as marking up a code block.
@jnturton I missed that. Thanks!
In any event, this looks like Drill is not seeing the fill in that folder. Here's what I'd do.
dfs
storage plugin config.SHOW FILES IN dfs
to verify that Drill can in fact see any files in that path. If all that checks out, I think we need to do some digging.
Sorry for the poor format, I've modified it. And I have tested dfs and s3 storage, no one works.
dfs storage config:
{
"type": "file",
"connection": "file:///",
"workspaces": {
"tmp": {
"location": "/tmp",
"writable": true,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
},
"root": {
"location": "/",
"writable": false,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
}
},
"formats": {
"iceberg": {
"type": "iceberg",
"properties": {
"read.split.target-size": "134217728",
"read.split.metadata-target-size": "33554432"
},
"caseSensitive": true,
"includeColumnStats": null,
"ignoreResiduals": null,
"snapshotId": null,
"snapshotAsOfTime": null,
"fromSnapshotId": null,
"toSnapshotId": null
}
},
"enabled": true
}
show files return correct data and metadata folders.
error log:
org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist at location: /db/test
at org.apache.iceberg.hadoop.HadoopTables.load(HadoopTables.java:89)
at org.apache.drill.exec.store.iceberg.IcebergGroupScan.initTableScan(IcebergGroupScan.java:123)
Hi, I have problems reading an Apache Iceberg table hosted in S3 (I commented this problem on Slack but I share it here too). My Drill version is 1.21.1 (updated recently)
The Storage Plugin configuration is saved as "S3_iceberg", this is the xml:
{
"type": "file",
"connection": "s3a://xxx-data-lake",
"config": {
"fs.s3a.secret.key": "1111",
"fs.s3a.access.key": "111"
},
"workspaces": {
"default": {
"location": "/iceberg-data-lake",
"writable": false,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
},
"root": {
"location": "/iceberg-data-lake",
"writable": false,
"defaultInputFormat": null,
"allowAccessOutsideWorkspace": false
}
},
"formats": {
"iceberg": {
"type": "iceberg",
"properties": {
"read.split.target-size": "536870912",
"read.split.metadata-target-size": "33554432"
},
"caseSensitive": true,
"includeColumnStats": true,
"ignoreResiduals": null,
"snapshotId": null,
"snapshotAsOfTime": null,
"fromSnapshotId": null,
"toSnapshotId": null
}
},
"authMode": "SHARED_USER",
"enabled": true
}
The command "SHOW FILES IN s3_iceberg" returns
"SHOW FILES IN s3_iceberg.iceberg_accumulated_exposure" returns
When I launch this query "_ANALYZE TABLE s3_iceberg.iceberg_accumulatedexposure REFRESH METADATA;" I get
The query "_select * from s3_iceberg.iceberg_accumulatedexposure" returns same error
Can you share a configuration that you know works?
Thank you.
after upgrade version to 1.20.1, I add iceberg configure followed by the tutorial(https://drill.apache.org/docs/iceberg-format-plugin/), but seems it doesn't work which return err as following: SYSTEM ERROR: NoSuchTableException: Table does not exist at location sql:
test is folder of iceberg table which includes metadata and data folders. I doubted that maybe the question is the type of catalog, does drill support hive-based catalogs or just support hadoop-based catalogs?