Closed: dvictori closed this issue 8 months ago
As a complement, I checked using the DuckDB CLI and the following query works.
I'm using Metabase v1.46.2 and DuckDB v0.8.1.
select * from read_parquet('/app/dados/dados_parquet/sicor_operacao_basica.parquet/*/*')
I'm using DuckDB views to do this, and also setting the data type of each field instead of using star notation.
CREATE OR REPLACE VIEW {table_name} AS (
SELECT {fields} FROM read_parquet(['path/to/parquet_data/*/*.parquet'], HIVE_PARTITIONING=true)
)
With this you can add or remove data from the path and it will automatically be reflected in Metabase. I'm not sure how well the caching works when you change the data, though.
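The view-per-table pattern above can be sketched as a small helper that renders the CREATE OR REPLACE VIEW statement. This is a stdlib-only sketch; the view name, field list, and glob path below are hypothetical examples, and in a real setup you would execute the resulting SQL against your DuckDB database (via the CLI or a client API):

```python
# Sketch of the view-per-table pattern described above.
# The view name, fields, and parquet glob are hypothetical examples;
# execute the rendered SQL against your DuckDB database to register the view.

def build_parquet_view(table_name: str, fields: list[str], parquet_glob: str) -> str:
    """Render a CREATE OR REPLACE VIEW over a hive-partitioned parquet path."""
    field_list = ", ".join(fields)
    return (
        f"CREATE OR REPLACE VIEW {table_name} AS (\n"
        f"  SELECT {field_list} FROM read_parquet(['{parquet_glob}'], HIVE_PARTITIONING=true)\n"
        f")"
    )

sql = build_parquet_view(
    "operacao_basica",                   # hypothetical view name
    ["REF_BACEN", "NU_ORDEM"],           # explicit fields instead of star notation
    "path/to/parquet_data/*/*.parquet",  # glob over the partition directories
)
print(sql)
```

Listing the fields explicitly (rather than `SELECT *`) is what sidesteps the partitioned-parquet issue discussed in this thread.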
I was having this issue when using a DuckDB database in memory. But following what you did, I'm now creating a DuckDB database on disk, inserting a bunch of views that use read_parquet, and loading that file in Metabase. It's working just fine and I can use partitioned parquet files.
I misread your comment. After some testing I realized that when trying to open a partitioned parquet dataset inside Metabase, one must declare all the fields contained in the files.
For instance, this works:
select REF_BACEN, NU_ORDEM from read_parquet('/dados/dados_parquet/sicor_operacao_basica.parquet/*/*')
but this does not
select * from read_parquet('/dados/dados_parquet/sicor_operacao_basica.parquet/*/*')
I'm reopening the issue because I think that select * from...
should work regardless of whether we have a single parquet file or a partitioned dataset.
In case it helps, I'm attaching the metabase log that I get when I execute the query.
metabase_partitioned_parquet_errorerr.log
Not sure if it's related, but I'm also seeing this warning when I re-scan the DuckDB fields in Metabase:
proagrodb_docker-metabase-1 | 2023-09-29 12:13:03,190 WARN query-processor.deprecated :: Atenção: driver :duckdb está usando Honey SQL 1. Este método foi descontinuado em 0.46.0 e será excluído em uma versão futura.
Sorry for the message in Portuguese. It's basically saying that the driver is using Honey SQL 1, a method that was deprecated in 0.46.0 and will be removed in a future version.
> but this does not
> select * from read_parquet('/dados/dados_parquet/sicor_operacao_basica.parquet/*/*')
Hi! Did you try this directly in DuckDB? Does it work there?
> but this does not
> select * from read_parquet('/dados/dados_parquet/sicor_operacao_basica.parquet/*/*')
> Hi! Did you try this directly in DuckDB? Does it work there?
Yes, it works using the DuckDB CLI. I pasted the output a couple of comments above:
https://github.com/AlexR2D2/metabase_duckdb_driver/issues/16#issuecomment-1730194178
Hi, could you try this again using the latest version of the Metabase plugin?
select * from read_parquet('/Users/alex/Documents/Dev/duckdb/cars_part/**/*.parquet', hive_partitioning=true);
or
select * from read_parquet('/Users/alex/Documents/Dev/duckdb/cars_part/**/*.parquet');
Sorry for the long delay in answering. I'm installing Metabase and the DuckDB driver using the following Dockerfile. Will that give me the latest version of the Metabase plugin?
FROM openjdk:19-buster
ENV MB_PLUGINS_DIR=/home/plugins/
ADD https://downloads.metabase.com/v0.47.5/metabase.jar /home
ADD https://github.com/AlexR2D2/metabase_duckdb_driver/releases/download/0.2.3/duckdb.metabase-driver.jar /home/plugins/
RUN chmod 744 /home/plugins/duckdb.metabase-driver.jar
CMD ["java", "-jar", "/home/metabase.jar"]
I just tested using a metabase installed as shown above and a partitioned parquet dataset. All worked fine!
The following syntaxes worked:
select * from read_parquet('/data/iris_part/**/*.parquet')
select * from read_parquet('/data/iris_part/**/*')
select * from read_parquet('/data/iris_part/*/*.parquet')
select * from read_parquet('/data/iris_part/*/*')
select * from read_parquet('/data/iris_part/**')
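For reference, the glob patterns above behave like standard recursive globs over a hive-partitioned directory tree. A stdlib-only sketch (the directory and file names below are made up; DuckDB implements its own globbing, but the matching idea is the same) shows why both `**/*.parquet` and `*/*.parquet` find the files in a one-level partition layout:

```python
# Stdlib-only illustration of how the glob patterns above match a
# hive-partitioned layout. Directory and file names are made-up examples.
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Minimal hive-partitioned layout: one subdirectory per partition value.
    for year in ("2020", "2021"):
        part_dir = os.path.join(root, "iris_part", f"year={year}")
        os.makedirs(part_dir)
        with open(os.path.join(part_dir, "data_0.parquet"), "wb"):
            pass  # empty placeholder file; only the path matters here

    # '**/*.parquet' with recursive=True matches parquet files at any depth.
    deep = sorted(glob.glob(os.path.join(root, "iris_part", "**", "*.parquet"),
                            recursive=True))
    # '*/*.parquet' matches files exactly one directory level down.
    one_level = sorted(glob.glob(os.path.join(root, "iris_part", "*", "*.parquet")))

    print(len(deep), len(one_level))
```

With deeper partition nesting (e.g. `year=.../month=...`), only the recursive `**` forms would still match, which is why they are the safer default.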
I'm trying out Metabase with DuckDB for viewing and processing a large parquet file. I'm running Metabase from a Docker container following the instructions in this repo's README.
I was able to query several single parquet files using
However, I can't seem to find out how to do the same for a parquet file that was created with partitions. My parquet file is partitioned into years like so:
This is what I tried:
and
None worked.
Is it possible to open partitioned parquet files in duckdb?
Edit: Looking at the DuckDB documentation (https://duckdb.org/docs/archive/0.8.1/data/partitioning/hive_partitioning), one should use
parquet_scan
for this. But I'm getting: Cannot invoke "Object.getClass()" because "target" is null