PRQL / prql-query

Query and transform data with PRQL
Apache License 2.0
126 stars 7 forks source link

prql-query 0.0.14 doesn't build with duckdb 0.6.0 #22

Open richb-hanover opened 2 years ago

richb-hanover commented 2 years ago

duckdb just released version 0.6.0 that purports to fix #18, the "... got something else..." error message.

To reproduce, update the Cargo.toml to use duckdb version=0.6.0, then cargo build. Here are the build messages:

?1 prql-query % cargo build
    Updating crates.io index
  Downloaded arrow-schema v26.0.0
  Downloaded arrow-array v26.0.0
  Downloaded arrow-select v26.0.0
  Downloaded arrow-buffer v26.0.0
  Downloaded arrow-data v26.0.0
  Downloaded bindgen v0.61.0
  Downloaded duckdb v0.6.0
  Downloaded arrow v26.0.0
  Downloaded libduckdb-sys v0.6.0
  Downloaded 9 crates (3.3 MB) in 1.23s (largest was `libduckdb-sys` at 2.5 MB)
   Compiling bindgen v0.61.0
   Compiling arrow-schema v26.0.0
   Compiling arrow-buffer v26.0.0
   Compiling arrow v23.0.0
   Compiling arrow-data v26.0.0
   Compiling arrow-array v26.0.0
   Compiling arrow-select v26.0.0
   Compiling arrow v26.0.0
   Compiling libduckdb-sys v0.6.0
   Compiling parquet v23.0.0
   Compiling duckdb v0.6.0
   Compiling prql-query v0.0.14 (/Users/richb/github/prql-query)
error[E0277]: a value of type `Vec<arrow::record_batch::RecordBatch>` cannot be built from an iterator over elements of type `duckdb::arrow::record_batch::RecordBatch`
    --> src/backends/duckdb.rs:133:37
     |
133  |     let rbs = stmt.query_arrow([])?.collect::<Vec<RecordBatch>>();
     |                                     ^^^^^^^ value of type `Vec<arrow::record_batch::RecordBatch>` cannot be built from `std::iter::Iterator<Item=duckdb::arrow::record_batch::RecordBatch>`
     |
     = help: the trait `FromIterator<duckdb::arrow::record_batch::RecordBatch>` is not implemented for `Vec<arrow::record_batch::RecordBatch>`
     = help: the trait `FromIterator<T>` is implemented for `Vec<T>`
note: required by a bound in `std::iter::Iterator::collect`
    --> /Users/richb/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/iter/traits/iterator.rs:1832:19
     |
1832 |     fn collect<B: FromIterator<Self::Item>>(self) -> B
     |                   ^^^^^^^^^^^^^^^^^^^^^^^^ required by this bound in `std::iter::Iterator::collect`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `prql-query` due to previous error
?101 prql-query %
snth commented 2 years ago

Hi, thanks for the report.

Haven't got to do the version bump myself yet. You'll usually have to bump the arrow and parquet versions to the same number that duckdb-rs uses. Looking at the docs.rs/duckdb site it looks like it references v26.

So in your Cargo.toml, bump the arrow and parquet versions from 23 to 26 and see if that fixes it.

I'll look to push out another release soon.

richb-hanover commented 2 years ago

I can confirm that changing arrow and parquet to "26.0.0" makes both cargo build and the Dockerfile work fine. (And I get the improved error message from duckdb / sqlite_scanner... See https://github.com/duckdblabs/sqlite_scanner/issues/15#issuecomment-1316982990)

Question: Is setting those versions to "26.0.0" overspecifying? Should I simply set them to "26"? Thanks.

snth commented 2 years ago

I think you're right. Just 26 should be fine.

duckdb-rs actually made a change which means that we can probably get rid of the arrow dependency once I do some refactoring. Unfortunately we're stuck with the parquet one for now.

richb-hanover commented 2 years ago

Thanks - I have done it in my own repo. I don't think I need to send a PR.

earlev4 commented 1 year ago

Hi. First of all, thanks so much for the excellent project! Huge fan of PRQL! Greatly appreciate all the work and effort that goes into the project.

Recently, I noticed that prql-query is using duckdb "0.5.1" (libduckdb-sys v0.5.1) as a dependency. Will the next version of prql-query include an updated version of duckdb?

Thanks again!

snth commented 1 year ago

Gosh, I can't believe how time has flown and how long this issue has been open. My apologies for not having addressed this yet. The next version of prql-query should certainly include an updated version of DuckDB, the latest being 0.7.1 as of this writing.

In the meantime, if you can use docker rather than a native binary, you could try the following project which I threw together this weekend on the back of the announcement of the duckdb-prql extension.

https://github.com/duckerlabs/ducker

Please let me know if that works for you.

I will try to update prql-query soon as there is an exciting development in the duckdb-rs repo for creating a duckdb binary with the extensions statically compiled in which I would like to incorporate into prql-query as well.

earlev4 commented 1 year ago

Thanks so much, @snth! I will check out the Ducker! Looks like an interesting project. Also, looking forward to the prql-query update! Thanks for the prompt response and all your efforts on PRQL! It is an awesome project and I appreciate it! 😄

richb-hanover commented 1 year ago

The Ducker container looks terrific, and I'll check it out. Unless there's something else in this issue, I think you can close it. Thanks!

eitsupi commented 1 year ago

Wondering if it is possible to remove duckdb-rs from prql-query in favor of the DuckDB's PRQL extension. DuckDB is great, but is it an impediment to building prql-query because of the machine power and time required to build it compared to datafusion here?