adriangb / pgpq

Stream Arrow data into Postgres
MIT License

Can't seem to make the example code work #18

Closed yagodorea closed 1 year ago

yagodorea commented 1 year ago

I'm kinda new to Rust so this might be something silly. I'm trying to read a Parquet file using this example, and cargo check is failing with this log:

40  |     let mut encoder = ArrowToPostgresBinaryEncoder::try_new(&schema).unwrap();
    |                       ------------------------------------- ^^^^^^^ expected struct `arrow_schema::schema::Schema`, found a different struct `arrow_schema::schema::Schema`
    |                       |
    |                       arguments to this function are incorrect
    |
    = note: struct `arrow_schema::schema::Schema` and struct `arrow_schema::schema::Schema` have similar names, but are actually distinct types

I already went through some weird typing errors, can't seem to get anywhere 😵

Also, this line:

use arrow_array::RecordBatch;
// ...
let batches = reader.map(|v| v.unwrap()).collect();

produces this problem:

26   |     let batches = reader.map(|v| v.unwrap()).collect();
     |                                              ^^^^^^^ value of type `Vec<arrow_array::RecordBatch>` cannot be built from `std::iter::Iterator<Item=arrow_array::record_batch::RecordBatch>`

But I also can't import arrow_array::record_batch::RecordBatch because it's private. This looks like the kind of problem you get when a transitive dependency is pulled in more than once, but I don't have any module besides main defining stuff... My dependencies file:

[dependencies]
pgpq = "0.7.3"
arrow-schema = "^33.0.0"
arrow-array = "^33.0.0"
arrow-ipc = ">=33.0.0"

Already tried a number of dependency combinations but nothing seems to work.

adriangb commented 1 year ago

Sorry for the lack of a reply. I’ll try this out, but honestly I haven’t used this as a pure Rust crate myself. I may need to fiddle with the dependencies a bit; the arrow crates tend to re-export things in multiple places, so I may well have picked the wrong one.

adriangb commented 1 year ago

Looks like what was going on here is that in your crate you ran cargo add arrow-schema or similar, which added something like arrow-schema = "^46.0.0". Since that's incompatible with the pin in pgpq (as of this issue, arrow-schema = "^33.0.0"), cargo used two different versions, and a type from one version is not interchangeable with the same-named type from the other.

I opened #20, which removes the upper bound and should let cargo use the same version (unless you pin to a lower version in your crate, so you might need to update your versions of the arrow packages as well), at the risk of not compiling in the future if arrow makes breaking changes that impact this package. If that happens, I'll probably just absorb the changes here and bump up the lower version bound; I don't have the bandwidth to support multiple arrow major versions and/or test them in CI.
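
For anyone landing here with the same "expected struct Schema, found a different struct Schema" message, here is a minimal sketch of why it happens. It does not use pgpq or the arrow crates at all: the two modules below are hypothetical stand-ins for two versions of arrow-schema ending up in the same build, and `field_count` is a made-up function playing the role of a library call compiled against the older one.

```rust
use std::any::TypeId;

// Hypothetical stand-in for the arrow-schema version the library was built against.
mod arrow_schema_v33 {
    pub struct Schema {
        pub fields: Vec<String>,
    }
}

// Hypothetical stand-in for a newer arrow-schema pulled in by the application crate.
mod arrow_schema_v46 {
    pub struct Schema {
        pub fields: Vec<String>,
    }
}

// Plays the role of a library function that only accepts the older `Schema`.
fn field_count(schema: &arrow_schema_v33::Schema) -> usize {
    schema.fields.len()
}

// True when the two `Schema` structs are different types despite identical definitions.
fn schemas_are_distinct() -> bool {
    TypeId::of::<arrow_schema_v33::Schema>() != TypeId::of::<arrow_schema_v46::Schema>()
}

fn main() {
    let old = arrow_schema_v33::Schema { fields: vec!["id".to_string()] };
    let new = arrow_schema_v46::Schema { fields: vec!["id".to_string()] };

    println!("fields: {}", field_count(&old));

    // Uncommenting the next line fails to compile with a mismatched-types error,
    // because `new` is an `arrow_schema_v46::Schema`, not an `arrow_schema_v33::Schema`.
    // field_count(&new);

    println!("new has {} field(s)", new.fields.len());
    println!("distinct: {}", schemas_are_distinct());
}
```

In a real project, running cargo tree -d lists crates that appear in the dependency graph at more than one version, which is a quick way to confirm this kind of duplication before changing version requirements.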