chmp / serde_arrow

Convert sequences of Rust objects to Arrow tables
MIT License

Docs: Fix examples in Readme and API docs #167

Closed: jchidley closed this issue 4 months ago

jchidley commented 4 months ago

I must be doing something really silly.

Given this Cargo.toml

[package]
name = "arrow_test"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
arrow = "51.0.0"
serde = "1.0.200"
serde_arrow = "0.11.2"

and this src\main.rs

use arrow::datatypes::FieldRef;
use serde_arrow::schema::{SerdeArrowSchema, TracingOptions};

use serde::{Deserialize, Serialize};
fn main() {
    #[derive(Serialize, Deserialize)]
    struct Record {
        a: f32,
        b: i32,
    }

    let records = vec![
        Record { a: 1.0, b: 1 },
        Record { a: 2.0, b: 2 },
        Record { a: 3.0, b: 3 },
    ];

    // Determine Arrow schema
    let fields = Vec::<FieldRef>::from_type::<Record>(TracingOptions::default())?;

    // Build a record batch
    let batch = serde_arrow::to_record_batch(&fields, &records)?;
}

I get this output

   Compiling arrow_test v0.1.0 (C:\Users\jackc\AppData\Local\Temp\arrow_test)
error[E0425]: cannot find function `to_record_batch` in crate `serde_arrow`
   --> src\main.rs:22:30
    |
22  |     let batch = serde_arrow::to_record_batch(&fields, &records)?;
    |                              ^^^^^^^^^^^^^^^ not found in `serde_arrow`
    |
note: found an item that was configured out
   --> C:\Users\jackc\.cargo\registry\src\index.crates.io-6f17d22bba15001f\serde_arrow-0.11.2\src\lib.rs:316:68
    |
316 | ...om_record_batch, to_arrow, to_record_batch, ArrowBuilder};
    |                               ^^^^^^^^^^^^^^^

warning: unused import: `SerdeArrowSchema`
 --> src\main.rs:2:27
  |
2 | use serde_arrow::schema::{SerdeArrowSchema, TracingOptions};
  |                           ^^^^^^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

error[E0599]: no function or associated item named `from_type` found for struct `Vec<Arc<arrow::datatypes::Field>>` in the current scope
  --> src\main.rs:19:35
   |
19 | ...ields = Vec::<FieldRef>::from_type::<Record>(TracingOptions::default())?;
   |                             ^^^^^^^^^ function or associated item not found in `Vec<Arc<Field>>`

Some errors have detailed explanations: E0425, E0599.
For more information about an error, try `rustc --explain E0425`.
warning: `arrow_test` (bin "arrow_test") generated 1 warning
error: could not compile `arrow_test` (bin "arrow_test") due to 2 previous errors; 1 warning emitted

chmp commented 4 months ago

@jchidley I would say the docs are suboptimal. serde_arrow supports arrow only once you add the relevant feature, in your case arrow-51. It should work if you modify your Cargo.toml to read

[package]
name = "arrow_test"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
arrow = "51.0.0"
serde = "1.0.200"

# !!! modified line:
serde_arrow = { version = "0.11.2", features = ["arrow-51"] }

Thanks for making me aware of this issue!
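
The "found an item that was configured out" note in the compiler output above is what conditional compilation looks like from the caller's side. A minimal sketch, assuming a made-up feature name `demo-arrow` and a made-up function `to_record_batch_demo` (neither is part of serde_arrow):

```rust
// Hypothetical feature name, for illustration only. In a real crate the
// feature would be declared under [features] in Cargo.toml.
#[cfg(feature = "demo-arrow")]
pub fn to_record_batch_demo() -> &'static str {
    "arrow support compiled in"
}

// `cfg!` evaluates to a compile-time boolean, so this function exists
// whether or not the feature is enabled.
fn arrow_support_enabled() -> bool {
    cfg!(feature = "demo-arrow")
}

fn main() {
    // Without the feature, `to_record_batch_demo` does not exist at all,
    // and calling it would fail to compile with E0425.
    println!("arrow support enabled: {}", arrow_support_enabled());
}
```

Built without the feature, the gated function is removed before name resolution, which is why the call site reports that the function cannot be found rather than a more specific error.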

jchidley commented 4 months ago

I have a slightly different output now

warning: unused import: `SerdeArrowSchema`
 --> src\main.rs:2:27
  |
2 | use serde_arrow::schema::{SerdeArrowSchema, TracingOptions};
  |                           ^^^^^^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

error[E0599]: no function or associated item named `from_type` found for struct `Vec<Arc<arrow::datatypes::Field>>` in the current scope
  --> src\main.rs:19:35
   |
19 | ...ields = Vec::<FieldRef>::from_type::<Record>(TracingOptions::default())?;
   |                             ^^^^^^^^^ function or associated item not found in `Vec<Arc<Field>>`
   |
   = help: items from traits can only be used if the trait is in scope
help: the following trait is implemented but not in scope; perhaps add a `use` for it:
   |
1  + use serde_arrow::schema::SchemaLike;
   |

error[E0277]: the `?` operator can only be used in a function that returns `Result` or `Option` (or another type that implements `FromResidual`)
  --> src\main.rs:22:64
   |
5  | fn main() {
   | --------- this function should return `Result` or `Option` to accept `?`
...
22 |     let batch = serde_arrow::to_record_batch(&fields, &records)?;
   |                                                                ^ cannot use the `?` operator in a function that returns `()`
   |
   = help: the trait `FromResidual<Result<Infallible, serde_arrow::Error>>` is not implemented for `()`

Some errors have detailed explanations: E0277, E0599.
For more information about an error, try `rustc --explain E0277`.
warning: `serde_arrow_test` (bin "serde_arrow_test") generated 1 warning
error: could not compile `serde_arrow_test` (bin "serde_arrow_test") due to 2 previous errors; 1 warning emitted

Can I get rid of the unused SerdeArrowSchema import too?

chmp commented 4 months ago

Damn. The API changed, but the docs did not: you need to include SchemaLike, not SerdeArrowSchema. SchemaLike is a trait that adds the from_type helper function. I.e., replace use serde_arrow::schema::{SerdeArrowSchema, TracingOptions}; with use serde_arrow::schema::{SchemaLike, TracingOptions};
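
Why the import matters: an associated function defined by a trait is only callable while that trait is in scope. A minimal std-only sketch of the same situation, where the module `schema`, the trait `FromNames` and the function `from_names` are hypothetical stand-ins and not serde_arrow's API:

```rust
mod schema {
    // Hypothetical stand-in for a helper trait such as SchemaLike.
    pub trait FromNames: Sized {
        fn from_names(names: &[&str]) -> Self;
    }

    // The trait is implemented for a standard library type, much like
    // a helper trait can be implemented for Vec<FieldRef>.
    impl FromNames for Vec<String> {
        fn from_names(names: &[&str]) -> Self {
            names.iter().map(|n| n.to_string()).collect()
        }
    }
}

// Removing this line makes the call below fail with E0599
// ("no function or associated item named `from_names` found").
use schema::FromNames;

fn field_names() -> Vec<String> {
    Vec::<String>::from_names(&["a", "b"])
}

fn main() {
    println!("{:?}", field_names());
}
```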

jchidley commented 4 months ago

Also, I need the main function to return some kind of result (possibly with an error), which I can't figure out.

chmp commented 4 months ago

Oh. That's independent of serde_arrow and a general Rust issue. If you want to get started quickly, anyhow offers an easy-to-use error / result type that handles most cases (if you are building a library, it's recommended to roll your own error type, e.g., via thiserror).

edit: you could also use serde_arrow::Result, but that only handles a subset of errors. Most likely you are also using other libraries in your code. In that case I would recommend going directly with anyhow.
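
For reference, a dependency-free way to let main use `?` is to return `Box<dyn std::error::Error>`, which accepts any error type that implements the standard `Error` trait. This is a std-only sketch rather than the anyhow approach recommended above, and `parse_and_sum` is a made-up helper for illustration:

```rust
use std::error::Error;

// Hypothetical helper: parses comma-separated integers and sums them.
// `?` converts the ParseIntError into Box<dyn Error> automatically.
fn parse_and_sum(input: &str) -> Result<i32, Box<dyn Error>> {
    let mut total = 0;
    for part in input.split(',') {
        total += part.trim().parse::<i32>()?;
    }
    Ok(total)
}

// Because main returns a Result, `?` is allowed in its body.
fn main() -> Result<(), Box<dyn Error>> {
    let total = parse_and_sum("1, 2, 3")?;
    println!("total = {total}"); // prints "total = 6"
    Ok(())
}
```

If main returns an `Err`, the program prints the error's Debug representation and exits with a non-zero status.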

jchidley commented 4 months ago

I now have this minimal working example:

use arrow::datatypes::FieldRef;
use serde_arrow::schema::{SchemaLike, TracingOptions};
use serde_arrow::Result;

use serde::{Deserialize, Serialize};
fn main() -> Result<()> {
    #[derive(Serialize, Deserialize)]
    struct Record {
        a: f32,
        b: i32,
    }

    let records = vec![
        Record { a: 1.0, b: 1 },
        Record { a: 2.0, b: 2 },
        Record { a: 3.0, b: 3 },
    ];

    // Determine Arrow schema
    let fields = Vec::<FieldRef>::from_type::<Record>(TracingOptions::default())?;

    // Build a record batch
    let batch = serde_arrow::to_record_batch(&fields, &records)?;

    println!("{:?}", batch);

    Ok(())
}

output:

RecordBatch { schema: Schema { fields: [Field { name: "a", data_type: Float32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "b", data_type: Int32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, columns: [PrimitiveArray<Float32>
[
  1.0,
  2.0,
  3.0,
], PrimitiveArray<Int32>
[
  1,
  2,
  3,
]], row_count: 3 }

jchidley commented 4 months ago

Slightly cleaner and shorter example

use arrow::datatypes::FieldRef;
use serde::{Deserialize, Serialize};
use serde_arrow::schema::{SchemaLike, TracingOptions};

fn main() {
    #[derive(Serialize, Deserialize)]
    struct Record {
        a: f32,
        b: i32,
    }

    let records = vec![
        Record { a: 1.0, b: 1 },
        Record { a: 2.0, b: 2 },
        Record { a: 3.0, b: 3 },
    ];

    // Determine Arrow schema
    let fields = Vec::<FieldRef>::from_type::<Record>(TracingOptions::default()).unwrap();

    // Build a record batch
    let batch = serde_arrow::to_record_batch(&fields, &records).unwrap();

    println!("{:?}", batch);
}

chmp commented 4 months ago

Nice. Thanks for the update. I hope everything works for you. If you run into any other issues (or the docs are unclear), feel free to leave a note :)

chmp commented 4 months ago

Updated the docs on main