elixir-explorer / explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
https://hexdocs.pm/explorer
MIT License
1.12k stars 123 forks source link

Implement Property Tests for `DataFrame.new` #1012

Closed maennchen closed 1 week ago

maennchen commented 3 weeks ago

Follow up from https://github.com/elixir-explorer/explorer/issues/1011#issuecomment-2444709547

Property Tests for DataFrame.new seem to be helpful. I already found a panic with it and lots of other issues.

This PR is not finished / ready to merge. I just wanted to give it a go.

Feel free to:

Example uncovered issue:

mix test --exclude test --include property              
Running ExUnit with seed: 393713, max_cases: 16
Excluding tags: [:test, :cloud_integration]
Including tags: [:property]

thread '<unnamed>' panicked at /home/vscode/.asdf/installs/rust/1.82.0/registry/src/index.crates.io-6f17d22bba15001f/polars-core-0.43.1/src/named_from.rs:170:56:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("cannot unpack series, data types don't match"))
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at /home/vscode/.asdf/installs/rust/1.82.0/registry/src/index.crates.io-6f17d22bba15001f/polars-core-0.43.1/src/named_from.rs:170:56:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("cannot unpack series, data types don't match"))
thread '<unnamed>' panicked at /home/vscode/.asdf/installs/rust/1.82.0/registry/src/index.crates.io-6f17d22bba15001f/polars-core-0.43.1/src/named_from.rs:170:56:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("cannot unpack series, data types don't match"))
thread '<unnamed>' panicked at /home/vscode/.asdf/installs/rust/1.82.0/registry/src/index.crates.io-6f17d22bba15001f/polars-core-0.43.1/src/named_from.rs:170:56:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("cannot unpack series, data types don't match"))
thread '<unnamed>' panicked at /home/vscode/.asdf/installs/rust/1.82.0/registry/src/index.crates.io-6f17d22bba15001f/polars-core-0.43.1/src/named_from.rs:170:56:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("cannot unpack series, data types don't match"))
thread '<unnamed>' panicked at /home/vscode/.asdf/installs/rust/1.82.0/registry/src/index.crates.io-6f17d22bba15001f/polars-core-0.43.1/src/named_from.rs:170:56:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("cannot unpack series, data types don't match"))
thread '<unnamed>' panicked at /home/vscode/.asdf/installs/rust/1.82.0/registry/src/index.crates.io-6f17d22bba15001f/polars-core-0.43.1/src/named_from.rs:170:56:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("cannot unpack series, data types don't match"))
thread '<unnamed>' panicked at /home/vscode/.asdf/installs/rust/1.82.0/registry/src/index.crates.io-6f17d22bba15001f/polars-core-0.43.1/src/named_from.rs:170:56:
called `Result::unwrap()` on an `Err` value: SchemaMismatch(ErrString("cannot unpack series, data types don't match"))

  1) property should be able to create a DataFrame from any valid data / dtype combination (Explorer.DataFrameTest)
     test/explorer/data_frame_test.exs:4714
     ** (ExUnitProperties.Error) failed with generated values (after 8 successful runs):

         * Clause:    dtype <- Explorer.Generator.dtype()
           Generated: {:list, {:list, {:decimal, 2, 1}}}

         * Clause:    data <- list_of(fixed_map(%{"field" => Explorer.Generator.data_for_dtype(dtype)}))
           Generated: [%{"field" => [[Decimal.new("50.2")], nil, [nil, Decimal.new("73.1"), Decimal.new("14.4"), nil, Decimal.new("43.3")], nil, nil, nil, [nil, nil, nil], [Decimal.new("9.4"), nil, Decimal.new("72.5"), nil, Decimal.new("94.8"), nil], [Decimal.new("65.7"), Decimal.new("52.3"), nil, Decimal.new("41.1"), Decimal.new("3.6")]]}]

     got exception:

         ** (ArgumentError) cannot create series "field": Erlang error: :nif_panicked
     code: check all(
     stacktrace:
       (explorer 0.11.0-dev) lib/explorer/polars_backend/data_frame.ex:578: Explorer.PolarsBackend.DataFrame.series_from_list!/3
       (explorer 0.11.0-dev) lib/explorer/polars_backend/data_frame.ex:529: anonymous fn/3 in Explorer.PolarsBackend.DataFrame.from_tabular/2
       (elixir 1.17.3) lib/enum.ex:1703: Enum."-map/2-lists^map/1-1-"/2
       (explorer 0.11.0-dev) lib/explorer/polars_backend/data_frame.ex:523: Explorer.PolarsBackend.DataFrame.from_tabular/2
       test/explorer/data_frame_test.exs:4721: anonymous fn/3 in Explorer.DataFrameTest."property should be able to create a DataFrame from any valid data / dtype combination"/1
       (stream_data 1.1.1) lib/stream_data.ex:2367: StreamData.shrink_failure/6
       (stream_data 1.1.1) lib/stream_data.ex:2327: StreamData.check_all/7
       test/explorer/data_frame_test.exs:4715: (test)

.
Finished in 7.7 seconds (7.7s async, 0.00s sync)
617 doctests, 2 properties, 1524 tests, 1 failure, 2141 excluded
maennchen commented 3 weeks ago

@billylanchantin Thanks for the feedback. All should be fixed / applied.

I have marked this PR as draft since I have no intention of fixing the bugs it uncovers and also not to define more meaningful tests based on it. (somebody more familiar with the internals will do a lot better job at that)

I'll however happily get the generator ready so that someone else can use it to root out all the bugs.

billylanchantin commented 3 weeks ago

@maennchen The plan of keeping this as a draft for now makes sense to me.

Related: above, you said we (the team) could take over the PR and start making tests based on what we find. Would you mind if I do that? I've made some tweaks to the generator locally and I've already found out a few things.

Findings so far:

Mind if I push my changes?

maennchen commented 3 weeks ago

@billylanchantin Feel free to take over 😊

We should probably also property test the data export formats. #1011 manifests for me both when inspecting as well as when storing as a json or parquet file.

josevalim commented 1 week ago

Btw, I think we should have property based testing disabled altogether by default, and we enable it on CI only. There is no need to spend CPU cycles on properties for most of the tests runs that the core team and contributors do. They are very valuable though on CI.

billylanchantin commented 1 week ago

@josevalim I agree. I actually thought it was already like that but it's not working. Will fix.

billylanchantin commented 1 week ago

@josevalim All is good. Turns out we were excluding property tests outside of CI all along.

Related: TIL that if you use the file:line syntax when running tests, your other --include/--exclude flags are ignored...