elixir-explorer / explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
https://hexdocs.pm/explorer
MIT License

NIF panic with list of struct #1011

Open maennchen opened 1 week ago

maennchen commented 1 week ago

Code

Mix.install([{:explorer, "~> 0.10.0"}])

name_dtype = {"names",
{:list,
 {:struct,
  [
    {"language", :string},
    {"name", :string},
    {"transliteration", :category},
    {"type", :category}
  ]}}}

[
  %{names: []},
  %{names: [%{name: "CABK", type: "acronym", language: nil, transliteration: "none"}]}
]
|> Explorer.DataFrame.new(dtypes: [name_dtype])
|> dbg()

Expected

A working DataFrame.

Actual

thread '<unnamed>' panicked at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-arrow-0.43.1/src/array/binview/mod.rs:327:9:
assertion failed: i < self.len()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-arrow-0.43.1/src/array/binview/mod.rs:327:9:
assertion failed: i < self.len()
#Inspect.Error<
  got ErlangError with message:

      """
      Erlang error: :nif_panicked
      """

  while inspecting:

      %{
        data: %Explorer.PolarsBackend.DataFrame{
          resource: #Reference<0.3840168779.3188326401.44015>
        },
        remote: nil,
        names: ["names"],
        __struct__: Explorer.DataFrame,
        dtypes: %{
          "names" => {:list,
           {:struct,
            [
              {"language", :string},
              {"name", :string},
              {"transliteration", :category},
              {"type", :category}
            ]}}
        },
        groups: []
      }

  Stacktrace:

    (explorer 0.10.0) Explorer.PolarsBackend.Native.s_to_list(#Explorer.PolarsBackend.Series<
  #Reference<0.3840168779.3188326416.41949>
>)
    (explorer 0.10.0) lib/explorer/polars_backend/shared.ex:24: Explorer.PolarsBackend.Shared.apply_series/3
    (explorer 0.10.0) lib/explorer/backend/data_frame.ex:324: anonymous fn/3 in Explorer.Backend.DataFrame.build_cols_algebra/3
    (elixir 1.17.3) lib/enum.ex:1703: Enum."-map/2-lists^map/1-1-"/2
    (explorer 0.10.0) lib/explorer/backend/data_frame.ex:283: Explorer.Backend.DataFrame.inspect/5
    (explorer 0.10.0) lib/explorer/data_frame.ex:6308: Inspect.Explorer.DataFrame.inspect/2
    (elixir 1.17.3) lib/inspect/algebra.ex:347: Inspect.Algebra.to_doc/2
    (elixir 1.17.3) lib/io.ex:481: IO.inspect/3

>

Note: This error happens while inspecting the result. If I instead pass the result on to Explorer.DataFrame.dump_ndjson/1, I get a segmentation fault:

[1]    79707 segmentation fault (core dumped)  elixir bug.exs

Context

In production, the error looks slightly different, but I can't reproduce it exactly without sharing the confidential data involved. I can, however, provide the error without the stacktrace:

thread '<unnamed>' panicked at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-arrow-0.43.1/src/array/binview/mod.rs:327:9:
assertion failed: i < self.len()
[info] Sent 500 in 9ms
[error] ** (ErlangError) Erlang error: :nif_panicked
    (explorer 0.10.0) Explorer.PolarsBackend.Native.s_to_list(#Explorer.PolarsBackend.Series<
  shape: (527,)
  Series: 'names' [list[struct[4]]]
  [
        null
        null
        null
        []
        []
        …
        null
        null
        null
        null
        null
  ]
>)

Since the assertion failed: i < self.len() part is identical in both cases, I think the reproduction above represents the production error well enough.

billylanchantin commented 1 week ago

Hi @maennchen! We appreciate you putting our struct dtype logic through its paces :P I'll take a look at this today or tomorrow.

Relatedly, I want to add a property test like this one, but for DataFrame.new:

https://github.com/elixir-explorer/explorer/blob/main/test/explorer/series/inferred_dtype_property_test.exs

The generators would be similar, but the goal would be to see if we get a panic. Hopefully such a test would catch more bugs like these up front.
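To make the idea concrete, here is a rough sketch of what such a property test might look like. It is not the test from the linked file: the module name, the generators, and the size limits are all illustrative assumptions, and it only covers the list-of-struct dtype from this report. It assumes StreamData (`stream_data`) for the generators, and it reads the column back with `Explorer.Series.to_list/1` on the assumption that this exercises the same `s_to_list` NIF that appears in the stacktrace.

```elixir
# Sketch only: module name, generators, and size limits are illustrative assumptions.
Mix.install([{:explorer, "~> 0.10.0"}, {:stream_data, "~> 1.0"}])

ExUnit.start()

defmodule DataFrameNewPanicTest do
  use ExUnit.Case, async: true
  use ExUnitProperties

  # Same dtype as in the report above.
  @names_dtype {:list,
                {:struct,
                 [
                   {"language", :string},
                   {"name", :string},
                   {"transliteration", :category},
                   {"type", :category}
                 ]}}

  # A string field that may be nil, like `language: nil` in the report.
  defp nullable_string do
    StreamData.one_of([
      StreamData.constant(nil),
      StreamData.string(:alphanumeric, max_length: 8)
    ])
  end

  # A non-empty string for the category fields.
  defp category_value do
    StreamData.string(:alphanumeric, min_length: 1, max_length: 8)
  end

  # One struct entry of the "names" list.
  defp name_entry do
    StreamData.fixed_map(%{
      language: nullable_string(),
      name: nullable_string(),
      transliteration: category_value(),
      type: category_value()
    })
  end

  # One row; the list may be empty, like `%{names: []}` in the report.
  defp row do
    StreamData.fixed_map(%{names: StreamData.list_of(name_entry(), max_length: 3)})
  end

  property "DataFrame.new/2 with a list-of-struct column does not panic" do
    check all rows <- StreamData.list_of(row(), min_length: 1, max_length: 5) do
      df = Explorer.DataFrame.new(rows, dtypes: [{"names", @names_dtype}])

      # A NIF panic would surface here as an ErlangError and fail the property.
      values = df |> Explorer.DataFrame.pull("names") |> Explorer.Series.to_list()
      assert length(values) == length(rows)
    end
  end
end
```

Saved as a script (for example a hypothetical `property_sketch.exs`), this could be run with `elixir property_sketch.exs`. A panic shows up as a failing property with a shrunk counterexample, whereas a segmentation fault like the one from dump_ndjson/1 would take down the whole VM, so this approach only helps with the former.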