elixir-explorer / explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
https://hexdocs.pm/explorer
MIT License

Provide more details in error message around `concat_rows` #1023

Open LostKobrakai opened 1 day ago

LostKobrakai commented 1 day ago

Given the following dataframes:

[
  #Explorer.DataFrame<
    Polars[1134 x 4]
    gtin string […]
    series string […]
    program string […]
    program_color string […]
  >,
  #Explorer.DataFrame<
    Polars[1520 x 4]
    gtin string […]
    series string […]
    program null […]
    series_color string […]
  >
]

I got

[error] ** (ArgumentError) dataframes must have the same columns
    (explorer 0.10.0) lib/explorer/data_frame.ex:5436: anonymous fn/3 in Explorer.DataFrame.compute_changed_types_concat_rows/1

This led me to believe that the null vs. string column type was the issue, when it was actually the differing *_color columns.

The error message could be more specific, and the concat_rows docs could call out that typecasting works between null and other column types.

billylanchantin commented 1 day ago

Small clarification (we chatted on Slack):

The error message could be improved by calling out which columns specifically didn't match. Something like:

** (ArgumentError) dataframes must have the same columns

  * Left DataFrame has these columns not present in the right DataFrame:

      ["program_color"]

  * Right DataFrame has these columns not present in the left DataFrame:

      ["series_color"]

    (explorer 0.10.0) lib/explorer/data_frame.ex:5436: anonymous fn/3 in Explorer.DataFrame.compute_changed_types_concat_rows/1

where internally we'd do something like:

# Column names of each dataframe as sets
left_cols = left_df |> names() |> MapSet.new()
right_cols = right_df |> names() |> MapSet.new()

# Columns present in exactly one of the two dataframes
mismatched_cols = MapSet.symmetric_difference(left_cols, right_cols)

# Split the mismatches by the side they come from
in_left_only = left_cols |> MapSet.intersection(mismatched_cols) |> Enum.to_list()
in_right_only = right_cols |> MapSet.intersection(mismatched_cols) |> Enum.to_list()
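
To make the idea above concrete, here is a rough, standalone sketch of how the proposed message could be assembled from two lists of column names. It uses plain Elixir only. The module `ColumnMismatch` and its `message/2` function are hypothetical names made up for illustration and are not part of Explorer; the real change would presumably live near `compute_changed_types_concat_rows/1` and raise an `ArgumentError` with this text. It also uses `MapSet.difference/2` directly instead of the symmetric-difference-then-intersect approach, which should yield the same two lists.

```elixir
defmodule ColumnMismatch do
  # Hypothetical helper, not part of Explorer: builds the proposed error
  # message from two lists of column names.
  def message(left_names, right_names) do
    left = MapSet.new(left_names)
    right = MapSet.new(right_names)

    # Sort so the output does not depend on the set's internal ordering.
    in_left_only = left |> MapSet.difference(right) |> Enum.sort()
    in_right_only = right |> MapSet.difference(left) |> Enum.sort()

    [
      "dataframes must have the same columns",
      section("Left DataFrame has these columns not present in the right DataFrame:", in_left_only),
      section("Right DataFrame has these columns not present in the left DataFrame:", in_right_only)
    ]
    |> Enum.reject(&is_nil/1)
    |> Enum.join("\n\n")
  end

  # Skip a section entirely when that side has no extra columns.
  defp section(_header, []), do: nil
  defp section(header, cols), do: "  * #{header}\n\n      #{inspect(cols)}"
end

# Column names taken from the two dataframes in the report above.
left = ["gtin", "series", "program", "program_color"]
right = ["gtin", "series", "program", "series_color"]

IO.puts(ColumnMismatch.message(left, right))
```

Dropping a section when one side has no extra columns keeps the message short in the case where only one dataframe has additional columns.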