Open LostKobrakai opened 1 day ago
Small clarification (we chatted on Slack):
The error message could be improved by calling out which columns specifically didn't match. Something like:
** (ArgumentError) dataframes must have the same columns
* Left DataFrame has these columns not present in the right DataFrame:
["program_color"]
* Right DataFrame has these columns not present in the left DataFrame:
["series_color"]
(explorer 0.10.0) lib/explorer/data_frame.ex:5436: anonymous fn/3 in Explorer.DataFrame.compute_changed_types_concat_rows/1
where internally we'd do something like:
left_cols = left_df |> names() |> MapSet.new()
right_cols = right_df |> names() |> MapSet.new()
mismatched_cols = MapSet.symmetric_difference(left_cols, right_cols)
in_left_only = left_cols |> MapSet.intersection(mismatched_cols) |> Enum.to_list()
in_right_only = right_cols |> MapSet.intersection(mismatched_cols) |> Enum.to_list()
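For illustration, here is a self-contained sketch of the same idea that runs on plain lists of column names, without Explorer. The module and function names (`ColumnMismatch`, `only_in_each/2`, `message/1`) are hypothetical and not part of Explorer, and the message layout just mirrors the proposal above. Intersecting each side with the symmetric difference is equivalent to taking a plain difference in each direction, so the sketch filters the original lists instead, which also keeps the columns in their original order (a `MapSet` does not guarantee ordering).

```elixir
defmodule ColumnMismatch do
  # Hypothetical helper, not part of Explorer.
  # Returns {columns only in left, columns only in right}, preserving input order.
  def only_in_each(left_names, right_names) do
    left = MapSet.new(left_names)
    right = MapSet.new(right_names)

    in_left_only = Enum.reject(left_names, &MapSet.member?(right, &1))
    in_right_only = Enum.reject(right_names, &MapSet.member?(left, &1))

    {in_left_only, in_right_only}
  end

  # Builds the detailed error text from the tuple returned by only_in_each/2.
  def message({in_left_only, in_right_only}) do
    "dataframes must have the same columns\n" <>
      "* Left DataFrame has these columns not present in the right DataFrame:\n" <>
      "#{inspect(in_left_only)}\n" <>
      "* Right DataFrame has these columns not present in the left DataFrame:\n" <>
      "#{inspect(in_right_only)}"
  end
end

mismatch = ColumnMismatch.only_in_each(["id", "program_color"], ["id", "series_color"])
# mismatch is {["program_color"], ["series_color"]}
IO.puts(ColumnMismatch.message(mismatch))
```

In the real function the name lists would presumably come from `names/1` on each dataframe, and the text would be passed to `raise ArgumentError`.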
Given the following dataframes:
I got
This led me to believe that the null vs. string column type was the issue, while it was actually the differing *_color columns.
The error message could be better, and the `concat_rows` docs could call out that typecasting works between null and other column types.