elixir-explorer / explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
https://hexdocs.pm/explorer

bug report: csv load option causing column reading out of order #971

Closed lei0zhou closed 2 months ago

lei0zhou commented 2 months ago

Reproducible example, with Explorer 0.9.1:

text = """
first_name , last_name , dob
Alice      , Ant       , 01/02/1970
Billy      , Bat       , 03/04/1990
"""
Explorer.DataFrame.load_csv(text)

{:ok,
 #Explorer.DataFrame<
   Polars[2 x 3]
   first_name string ["Alice ", "Billy "]
   last_name string [" Ant ", " Bat "]
   dob string [" 01/02/1970", " 03/04/1990"]
 >}

This works as expected, but with the :dtypes option:

types = [
  {"first_name", :string},
  {"last_name", :string},
  {"dob", :string}
]

Explorer.DataFrame.load_csv(text, dtypes: types)

{:ok,
 #Explorer.DataFrame<
   Polars[2 x 3]
   dob string ["Alice ", "Billy "]
   first_name string [" Ant ", " Bat "]
   last_name string [" 01/02/1970", " 03/04/1990"]
 >}

With the option, the column names are out of order relative to the data.

curious why this happens if this test passed

josevalim commented 2 months ago

I will improve the docs, but the dtypes names are meant to match the column names in the CSV. In your case, because the column names have spaces around them, the first column is actually "first_name ". Either pass the :columns option to give them proper names or give it a proper CSV. Since they mismatch, Polars actually tries to assign the order of the fields, and I will improve the docs on that too.

lei0zhou commented 2 months ago

thanks! after removing blank spaces, the loading works as expected!
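
For anyone who cannot edit the CSV by hand, here is a minimal sketch of removing the blank spaces in plain Elixir before loading. The `CsvCleanup` module and `trim_fields/1` function are hypothetical names made up for this illustration and are not part of Explorer. The approach is naive: it assumes no quoted fields that contain commas or newlines.

```elixir
defmodule CsvCleanup do
  @doc """
  Trims whitespace around every comma-separated field, line by line.

  Hypothetical helper, not part of Explorer. Naive on purpose: it assumes
  no quoted fields containing commas or newlines.
  """
  def trim_fields(text) do
    text
    # Split into lines, dropping empty ones (e.g. after the trailing newline).
    |> String.split("\n", trim: true)
    |> Enum.map_join("\n", fn line ->
      line
      |> String.split(",")
      # Strip leading and trailing whitespace from each field.
      |> Enum.map_join(",", &String.trim/1)
    end)
  end
end

text = """
first_name , last_name , dob
Alice      , Ant       , 01/02/1970
Billy      , Bat       , 03/04/1990
"""

cleaned = CsvCleanup.trim_fields(text)
IO.puts(cleaned)
```

The cleaned string can then be passed to `Explorer.DataFrame.load_csv(cleaned, dtypes: types)`, so the header names in the CSV match the names given in `dtypes`.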