elixir-explorer / explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
https://hexdocs.pm/explorer
MIT License

`from_parquet` does not handle explicit `FSS.HTTP.Entry` #991

Closed (aymanosman closed this issue 1 month ago)

aymanosman commented 1 month ago

Passing an explicit `FSS.HTTP.Entry` to `from_parquet` raises a `FunctionClauseError`:

iex(5)> {:ok, entry} =
...(5)>   FSS.HTTP.parse(
...(5)>     "https://huggingface.co/datasets/aqubed/kub_tickets_small/resolve/main/data/train-00000-of-00001-47868532d4f55873.parquet"
...(5)>   )
{:ok,
 %FSS.HTTP.Entry{
   url: "https://huggingface.co/datasets/aqubed/kub_tickets_small/resolve/main/data/train-00000-of-00001-47868532d4f55873.parquet",
   config: %FSS.HTTP.Config{headers: []}
 }}
iex(6)>
nil
iex(7)> frame = Explorer.DataFrame.from_parquet!(entry)
** (FunctionClauseError) no function clause matching in Explorer.DataFrame.normalise_entry/2

    The following arguments were given to Explorer.DataFrame.normalise_entry/2:

        # 1
        %FSS.HTTP.Entry{
          url: "https://huggingface.co/datasets/aqubed/kub_tickets_small/resolve/main/data/train-00000-of-00001-47868532d4f55873.parquet",
          config: %FSS.HTTP.Config{headers: []}
        }

        # 2
        nil

    Attempted function clauses (showing 8 out of 8):

        defp normalise_entry(%_{} = entry, config) when config != nil
        defp normalise_entry(%FSS.Local.Entry{} = entry, nil)
        defp normalise_entry(%FSS.S3.Entry{config: %FSS.S3.Config{}} = entry, nil)
        defp normalise_entry(<<"s3://", _rest::binary>> = entry, config)
        defp normalise_entry(<<"file://", path::binary>>, _config)
        defp normalise_entry(<<"http://", _rest::binary>> = url, config)
        defp normalise_entry(<<"https://", _rest::binary>> = url, config)
        defp normalise_entry(filepath, _config) when is_binary(filepath)

    (explorer 0.10.0-dev) lib/explorer/data_frame.ex:837: Explorer.DataFrame.normalise_entry/2
    (explorer 0.10.0-dev) lib/explorer/data_frame.ex:825: Explorer.DataFrame.from_parquet/2
    (explorer 0.10.0-dev) lib/explorer/data_frame.ex:871: Explorer.DataFrame.from_parquet!/2
    iex:7: (file)

billylanchantin commented 1 month ago

Thanks for the report, @aymanosman!

I think this might be a very straightforward fix. I'll open up a PR.
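
The PR itself is not shown in this thread. As a rough sketch of the kind of fix the trace suggests, the attempted clauses cover `%FSS.Local.Entry{}` and `%FSS.S3.Entry{}` with a `nil` config but have no equivalent for `%FSS.HTTP.Entry{}`. The module below uses hypothetical stand-in structs (`EntrySketch.HTTPEntry`, `EntrySketch.HTTPConfig`) rather than the real `FSS` ones so it runs on its own, and its function bodies are assumptions, not Explorer's actual implementation.

```elixir
defmodule EntrySketch do
  # Hypothetical stand-ins for FSS.HTTP.Config and FSS.HTTP.Entry, defined
  # here only so the sketch runs without the fss dependency.
  defmodule HTTPConfig do
    defstruct headers: []
  end

  defmodule HTTPEntry do
    defstruct [:url, :config]
  end

  # The shape of clause the trace shows is absent: an HTTP entry struct that
  # already carries its own config, passed with no separate config (nil).
  # Returning the entry unchanged is an assumption about the desired behaviour.
  def normalise_entry(%HTTPEntry{config: %HTTPConfig{}} = entry, nil) do
    {:ok, entry}
  end

  # Mirrors the existing string clause from the trace: a bare https:// URL is
  # wrapped into an entry, using a default config when none is given.
  def normalise_entry("https://" <> _rest = url, config) do
    {:ok, %HTTPEntry{url: url, config: config || %HTTPConfig{}}}
  end
end
```

With a clause like the first one in place, an already-parsed entry matches directly instead of falling through every clause. Until such a fix lands, passing the URL string itself should avoid the error, since the trace lists a `"https://"` string clause among those attempted.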