elixir-explorer / explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
https://hexdocs.pm/explorer
MIT License

`from_parquet` does not handle `FSS.HTTP.Entry` when `lazy: true` #990

Closed aymanosman closed 1 month ago

aymanosman commented 1 month ago
frame = Explorer.DataFrame.from_parquet!("https://huggingface.co/datasets/aqubed/kub_tickets_small/resolve/main/data/train-00000-of-00001-47868532d4f55873.parquet", lazy: true)
** (FunctionClauseError) no function clause matching in Explorer.PolarsBackend.LazyFrame.from_parquet/4

    The following arguments were given to Explorer.PolarsBackend.LazyFrame.from_parquet/4:

        # 1
        %FSS.HTTP.Entry{
          url: "https://huggingface.co/datasets/aqubed/kub_tickets_small/resolve/main/data/train-00000-of-00001-47868532d4f55873.parquet",
          config: %FSS.HTTP.Config{headers: []}
        }

        # 2
        nil

        # 3
        nil

        # 4
        false

    Attempted function clauses (showing 2 out of 2):

        def from_parquet(%FSS.S3.Entry{} = entry, max_rows, columns, _rechunk)
        def from_parquet(%FSS.Local.Entry{} = entry, max_rows, columns, _rechunk)

    (explorer 0.10.0-dev) lib/explorer/polars_backend/lazy_frame.ex:205: Explorer.PolarsBackend.LazyFrame.from_parquet/4
    (explorer 0.10.0-dev) lib/explorer/data_frame.ex:871: Explorer.DataFrame.from_parquet!/2
    iex:4: (file)
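
The trace above shows the general failure mode: `from_parquet/4` in the lazy backend has clauses for `%FSS.S3.Entry{}` and `%FSS.Local.Entry{}` only, so an `%FSS.HTTP.Entry{}` matches nothing. A minimal, self-contained sketch of that behaviour follows. Every module and struct name in it (`Sketch.*`) is a hypothetical stand-in, not the real FSS or Explorer code, and the added clause only illustrates the shape such a fix could take, not what any actual PR does.

```elixir
# Hypothetical stand-ins for the three FSS entry structs (assumption:
# the real structs carry more fields; only one field each is modelled here).
defmodule Sketch.S3Entry do
  defstruct [:key]
end

defmodule Sketch.LocalEntry do
  defstruct [:path]
end

defmodule Sketch.HTTPEntry do
  defstruct [:url]
end

defmodule Sketch.Before do
  # Only two clauses, mirroring the "attempted function clauses" in the report.
  def from_parquet(%Sketch.S3Entry{} = entry), do: {:ok, {:s3, entry.key}}
  def from_parquet(%Sketch.LocalEntry{} = entry), do: {:ok, {:local, entry.path}}
end

defmodule Sketch.After do
  def from_parquet(%Sketch.S3Entry{} = entry), do: {:ok, {:s3, entry.key}}
  def from_parquet(%Sketch.LocalEntry{} = entry), do: {:ok, {:local, entry.path}}
  # Extra clause so that HTTP entries no longer fall through.
  def from_parquet(%Sketch.HTTPEntry{} = entry), do: {:ok, {:http, entry.url}}
end

defmodule Sketch.Demo do
  # Calls both variants with the same entry and returns both outcomes.
  def compare(entry) do
    before_result =
      try do
        Sketch.Before.from_parquet(entry)
      rescue
        e in FunctionClauseError -> {:error, e.function, e.arity}
      end

    {before_result, Sketch.After.from_parquet(entry)}
  end
end

# Placeholder URL, not the dataset from the report.
entry = struct!(Sketch.HTTPEntry, url: "https://example.com/data.parquet")
{before_result, after_result} = Sketch.Demo.compare(entry)

IO.inspect(before_result, label: "two clauses")
IO.inspect(after_result, label: "three clauses")
```

With only two clauses the call raises `FunctionClauseError`, which the sketch rescues and turns into a tuple. With the third clause the same entry is handled normally.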

billylanchantin commented 1 month ago

Thanks for this report too, @aymanosman!

I don't have the time to dig into it at the moment. Hopefully it's as simple a change as #992.

If you're inclined to dig into it, PRs are welcome :)

ceyhunkerti commented 1 month ago

I opened https://github.com/elixir-explorer/explorer/pull/993 for this. It works with the URL above, but it needs a review and further guidance if possible.