elixir-explorer / explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
https://hexdocs.pm/explorer
MIT License
1.13k stars 123 forks source link

Series.row_index/1 #862

Closed iurimateus closed 9 months ago

iurimateus commented 9 months ago

Interesting. So what if we:

  1. Add a Series.row_index()
  2. Make it raise for the default backend saying it is only supported in dataframes
  3. Implement it properly for expressions

Would that work? Perhaps we try that as a separate PR so we can compare them side by side?

Originally posted by @josevalim in https://github.com/elixir-explorer/explorer/issues/833#issuecomment-1902213418

@josevalim is this what you had in mind? I think it's the first /0 function in Series; I'm also not sure on the implementation.

Should we rename len() to size() for consistency?

cc @cigrainger

josevalim commented 9 months ago

Thank you @iurimateus. Quick question: do we need to use expr_len? couldn't we use size(...) and expect the series as argument for row_index?

billylanchantin commented 9 months ago

@iurimateus I think expr_len is already present, it's just named expr_size:

https://github.com/elixir-explorer/explorer/blob/main/native/explorer/src/expressions.rs#L576-L580

iurimateus commented 9 months ago

For the record, these changes were before we had size.

@iurimateus I think expr_len is already present, it's just named expr_size:

Thanks for pointing it out, I was following py-polars implementation of using dsl::len(), but that doesn't seem necessary (and not really compatible with Explorer's API, on which it doesn't expose 'expressions').

Thank you @iurimateus. Quick question: do we need to use expr_len? couldn't we use size(...) and expect the series as argument for row_index?

That works, I've pushed a new commit.

Thoughts on int_range?

josevalim commented 9 months ago

I would prefer to postpone int_range for now. So let's remove it from the Elixir side but we can still keep it on the Rust side.

We also need to list row_index as a callback here: https://github.com/elixir-explorer/explorer/blob/b901d975c4dd6dc8fe462987afbd497649b958c8/lib/explorer/backend/series.ex#L41

And then provide an implementation for regular series (which will basically return a list of u32).

josevalim commented 9 months ago

Awesome job. I have pushed one last comment and I think we are good to go!

josevalim commented 9 months ago

:green_heart: :blue_heart: :purple_heart: :yellow_heart: :heart: