elixir-explorer / explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
https://hexdocs.pm/explorer
MIT License
1.13k stars 123 forks source link

[Proof of concept] Expr serialize #916

Closed billylanchantin closed 5 months ago

billylanchantin commented 5 months ago

🚧 Do not merge

This PR is only a proof of concept.

Description

Adds expr_to_json and expr_from_json.

The idea is to provide an escape hatch for Polars functionality we don't yet support. Included is a test case demonstrating that it's possible.

Discussion

Explorer can lag behind what's available in Polars. This functionality would allow users to opt into behavior not otherwise available. My hope is that we can turn this idea into something like Ecto.Query.API.fragment/1.

This approach is messy for a few reasons:

But for that price, we instantly open up all expression-based operations. Please let me know if you think!

josevalim commented 5 months ago

I am not sure we should go down this route because this is serializing an internal representation, right? So there is no guarantee that, once a new Polars version lands, the JSON representation (keys and values) will remain relevant? I think relying on their SQL API should be slightly more stable, because that's a public API?

billylanchantin commented 5 months ago

So there is no guarantee that, once a new Polars version lands, the JSON representation (keys and values) will remain relevant?

The API is public:

And they don't say you can't rely on it. But I think you're likely correct regardless. Looking at the example JSON from my test, it seems they're including a lot of what looks like internal stuff.

In theory if a particular representation breaks, a user could just get the new JSON from the new version of Polars. I'd be ok with that since this is supposed to be a backdoor. But I take your point!

I think relying on their SQL API should be slightly more stable, because that's a public API?

That's true. However their docs say:

As the DataFrame interface is primary, new features are typically added to the expression API first.

So I'm not sure exposing the SQL API would also expose the expressions we want, especially since:

There is no separate SQL engine because Polars translates SQL queries into expressions, which are then executed using its own engine.

But we should definitely pursue the SQL API regardless. How cool would it be for Explorer.DataFrame to implement Ecto.Queryable?

Still, I don't think the SQL API gets us the expressiveness I'm going for. But not exposing an API because it's internal/unreliable is potentially a deal breaker, so great feedback.

josevalim commented 5 months ago

The API is public but I am not sure if its output are guaranteed to be stable across versions. For example, it could be used to serialize expressions across nodes for the same Polars versions, but not guarantees across different ones.

If Polars had an API for building an expression from a SQL fragment, that would be ideal.

billylanchantin commented 5 months ago

If Polars had an API for building an expression from a SQL fragment, that would be ideal.

Seems like they do!

josevalim commented 5 months ago

@billylanchantin YES!!!

billylanchantin commented 5 months ago

@josevalim Ok I can close this and start working on exposing a SQL-to-exprs pipeline :)

FYI you made a good point here: https://github.com/elixir-explorer/explorer/issues/818#issuecomment-1953652209

But maybe viewing the SQL as somewhat of an escape hatch changes things a bit?

josevalim commented 5 months ago

But maybe viewing the SQL as somewhat of an escape hatch changes things a bit?

Exactly. To me, the main question is if we can provide a consistent behaviour across backends. For example, if we have a Postgres backend, would this feature also make sense there? And I think it would, so let's go for it.

billylanchantin commented 5 months ago

Closing in favor of: https://github.com/elixir-explorer/explorer/pull/918