PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
https://prql-lang.org
Apache License 2.0
9.98k stars 218 forks source link

Cool things in Malloy #71

Open max-sixty opened 2 years ago

max-sixty commented 2 years ago

I've been spending some time looking at Malloy, a language from Looker that looks promising. I've been a fan of @lloydtabb's for a while, and briefly chatted to him about Malloy a couple of weeks ago.

As ever with these "better SQL" languages — our fates are positively correlated. So I hope it's collegial to explore the things Malloy does particularly well at.

Without a full review of the differences — Malloy seems more focused on a broader vision of data modeling with structure on top of existing tables — like explores — blocks that combines tables — and dimensions & measures (these are things I always liked in Looker). The language integrates an approach for defining charts. There's less emphasis on pipelines — ordering isn't semantically important — though there is a way to pipe queries into other queries. It's more typing to get started.

Some things that look compelling at first glance:

That's all for now. I think it'd be good to keep in touch with the Malloy team and would welcome any cross-pollination.

max-sixty commented 2 years ago

Malloy have continued to add some great features, check out https://looker-open-source.github.io/malloy/documentation/language/expressions.html

brycefisher commented 2 years ago

@max-sixty I've spent a lot of time writing Kusto queries for Azure Data Explorer, and love the ability to tack on | chart .... to get a visualization in supported gui clients. I also love that the visualization information is kept separate as metadata returned from Azure Data Explorer servers, so each client can do their own rendering. This kind of eliminates the need for some amount of Jupyter notebooks because you can easily visualize results inline with your query using standardized syntax.

Are you consider some kind of results visualization for PRQL or is that outside the scope of this project, given the architecture?

max-sixty commented 2 years ago

Thanks for the comment @brycefisher ! I agree that's really useful for some workflows. As well as Malloy, Juttle does this well.

I'm definitely open to including something like this. The ideal structure at the moment might be to have some way of extending PRQL, and let people build on top of it — maybe with importable functions, or a view transform that was fairly unopinionated within PRQL.

To set expectations — at the moment I don't have a clear vision for what something good would look like, so it probably won't come from me without lots of people saying that PRQL's success is dependent on it — but that doesn't mean I think it's a bad idea!

How do you think about this?

brycefisher commented 2 years ago

Agreed! I wouldn't take one random persons request as an automatically good roadmap item.

I think time series analysis and easy ways of manipulating time in particular benefit from easy visualization. It probably makes sense to consider visualization related features as you mature time functions and time series.

max-sixty commented 2 years ago

Check this out: https://docs.google.com/presentation/d/18KUl_rrz2K-hbsiKJYS3rtTcYxZMXKklyPllLmTtIYY/mobilepresent#slide=id.g1269816dcbe_0_179

Some really cool stuff — it's has some excellent features around exploring data — in particular:

I'm admittedly much more of a fan of the pipeline concept from a language POV — here I get confused as to whether the order of the clauses matters. Indeed, I'm not sure it could be explained simply, and e.g. adding a filter after an aggregate doesn't seem possible (because it doesn't yet support HAVING as a different language item!).

But the browsing & exploring here is really great.

aljazerzen commented 2 years ago

That is really great for data exploration.

It would be awesome if we could join forces and create a language agnostic tool for such exploration. Relational/dataframe model has been around for years and we are just inventing new ways for querying it, so I don't see why we couldn't reuse the UI that works with relational models and uses any of the 3 query languages.

max-sixty commented 2 years ago

Very much agree @aljazerzen .

I don't want to fall into the mindset of looking at different ideas and saying "nah PRQL better" — but in this case, the benefit in constructing the query arguably come from arbitrarily adding transforms — and this is the thing that PRQL is great at given its pipeline, and Malloy is less strong at. e.g. the having example above. (Though SQL is terrible at, because adding an arbitrary transform can require a new CTE and query restructure.)

My vision for PRQL is that we would probably not build these sorts of things ourselves, but because we're an open and compatible language, we'd be the best tool for others building these sorts of interfaces...

The thing that Malloy starts on that PRQL doesn't is a semantic model for data — how the information maps to reality — e.g. what's a Dimension and what's a Measure, how is data related, how are metrics defined, etc. And that allows some cool features that contribute to Composer's strengths, like indexing dimensions and easier joins.