PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
https://prql-lang.org
Apache License 2.0
9.98k stars, 218 forks

Modules containing data (tables, relations) #3997

Open eitsupi opened 11 months ago

eitsupi commented 11 months ago

Sharing data is often cumbersome when creating reproducible examples.

R ships with some built-in datasets, so a table (data.frame) is available immediately just by typing its name. For example, we can check how the head() function behaves by typing the following in the webR REPL (https://webr.r-wasm.org/latest/):

mtcars |> head()

Size is a concern, but why not ship a standard module that contains a useful table for illustrating typical operations?


As an aside, I don't think mtcars is a very typical dataset, because it contains only numeric types and no missing values. However, I often use it because it is short and the name is easy to remember. I think the palmerpenguins dataset is commonly used these days, but it may be a bit large. https://allisonhorst.github.io/palmerpenguins/

max-sixty commented 11 months ago

I like the idea a lot!

I guess the examples would have to be very small (like just a few dozen rows), since we would have to create each row with a SELECT statement.

But I think that's still quite useful for examples / bug reports / a common base...
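To make the "one SELECT per row" idea concrete, here is a rough Python sketch. It is not PRQL compiler code, only an illustration of the general technique: each row becomes a `SELECT ... AS ...` and the rows are joined with `UNION ALL`. The helper names (`sql_literal`, `rows_to_select`, `query_sample`), the CTE name `sample`, and the three illustrative rows are all assumptions made for this example. SQLite is used only because it comes with Python.

```python
import sqlite3

# Illustrative rows in the style of mtcars (hypothetical sample data).
ROWS = [
    {"model": "Mazda RX4", "mpg": 21.0, "cyl": 6},
    {"model": "Datsun 710", "mpg": 22.8, "cyl": 4},
    {"model": "Hornet 4 Drive", "mpg": 21.4, "cyl": 6},
]


def sql_literal(value):
    """Render a Python value as a SQL literal (sketch: NULL, numbers, strings)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        # bool is a subclass of int, so handle it before the numeric case.
        return str(int(value))
    if isinstance(value, (int, float)):
        return repr(value)
    # Escape single quotes by doubling them.
    return "'" + str(value).replace("'", "''") + "'"


def rows_to_select(rows):
    """Build one SELECT per row and join them with UNION ALL."""
    columns = list(rows[0])
    selects = []
    for row in rows:
        cols = ", ".join(f'{sql_literal(row[c])} AS "{c}"' for c in columns)
        selects.append(f"SELECT {cols}")
    return "\nUNION ALL\n".join(selects)


def query_sample(rows, tail="SELECT * FROM sample"):
    """Expose the inline rows as a CTE named `sample` and run `tail` against it."""
    sql = f"WITH sample AS (\n{rows_to_select(rows)}\n)\n{tail}"
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(rows_to_select(ROWS))
    print(query_sample(ROWS, "SELECT model FROM sample WHERE cyl = 4"))
```

The generated SQL grows by one SELECT per row, which is why a table of a few dozen rows is practical while something much larger would bloat every compiled query.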

aljazerzen commented 10 months ago

This could be implemented as an external package (see #2491), which would be downloaded by a cargo/npm-like tool, cached, and made available within the current project.

This way, we'd avoid including this data in the released compiler binary, but it would still be easily available.

max-sixty commented 10 months ago

> This could be implemented as an external package (see #2491), that would be downloaded by cargo/npm-like-tool,

That could be cool, though it's also a decent lift. I would probably vote to push this out until we know more about what packages will look like...