PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
https://prql-lang.org
Apache License 2.0
9.79k stars 212 forks

Language changes out of new resolver? #4753

Open max-sixty opened 1 month ago

max-sixty commented 1 month ago

What's up?

Is it worth discussing the language changes that we'll need to make from the new resolver?

@aljazerzen briefly spoke on a call a few weeks back, possibly with an idea of requiring a prefix on local variables. I've also had thoughts about requiring some data references to have a prefix of `.`, similar to jq, as part of establishing a more formal notion of scope.

To the extent those are going to be big changes, is it worth discussing them before implementing things in detail?[^1]

[^1]: OTOH I'm also hesitant about saying "we have to agree on everything before anyone starts writing code", given our main constraint is writing our complicated code rather than having discussions. But for big changes I do think it's worth covering the tradeoffs...

richb-hanover commented 1 month ago

> Is it worth discussing the language changes that we'll need to make from the new resolver?

My two cents... Yes.

The strongest argument (for me) is that PRQL is an uncluttered language. A person like me (who has only the slightest understanding of how SQL works) can create a hundred-line PRQL query (that compiles to hundreds of lines of SQL) with a fair degree of confidence that it'll work as I expect.

When I see proposals for new syntax/semantic features, I worry that PRQL could evolve from a "learn it in an hour; use it for a lifetime" tool into a "Computer Language", that can only be used by a "Computer Person".

On the other hand, I have personally stumbled over the problems of "relation aliases" mentioned in #4751. I still owe the group a description of my thinking on that issue, which I'll get to soon. But, yes - let's talk about the language changes that might be necessary. Thanks for listening.

aljazerzen commented 1 month ago

Sure, I can list them here, but I don't think they are worth discussing.

I'm not advocating for these changes to be merged into the language; they were just in the way of me making progress. After my current work is done, maybe I can re-implement workarounds so these changes are not necessary.

  1. references to variable definitions require fully-qualified paths that must start with one of:

    • `project` (reference to the root of the module tree, analogous to `crate` in Rust),
    • `module` (reference to current module),
    • `db` (for accessing `project.db`, the default database modules),
    • `std` (for accessing `project.std`).

    Example:

    let a = 5
    
    from db.x
    select {module.a}

    Any identifier without one of these prefixes is treated as if it had the `this.` prefix.

  2. table names don't make it into tuples (when instantiating a relation, its name is not injected into its type)

    from employees
    select {employees.name}
           ^^^^^^^^^ Error: unknown name `employees`
    
    Hint: available names are `employee_id`, `name`, `age`.

  3. forbid references to fields of current tuple (when declaring tuples, field expression cannot refer to previous fields)

    from x
    select {a, b}
    select {
      c = 1,
      d = c + 1 # Error: unknown name `c`. Available names: `a`, `b`
    }

  4. `x.a` will infer name `a` only

    from employees    # type: [{name = ...}]
    select {e = this} # type: [{e = {name = ...}}]
    select {e.name}   # type: [{name = ...}]
    select {e.name}   # error: unknown name `e`
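
As a rough illustration of the prefix rule in item 1, the lookup could be sketched in Python as below. This is not taken from the PRQL compiler: the names `ROOT_PREFIXES` and `qualify` are hypothetical, and leaving an explicit `this` untouched is an assumption that the list above does not spell out.

```python
# Hypothetical sketch of rule 1 (not the actual resolver code).
# Roots that a fully-qualified path may start with, as listed in item 1.
ROOT_PREFIXES = ("project", "module", "db", "std")


def qualify(ident: str) -> str:
    """Return `ident` as a fully-qualified path.

    Paths that already start with a known root are returned unchanged.
    Anything else is treated as if it had the `this.` prefix.
    Assumption: a path that already starts with `this` is left as-is.
    """
    head = ident.split(".", 1)[0]
    if head in ROOT_PREFIXES or head == "this":
        return ident
    return f"this.{ident}"
```

Under this sketch, `qualify("module.a")` and `qualify("db.x")` come back unchanged, while a bare `name` becomes `this.name`.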

max-sixty commented 1 month ago

> I'm not advocating for these changes to be merged into the language; they were just in the way of me making progress. After my current work is done, maybe I can re-implement workarounds so these changes are not necessary.

OK! Definitely no need to discuss if not yet helpful + no irreversible decisions.

Thanks for outlining them all the same!