PRQL / prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement
https://prql-lang.org
Apache License 2.0
9.97k stars 218 forks source link

Consider to allow alternative vocabulary to left/right terms #718

Closed psychoslave closed 2 years ago

psychoslave commented 2 years ago

Following a comment on HN, I seize the opportunity to suggest more formally to allow alternative to the spatial analogy that lead to side:left/right.

Indeed, this spatial vocabulary certainly is inspired by the geometric representation of Venn diagrams.

But this is not the sole perspective under which one might consider these relationships. Grammar and logic especially have also nice terms to designate these kind of syntactic rapports. Here are some example of alternative triplet inspired by these domains:

As you can notice, most of these proposal come with qualifiers varying only by the prefix plugged on their common stem. It’s possible to extend these morphological series to take into account additional cases, for example inter-, pan- or omni- could be used to produce alternative terms to full join.

Coming from yet an other perspective, motivated by terseness of used terms, it’s also possible to use combinations such as tie: arm/leg. Indeed from relationship you easily come to tie, and then arm/leg for anterior/posterior seems pretty straight forward and analogous to "antecedent/(consequent|postcedent|succedent)".

Related resources:

max-sixty commented 2 years ago

Currently our Joins are quite basic — we don't really innovate from SQL. We're very open to doing more here.

Is this specific proposal to change the name of the terms? Or something broader?

While I don't think left / right are ideal, they are somewhat easy to remember once someone has worked them out, and they're very familiar. Do you think these are familiar and easy to remember for most people?

We had thought about allow_nulls:right for a left join, as an example, but decided they were too unfamiliar relative to their benefit.

psychoslave commented 2 years ago

Is this specific proposal to change the name of the terms? Or something broader?

Well, change these terms would be more radical than what I had in mind: I had only in mind to add synonymous, so one might still use side:left/right, but at user own preference, other options would be great.

Since you talk about "something broader", an alternative way to deal with this kind of topic more generally would be to allow users to define aliases, including for hardcoded terms – I guess there are reserved words in PRQL like in most programming languages.

Do you think these are familiar and easy to remember for most people?

Yes, if by most people you mean among people already familiar with English and aware of what a Venn diagram is or otherwise proficient in SQL. An other advantage of a full-fledged alias system would be to allow flexible full internationalization of code base.

We had thought about allow_nulls:right for a left join, as an example, but decided they were too unfamiliar relative to their benefit.

I think you don't need to limit the language to a single closed terminology.

max-sixty commented 2 years ago

Having multiple aliases has a downside that queries become less transferable.

Without having a strong POV on which terms are a priori better — PRQL has a limited "novelty budget", so it's probably not the best place to introduce these terms to query languages, given the likely level of familiarity with the average user. If we could show their success in another language, even a more academic one than PRQL, that would be more compelling. Or if we can support the claim that the terms are familiar in another way.

That said, let's keep an open mind — there's lots we can do around joins — and I do appreciate the mathematics roots of joins, even if we don't end up using these specific terms.

Thank you @psychoslave

psychoslave commented 2 years ago

Having multiple aliases has a downside that queries become less transferable.

I can imagine several very different things that you might mean with "transferable", would you kind enough to precise me what you had in mind please?

I agree that having multiple aliases can also come with its downside. Actually any state of affair will have its downside. 😀

Now, taking further the idea of providing user facilities to alias things as they wish, there is no obligation to provide any battery at all with the toy. And even providing them, that can be as modules that are optionally loadable, simply setting aliases.

That said, let's keep an open mind — there's lots we can do around joins — and I do appreciate the mathematics roots of joins, even if we don't end up using these specific terms.

On that side, some terminology from the Cartesian/relational product would serve you better, hmm maybe multiplicand/multiplier?.

psychoslave commented 2 years ago

Also, regarding a thorough alias facility, this would not be a total novelty. For example, in ZSH, one can easily alias almost anything:

alias -g wherefrom="|"
alias -g eft="&&"
echo love wherefrom read truth eft echo $truth

Maybe this can address the concerns regarding query transferability: it’s all in the hand of the end user to use features wisely. 🧙‍♀️

aljazerzen commented 2 years ago

Sorry, but I don't really understand how would arm/leg and anterior/posterior be connected to relational joins?

Effectively, this argument is specifying what to do with rows that don't match any row from the other table.

How I see it, join does a cartesian product of the two tables. Then it throws out all pairs that don't match the join condition. If all pairs of some row are rejected, it may produce a pair with one side null, but only if this argument we talk about allows it.

side nulls_left nulls_right
inner false false
left false true
right true false
outer true true

When thinking about it this way, it made more sense to me to have the nulls_left and nulls_right compared to side. But it's quite verbose and it would be hard to shorten. One alternative would be allow_nulls:neither/right/left/both as @max-sixty wrote above.


Regarding the aliases I unfortunately agree with @max-sixty that this would be an unnecessary spending of the "nobility" budget. If one would really want to have different naming, they could (in the future) define:

func my_join tie:arm condition -> join (map_tie tie) condition

... where map_tie is currently unexisting (and unsupported type of) function

snth commented 2 years ago

One proposal for the terminology around joins that someone suggested on HN was to use optional, so you would have optional:left or optional:right. These would be analogous to allow_nulls:left and allow_nulls:right. I think that makes sense but one problem that I see with it is that the left/right terms are used opposite to normal SQL. Maybe that doesn't matter, and one could argue perhaps this way would be clearer?

Similarly, are the terms left/right the best dichotomy here? In traditional SQL it kinda makes sense (say you write it out in a single line then the expressions appear on the LEFT and RIGHT) but we have pipelines as our organising principle and have laid these out in a top to bottom fashion by giving special meaning to \n so perhaps we should consider including that? top/bottom don't quite sit right with me and I'm not sure how you would naturally refer to the previous and current steps in a pipeline? source/target? I feel like we still need to ponder more on this and maybe something will come.


Regarding the aliases, I agree that this would massively reduce the transferability of PRQL, by which I mean the ability of someone familiar with PRQL to read other PRQL queries without first familiarising themselves in a custom terminology. It's my firm belief that code is (mostly) written once and read many times so optimising for readability is paramount. We also strive for ergonomics in expressing our ideas, and PRQL is doing great on that front, but it shouldn't come at the expense of future readability.

psychoslave commented 2 years ago

Sorry, but I don't really understand how would arm/leg and anterior/posterior be connected to relational joins?

In a nutshell, because of reasons. 😀

As for right/left, through proper analogy. And although it makes sense for some people to say that monads are like burritos 🌯, we all know that analogies have their stretch limits. On the other hand, stating that a monad is just a monoid in the category of endofunctors is a more academic and formal way to expose the concept, but won’t necessarily help the average layman to grab what it’s all about.

A relational join is a logical concept, and from that strict domain, spatial position are somehow alien notions. Of course, from a more general culture point of view, we have many clues to make sense of a left/right analogies. Not only one might have the Venn diagram in mind, but we even have the fact that in a left-to-right writing system such as English, the first uttered term is on the left, Not only that, but the [anaphoric](https://en.wikipedia.org/wiki/Anaphora_(linguistics) references are generally following this same scriptural relative positions. By the way, on that regard, former/later are also pertaining terms in English. So that makes a lot of reasons to adopt the left/right analogy, despite the independence of the intended concepts of abstract relationship.

Similarly, anterior is related to "something that come first", and posterior "something that comes afterward". Furthermore, arms are related to anterior limbs, and legs are related to posterior limbs.

So, that answer the question of, how all that might be connected. 🔗

Note however that I didn’t expose all that to justify that "really these alternative terminologies should definitely be provided out of the box in PRQL". I actually like the novelty budget metaphor, and agree that limiting shiny apparatuses is a sound guideline.

... where map_tie is currently unexisting (and unsupported type of) function

I’m not sure I fully understand what facility it is supposed to provide, would it allows replacement in a fashion as flexible as the one that alias - g in the ZSH examples?

So that’s it for the thoughts that your answer inspired me @aljazerzen, thanks you for your insights.

psychoslave commented 2 years ago

One proposal for the terminology around joins that someone suggested on HN was to use optional, so you would have optional:left or optional:right.

I like the idea. Wouldn’t something like lenient be more precise than optional though? Merriam-Webster defines lenient as of mild and tolerant disposition or effect : not harsh, severe, or strict. A terser synonymous would be lax.

Similarly, are the terms left/right the best dichotomy here?

Then what about former/later instead? On a morphological perspective, the -er suffix also nicely align with inner and outer if this should either be considered.

snth commented 2 years ago

Interesting point and I agree with you that lenient would be a more precise term. optional is more common in programming languages in general but it bothered me since it is not that one side of the join is optional, but rather where a match can't be found then the tuples on one side or the other are kept rather than dropped. lenient would express that better.

Or how about keep:left and keep:right? Ideal would be when_unmatched:keep_left and when_unmatched:keep_right but that seems too verbose. The default is keep:none and a full outer join would be keep:both.

Then what about former/later instead?

That's not bad actually. I wonder if you meant latter rather than later though? latter is usually the term used in opposition to former while later would be used in opposition to earlier.

psychoslave commented 2 years ago

Or how about keep:left and keep:right? Ideal would be when_unmatched:keep_left and when_unmatched:keep_right but that seems too verbose. The default is keep:none and a full outer join would be keep:both.

Maybe carry, which is synonymous with keep in general, and holds among many other definitions To convey by extension or continuance; to extend, To contain; to comprise; have a particular aspect; to show or exhibit, plus it’s a traditional arithmetic term which is nice for a relational calculus application.

That's not bad actually. I wonder if you meant latter rather than later though? latter is usually the term used in opposition to former while later would be used in opposition to earlier.

You’re totally right, and I just had never noticed this doublet. 😊 Thanks for pointing my attention on this nuance. 🙏

aljazerzen commented 2 years ago

Oh, I've always understood left and right as the left-side-of-the-new-table and the right-side-of-the-new-table, and not the sides of the Venn diagram. I does still depend on writing direction, as old columns are on the left and we add new columns on the right. But I would argue that since PRQL uses engligh verbs written LTR, we can also borrow the mental model of adding columns from left to right.

psychoslave commented 2 years ago

So, what should we do for the continuation of this issue? Do we have some consensus on what might worth an implementation, should we simply wait longer to collect more feedback, or maybe provide more guidance with something like a survey? I see there is a poll application for Github if the latter seems the way to go.

max-sixty commented 2 years ago

I appreciate the spirited discussion.

On reflection, without an alternative to left / right that has a prior track-record, it's unlikely we make a leap on this specific topic now, given our limited novelty budget and the centrality and familiarity of joins to many queries. That doesn't close off any options permanently though.

I hope that balances openness & focus appropriately.