Closed psychoslave closed 2 years ago
Currently our Joins are quite basic — we don't really innovate from SQL. We're very open to doing more here.
Is this specific proposal to change the name of the terms? Or something broader?
While I don't think left
/ right
are ideal, they are somewhat easy to remember once someone has worked them out, and they're very familiar. Do you think these are familiar and easy to remember for most people?
We had thought about allow_nulls:right
for a left join, as an example, but decided they were too unfamiliar relative to their benefit.
Is this specific proposal to change the name of the terms? Or something broader?
Well, change these terms would be more radical than what I had in mind: I had only in mind to add synonymous, so one might still use side:left/right
, but at user own preference, other options would be great.
Since you talk about "something broader", an alternative way to deal with this kind of topic more generally would be to allow users to define aliases, including for hardcoded terms – I guess there are reserved words in PRQL like in most programming languages.
Do you think these are familiar and easy to remember for most people?
Yes, if by most people you mean among people already familiar with English and aware of what a Venn diagram is or otherwise proficient in SQL. An other advantage of a full-fledged alias system would be to allow flexible full internationalization of code base.
We had thought about
allow_nulls:right
for a left join, as an example, but decided they were too unfamiliar relative to their benefit.
I think you don't need to limit the language to a single closed terminology.
Having multiple aliases has a downside that queries become less transferable.
Without having a strong POV on which terms are a priori better — PRQL has a limited "novelty budget", so it's probably not the best place to introduce these terms to query languages, given the likely level of familiarity with the average user. If we could show their success in another language, even a more academic one than PRQL, that would be more compelling. Or if we can support the claim that the terms are familiar in another way.
That said, let's keep an open mind — there's lots we can do around joins — and I do appreciate the mathematics roots of joins, even if we don't end up using these specific terms.
Thank you @psychoslave
Having multiple aliases has a downside that queries become less transferable.
I can imagine several very different things that you might mean with "transferable", would you kind enough to precise me what you had in mind please?
I agree that having multiple aliases can also come with its downside. Actually any state of affair will have its downside. 😀
Now, taking further the idea of providing user facilities to alias things as they wish, there is no obligation to provide any battery at all with the toy. And even providing them, that can be as modules that are optionally loadable, simply setting aliases.
That said, let's keep an open mind — there's lots we can do around joins — and I do appreciate the mathematics roots of joins, even if we don't end up using these specific terms.
On that side, some terminology from the Cartesian/relational product would serve you better, hmm maybe multiplicand/multiplier?.
Also, regarding a thorough alias
facility, this would not be a total novelty. For example, in ZSH, one can easily alias almost anything:
alias -g wherefrom="|"
alias -g eft="&&"
echo love wherefrom read truth eft echo $truth
Maybe this can address the concerns regarding query transferability: it’s all in the hand of the end user to use features wisely. 🧙♀️
Sorry, but I don't really understand how would arm/leg
and anterior/posterior
be connected to relational joins?
Effectively, this argument is specifying what to do with rows that don't match any row from the other table.
How I see it, join does a cartesian product of the two tables. Then it throws out all pairs that don't match the join condition. If all pairs of some row are rejected, it may produce a pair with one side null, but only if this argument we talk about allows it.
side | nulls_left | nulls_right |
---|---|---|
inner | false | false |
left | false | true |
right | true | false |
outer | true | true |
When thinking about it this way, it made more sense to me to have the nulls_left
and nulls_right
compared to side
. But it's quite verbose and it would be hard to shorten. One alternative would be allow_nulls:neither/right/left/both
as @max-sixty wrote above.
Regarding the aliases I unfortunately agree with @max-sixty that this would be an unnecessary spending of the "nobility" budget. If one would really want to have different naming, they could (in the future) define:
func my_join tie:arm condition -> join (map_tie tie) condition
... where map_tie
is currently unexisting (and unsupported type of) function
One proposal for the terminology around joins that someone suggested on HN was to use optional
, so you would have optional:left
or optional:right
. These would be analogous to allow_nulls:left
and allow_nulls:right
. I think that makes sense but one problem that I see with it is that the left/right
terms are used opposite to normal SQL. Maybe that doesn't matter, and one could argue perhaps this way would be clearer?
Similarly, are the terms left/right
the best dichotomy here? In traditional SQL it kinda makes sense (say you write it out in a single line then the expressions appear on the LEFT and RIGHT) but we have pipelines as our organising principle and have laid these out in a top to bottom fashion by giving special meaning to \n
so perhaps we should consider including that? top/bottom
don't quite sit right with me and I'm not sure how you would naturally refer to the previous and current steps in a pipeline? source/target
? I feel like we still need to ponder more on this and maybe something will come.
Regarding the aliases, I agree that this would massively reduce the transferability of PRQL, by which I mean the ability of someone familiar with PRQL to read other PRQL queries without first familiarising themselves in a custom terminology. It's my firm belief that code is (mostly) written once and read many times so optimising for readability is paramount. We also strive for ergonomics in expressing our ideas, and PRQL is doing great on that front, but it shouldn't come at the expense of future readability.
Sorry, but I don't really understand how would
arm/leg
andanterior/posterior
be connected to relational joins?
In a nutshell, because of reasons. 😀
As for right/left
, through proper analogy. And although it makes sense for some people to say that monads are like burritos 🌯, we all know that analogies have their stretch limits. On the other hand, stating that a monad is just a monoid in the category of endofunctors is a more academic and formal way to expose the concept, but won’t necessarily help the average layman to grab what it’s all about.
A relational join is a logical concept, and from that strict domain, spatial position are somehow alien notions. Of course, from a more general culture point of view, we have many clues to make sense of a left/right analogies. Not only one might have the Venn diagram in mind, but we even have the fact that in a left-to-right writing system such as English, the first uttered term is on the left, Not only that, but the [anaphoric](https://en.wikipedia.org/wiki/Anaphora_(linguistics) references are generally following this same scriptural relative positions. By the way, on that regard, former/later
are also pertaining terms in English. So that makes a lot of reasons to adopt the left/right analogy, despite the independence of the intended concepts of abstract relationship.
Similarly, anterior is related to "something that come first", and posterior "something that comes afterward". Furthermore, arms are related to anterior limbs, and legs are related to posterior limbs.
So, that answer the question of, how all that might be connected. 🔗
Note however that I didn’t expose all that to justify that "really these alternative terminologies should definitely be provided out of the box in PRQL". I actually like the novelty budget metaphor, and agree that limiting shiny apparatuses is a sound guideline.
... where
map_tie
is currently unexisting (and unsupported type of) function
I’m not sure I fully understand what facility it is supposed to provide, would it allows replacement in a fashion as flexible as the one that alias - g
in the ZSH examples?
So that’s it for the thoughts that your answer inspired me @aljazerzen, thanks you for your insights.
One proposal for the terminology around joins that someone suggested on HN was to use
optional
, so you would haveoptional:left
oroptional:right
.
I like the idea. Wouldn’t something like lenient
be more precise than optional
though? Merriam-Webster defines lenient as of mild and tolerant disposition or effect : not harsh, severe, or strict
. A terser synonymous would be lax
.
Similarly, are the terms
left/right
the best dichotomy here?
Then what about former/later
instead? On a morphological perspective, the -er
suffix also nicely align with inner
and outer
if this should either be considered.
Interesting point and I agree with you that lenient
would be a more precise term. optional
is more common in programming languages in general but it bothered me since it is not that one side of the join is optional, but rather where a match can't be found then the tuples on one side or the other are kept rather than dropped. lenient
would express that better.
Or how about keep:left
and keep:right
? Ideal would be when_unmatched:keep_left
and when_unmatched:keep_right
but that seems too verbose. The default is keep:none
and a full outer join would be keep:both
.
Then what about
former/later
instead?
That's not bad actually. I wonder if you meant latter
rather than later
though? latter
is usually the term used in opposition to former
while later
would be used in opposition to earlier
.
Or how about
keep:left
andkeep:right
? Ideal would bewhen_unmatched:keep_left
andwhen_unmatched:keep_right
but that seems too verbose. The default iskeep:none
and a full outer join would bekeep:both
.
Maybe carry
, which is synonymous with keep in general, and holds among many other definitions To convey by extension or continuance; to extend, To contain; to comprise; have a particular aspect; to show or exhibit, plus it’s a traditional arithmetic term which is nice for a relational calculus application.
That's not bad actually. I wonder if you meant
latter
rather thanlater
though?latter
is usually the term used in opposition toformer
whilelater
would be used in opposition toearlier
.
You’re totally right, and I just had never noticed this doublet. 😊 Thanks for pointing my attention on this nuance. 🙏
Oh, I've always understood left and right as the left-side-of-the-new-table and the right-side-of-the-new-table, and not the sides of the Venn diagram. I does still depend on writing direction, as old columns are on the left and we add new columns on the right. But I would argue that since PRQL uses engligh verbs written LTR, we can also borrow the mental model of adding columns from left to right.
So, what should we do for the continuation of this issue? Do we have some consensus on what might worth an implementation, should we simply wait longer to collect more feedback, or maybe provide more guidance with something like a survey? I see there is a poll application for Github if the latter seems the way to go.
I appreciate the spirited discussion.
On reflection, without an alternative to left
/ right
that has a prior track-record, it's unlikely we make a leap on this specific topic now, given our limited novelty budget and the centrality and familiarity of joins to many queries. That doesn't close off any options permanently though.
I hope that balances openness & focus appropriately.
Following a comment on HN, I seize the opportunity to suggest more formally to allow alternative to the spatial analogy that lead to side:left/right.
Indeed, this spatial vocabulary certainly is inspired by the geometric representation of Venn diagrams.
But this is not the sole perspective under which one might consider these relationships. Grammar and logic especially have also nice terms to designate these kind of syntactic rapports. Here are some example of alternative triplet inspired by these domains:
As you can notice, most of these proposal come with qualifiers varying only by the prefix plugged on their common stem. It’s possible to extend these morphological series to take into account additional cases, for example inter-, pan- or omni- could be used to produce alternative terms to
full join
.Coming from yet an other perspective, motivated by terseness of used terms, it’s also possible to use combinations such as
tie: arm/leg
. Indeed fromrelationship
you easily come totie
, and thenarm/leg
foranterior/posterior
seems pretty straight forward and analogous to "antecedent/(consequent|postcedent|succedent)".Related resources: