Raku / problem-solving


Mixing of weak typing and duck-typing is still prevalent for certain operators #391

Closed · 2colours closed this issue 8 months ago

2colours commented 8 months ago

Duck-typing (à la Python) works on the premise that the user knows the type of the operands and no coercion happens. Weak typing (à la Perl and Raku) works on the premise that the user knows how the operation treats its operands and no overload happens. Mixing the two principles makes code fragile, yet there are still operators in the core (mainly the numeric operators) that mix the two approaches.

A question on IRC highlighted that an operation like (0..10) + 1 can return 1..11. The problem with this is that it contradicts the concept of "numeric addition" - neither the way the operands are treated, nor the return value fits that picture. Actually, it's really easy to bust it: (0..10) + (1..1) is suddenly 12, even though the intuition is pretty similar to (0..10) + 1. It's hard to reason about this kind of code.
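Spelled out as code (outputs as reported in this discussion):

    say (0..10) + 1;       # 1..11 - the Range overload shifts both endpoints
    say (0..10) + (1..1);  # 12    - both Ranges coerce to .elems: 11 + 1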

Taking a look at the docs, there aren't all that many (documented) "hybrids". Range has the 4 basic arithmetic operations (not ** though, interestingly...), Date and DateTime have + and - overloaded, and that's about it for the immediately apparent operations. I can't see a strong reason why it has to be like this, and it actually makes code harder to reason about in any nontrivial situation because of the conflict between the two typing paradigms.

2colours commented 8 months ago

I'd like to point out that there are two possible resolutions: one can go for either duck-typing or weak typing. The historically prevalent choice has been weak typing, so it's probably easier (and more "according to the plans") to resolve this in favor of eliminating the overloads.

In any case, if "generic arithmetic operators" that look like numeric operators but can really do whatever are deemed useful, it's also an option to eventually introduce add, sub, mul or similar generic operators (actually, this was the original intention for div as well).

Please keep in mind that the focus of this issue isn't to competitively compare the overload-driven and the coercion-driven approach, but simply to resolve an inconsistency that has the worst of both worlds. Should such a discussion take place, I think it would be worth considering making the overload-driven approach the primary one, but at the end of the day many languages live without operator overloading entirely.

codesections commented 8 months ago

My initial reaction is to agree that we should consistently apply the rule "operators coerce their operands" and that (0..10) + 1 should evaluate to 12. If you want to add one to each element, there's always (0..10) «+» 1.

My second reaction was to check Roast and see that the current behavior is specced and was added in a commit by @TimToady in 2015. That makes me suspect that I'm missing some consideration that cuts against consistency here.

I'm not positive what that consideration is, but I suspect it has something to do with laziness. I note that (0..10) «+» 1 evaluates to (1 2 3 4 5 6 7 8 9 10 11), whereas (0..10) + 1 evaluates to 1..11 (that is, the latter preserves laziness). But I'm unsure of the exact justification.

Imo, we should figure out what that justification is and either document it or, if it's not worth the complexity-cost of being inconsistent, remove the special behavior of ranges.

2colours commented 8 months ago

This may sound arrogant, but I think with almost 8 years in production, our judgement is almost guaranteed to be better now, and it's the exception that should justify itself rather than us having to chase down its justification. If it's so useful, Range can already produce the bounds; perhaps a two-element List should have a convenient way to be turned back into a Range - but I don't think it's so useful to begin with, so it's hard to argue for that.

Either way, this is definitely a breaking change and a design decision to be made; that's why I opened an issue for it here.

vrurg commented 8 months ago

I think the point here is that Range is not a countable entity. Ranges are often used to define... well... ranges. How do you apply arithmetic to, say, 0.5..10? What would the step be for counting the number of elements?

What is countable are sequences. 1...* + 1 errors out. 0.5...10 tries to guess the step and comes up with a countable sequence; etc.

A good point, though, is the one about the missing ** operator for ranges.

duncand commented 8 months ago

If we characterize a Range as a set of values, then in the general case it is not countable, and trying to count it could result in a special infinity value, such that simple scalar math operators then just behave as if given infinity as an input. But in special cases where we know the set is countable, say because it has 2 defined endpoints that are integers, we can instead produce a regular integer as a count. The precedent to follow, then, would be what happens when we take an array/set/bag/etc. and try to treat that as an integer.

2colours commented 8 months ago

> I think the point here is that Range is not a countable entity. Ranges are often used to define... well... ranges. How do you apply arithmetic to, say, 0.5..10? What would the step be for counting the number of elements?

I don't think this is related to the issue. It's not mandatory to have + available for all data types to begin with. By the way, as I said, Range + Range already works on the numeric path, so Raku can answer your question: (0.5..10) + (0.5..10) is 20, from which we can even conclude that 0.5..10 has a numeric value of 10. Side note: this is even inconsistent with (0.5..10) * 2.
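Spelled out (outputs per the behavior described in this thread):

    say (0.5..10) + (0.5..10);  # 20    - both operands numify to .elems (10 each)
    say (0.5..10) * 2;          # 1..20 - the * overload scales the endpoints instead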

This whole mixing causes a kind of "diamond" problem. The biggest advantage of "weak typing" is that you can reason about the outcome of the operation regardless of the input types. Now, what happens if you have a Range in an addition? There are at least two options, and we haven't even talked about mixing a Range with, say, a DateTime. It will fall back to numeric addition with a finite set of exceptions that have nothing to do with whether the input type is a number or can be coerced to one.

It's rather just an illustration of how fragile this system is that ** is immediately an "outcast operator", not overloaded for Range. It would feel like "making a point" to add that as well, not something that is actually a good idea.

vrurg commented 8 months ago

So far, what I see is a case where in many areas operators need to take special care of Range. Because I've no idea why (0.5..10) + (0.5..10) is 20. Why not 40, which would be closer to what ... thinks of it?

Then again, what about 0.5 ^.. 10 and the like? How do we count these in terms of integer arithmetic? The infinity singularity proposed by @duncand would result in losing information with no useful outcome.

This entire issue looks more like a technical one about treating bugs/incomplete implementations when it comes down to Range.

librasteve commented 8 months ago

afaict the current spec does this:

edit: this is wrong... [I was mixing Range and Sequence] <<<<<<<<<<<<<<<<<<<<<<<< since Range is a linear series:

the benefits are:

so I see no benefit in changing (although this feature may need to be learned)

note[1]: in general I think that the useful realm of powers for calculations with real physical measurements is related to the powers present in the laws of physics... so J=σT⁴ is about the maximum power needed (i.e. 2, 3, 4, <1/2>, <1/3>, <1/4>, and negative equivalents)

2colours commented 8 months ago

> So far, what I see is a case where in many areas operators need to take special care of Range. Because I've no idea why (0.5..10) + (0.5..10) is 20. Why not 40, which would be closer to what ... thinks of it?

I think this in itself shows why these overloads are a bad idea. My brain melted while I was trying to figure out how you ended up with 40 when we took an interval with a "length" of 9.5 twice. I suppose you added the minimum values and the maximum values respectively, and maybe counted the integers inclusively?

In any case, what actually happens is that the Ranges turn into numbers and we just get 10 + 10, which I think is arguable on one hand (why do we assume integer values for something that should be the size of a Range) but on the other hand makes a little more sense than inventing completely new math (Ranges as 2-tuples).

> Then again, what about 0.5 ^.. 10 and the like? How do we count these in terms of integer arithmetic? The infinity singularity proposed by @duncand would result in losing information with no useful outcome.

There is nothing to propose here; I just sent the code that already does this. The result is 9, by the way. Again, I do think that 9.5 for both would make more mathematical sense, that being the size of the Range.

However, there is something that we both missed in this discourse: Ranges are NOT dedicated to numbers. And this is where I think another massive hole shows up in the current system:

> since Range is a linear series:

What does that mean? Ranges don't contain any information about the "internal" values. And anyway, if you wanted to handle the Range * Range or Range / Range cases, the linearity would fall apart.

Actually, having both operands as Ranges opens up a can of worms (what about the inclusion of endpoints? which one wins?) - so we are destined to just silently fall back to the numbers; the behavior @vrurg already didn't like.

It's a bit hard when I don't see any reaction (apart from @codesections) to the basic point: namely that it defeats the purpose of weak typing if you start adding ad-hoc overloads, making the code both harder to reason about and prone to unpredictably falling back to the base behavior which is, as you can see with Range + Range, completely unrelated to the overload, even semantically.

I think there would be several options for compromise (including generic arithmetic operators or introducing a method for Ranges) - I would even say mutually beneficial changes, as they wouldn't sacrifice anything you pointed out as valuable to you; they would merely be a breaking change. Or alternatively, I can add this to the "not gonna fix" issues, without ever understanding where I failed to illustrate the conceptual problem that actually goes against the basic design of the language. I don't have many other options.

vrurg commented 8 months ago

> I think this in itself shows why these overloads are a bad idea. My brain melted while I was trying to figure out how you ended up with 40 when we took an interval with a "length" of 9.5 twice. I suppose you added the minimum values and the maximum values respectively, and maybe counted the integers inclusively?

Now this demonstrates why there is such a big misunderstanding. 40 was a mistake of mine, based on one of my experiments with sequences where it did work out with a 0.5 step. So, ignore the number. But the case is interesting from the point of view of different expectations. Mine is for the step to be 0.5, with 20 elements: 0.5, 1, 1.5, etc. Yours is 9.5, and in a way it is an interesting approach. But when it comes down to the fact that Range is an Iterable, where .elems is the fallback for arithmetic ops – how do we get the 9.5, except by having another operator overload?

And all this is not to mention one's expectations about how ("aa".."fb").List must do its job.

> but on the other hand makes a little more sense than inventing completely new math (Ranges as 2-tuples).

How about complex number math? That is also about tuples. ;)

> However, there is something that we both missed in this discourse: Ranges are NOT dedicated to numbers.

And that's why you get the same for "a".."d" + 1 as for "a" + 1. No problem here.

Here is my position in brief:

  1. There is no way Range is changing its current behavior. It can be clarified to take care of edge cases or even some big gaps in the implementation. But otherwise I don't see sufficient reasoning behind such a big shift in semantics.
  2. Clarification must take place. The only question is who would take responsibility for it.
  3. Perhaps the clarification must take into account [the rules of interval math](https://en.wikipedia.org/wiki/Interval_arithmetic).

usev6 commented 8 months ago

I tried to find some more context to the commit @codesections pointed out, which specced the current behavior: https://github.com/Raku/roast/commit/139c10c.

There was a corresponding commit to Rakudo just a minute before: https://github.com/rakudo/rakudo/commit/59b5a4b

Earlier that day there was a discussion about how a Range could be shifted, starting here: http://irclogs.raku.org/perl6/2015-09-28.html#15:27. That discussion ended with Larry saying:

> the 1..* is not expensive to generate, but unrolling the range can be. we oughta be able to add and multiply intervals directly though...

There was another discussion after the commits: http://irclogs.raku.org/perl6/2015-09-28.html#21:48

Maybe also of interest could be this part of the old design docs: https://github.com/Raku/old-design-docs/blob/63e44c3635/S03-operators.pod?plain=1#L3632-L3640

> [This section is conjectural, and may be ignored for 6.0.]

> Since use of Range objects in item context is usually non-sensical, a Range object used as an operand for scalar operators will generally attempt to distribute the operator to its endpoints and return another suitably modified Range instead, much like a junction of two items, only with proper interval semantics. (Notable exceptions to this autothreading include infix:<~~>, which does smart matching, and prefix:<+> which returns the length of the range.) [...]

[I'm aware that I don't react directly to the primary point of this issue (the mixing of weak typing and duck-typing). But @2colours pointed out that Range is one of the hybrids that sticks out, so I thought that adding this context could be useful.]

librasteve commented 8 months ago

To expand a little on my earlier note, the current behaviour is good for things like this…

    my $length = 5;
    my $offset = 13;

    my @a = 'a'..*;
    say @a[(0 .. $length) + $offset];          # (n o p q r s)

    my $fraction = <1/4>;

    my @b = 0..^32;
    say @b[(0 .. *-1) * $fraction + $offset];  # (13 14 15 16 17 18 19 20)

2colours commented 8 months ago

> But when it comes down to the fact that Range is an Iterable, where .elems is the fallback for arithmetic ops – how do we get the 9.5, except by having another operator overload?

I wouldn't expect something called "a range" to be an Iterable to begin with. But I don't think we have to go there - in any case, there is a discrepancy, and that's more or less the point.

> And that's why you get the same for "a".."d" + 1 as for "a" + 1. No problem here.

Yes problem here, because "a" really could not be coerced to a number, whereas "a".."d" actually could, hence the inconsistency. You suddenly have a type that breaks the expectation that you can add values that have a Numeric coercion, and you have it in the core.

> How about complex number math? That is also about tuples. ;)

Complex numbers are not just irrelevant to Ranges; a complex number is so actively unlike a "range" that making an analogy between a complex number and a "range" should be alarming.

>   1. But otherwise I don't see sufficient reasoning behind such a big shift in semantics.

That's quite sad when I'm giving multiple examples in every comment.

>   3. Perhaps the clarification must take into account [the rules of interval math](https://en.wikipedia.org/wiki/Interval_arithmetic).

Interval arithmetic won't work with Ranges that are Iterable. Now you are reinforcing my interpretation of the size...

Edit: just two days ago, you said Ranges weren't countable, and now it has emerged as a central point that Ranges are Iterable... how would they not have been "countable"? There is a one-to-one correspondence between the concept of "iterability" and the concept of "countability" in the mathematical sense. These are the things that make me feel the conclusion drives the arguments, and not the other way around...

2colours commented 8 months ago

I wanted to mention in a separate comment that the current behavior, which is mostly like 2D vectors, not like intervals, also leads to absurdity if you multiply by a negative number: (1..10) * -1 is -1..-10, which is a Range that coerces to True but at the same time returns .elems == 0.
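A sketch of that absurdity (outputs as described above):

    my $r = (1..10) * -1;
    say $r;        # -1..-10 - endpoints scaled, so now min > max
    say ?$r;       # True    - the Range object itself boolifies to True
    say $r.elems;  # 0       - yet it contains no elements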

Perhaps if you want vector behavior, the actually clean solution would be to add a vector type.

librasteve commented 8 months ago

To address the question of whether raku should always do duck typing or always weak typing: I think that the ethos of raku is to be pragmatic. That is, it is the antithesis of a formal language with tight rules and purity. Since no one would argue that now.Date + 1 should not be tomorrow (?), your call for the language to conform exclusively to one variant is already untenable. My guess is that many of these choices were made to help raku be a smooth transition from perl.
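For instance (dates are illustrative):

    say Date.today;      # e.g. 2024-03-15
    say Date.today + 1;  # e.g. 2024-03-16 - Date overloads infix:<+> to add days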

Back to Range … well, it seems to me that these are special, in parallel to the specialness of Date and DateTime. Range is used as a building block for array indexing, and I can see that it would be handy for other natural objects such as parts of a plot (x-axis, y-axis). So imho the current design is helpful as an easy way to factor and offset a Range. The fallback to regular Numeric coercion aka ‘+’ also makes sense (but you have to know that 0.5 .. 10 has 10 elements :-) ). Finally, I can see that Range + Range or Range * Range quickly get into the realm of wtf. Fortunately raku lets you do your own overloads if your application really needs something esoteric like this, and I for one am glad that the language does not try to nail these down, esp. with all the endpoint corner cases…

2colours commented 8 months ago

> I think that the ethos of raku is to be pragmatic.

Is the suggestion that there could be "generic arithmetic operators", the way there are generic comparison operators for example, unpragmatic? If there is a language suited for a modification like that, it's Raku. What would be the downside to keep custom behavior to such generic operators that don't coerce and don't fall back to a default? What am I missing? I'm getting the impression that "tight rules and purity" are demonized, like something that is definitely going to hurt. From what I can see, it's almost for free.

I'm not sure if I have said it but the problems with these custom overloads on top of coercions start to amplify when you reach the "diamond problem", with Date + DateTime or Date + Range (besides Range + Range that has been mentioned a lot of times by now). What weak typing offered you was that you could reason about $a + $b for any arguments. For starters, you could know that the result is a number. You could throw in anything as long as it could be coerced into a number. Now, suddenly you can't reason about $a + $b or even $surely-a-range + $b or $surely-a-date + $b. Why is it pragmatic that a coercive numeric addition shares an interface with something that gives a Date or a Range? If it was $a add $b with an operator that is promoted to be generic, those who specifically want to wrangle a Date or a Range could still do it very simply (I actually think even this interface is overly generic, especially for the Range addition - but let's put that aside), without taking anything away from the readability of $a + $b in general.
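A minimal sketch of what such a generic, non-coercive add could look like as a user-defined operator (name and candidates illustrative only):

    # Only explicit overloads dispatch: no coercion, no numeric fallback.
    multi sub infix:<add>(Numeric $a, Numeric $b) { $a + $b }
    multi sub infix:<add>(Range $r, Real $n) { ($r.min + $n) .. ($r.max + $n) }

    say 3 add 4;        # 7
    say (0..10) add 1;  # 1..11
    # say "a" add 1;    # dies: no matching candidate, instead of silent coercion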

Edit: before somebody says "but the user could still overload + and mess the assumption up" - it's different. Somebody could redefine the grammar of Raku or whatever symbol coming from the core; nobody would say that's a problem with the language. That's actually as DIHWIDT as it gets, somebody went out of their way to do damage, despite being told not to. However, if the language itself does it, that means there is no concept to begin with, only chaos.

Edit2: sorry, somehow I didn't see my own comment, got confused, and basically summarized it again. Deleted now.

librasteve commented 8 months ago

Finally I had the chance to ingest the wikipedia link and also to understand the intention of Range to check if a (Real) number lies within a, err, Range. I felt inspired to write this post raku: Home on the Range

Edit: ^^ apologies I have now fixed the link

lizmat commented 8 months ago

@librasteve the link is bad

gfldex commented 8 months ago

@librasteve Perl stands for "Practical Extraction and Reporting Language". Granted, "pragmatic" is much closer to "practical" than to "pure".

For me &infix:<+>(Range, Numeric) is a shortcut that saves us a method like Range.shift-by-numeric. I seem to remember a discussion about Ranges on IRC many years ago that ended with agreement about the importance of laziness for Ranges. When Ranges get large (think astronomy), coercing them to finite lists becomes impractical. I believe there is a trap here. Please consider the following code.

    multi sub infix:<±>(Numeric \n, Numeric \variance --> Range) {
        (n - variance) .. (n + variance)
    }

    say 2.6 > 2 ± 0.5; # True

Neat, right? But that only works by happy accident. &infix:«>» defaults to coercion to Real and has a few special cases for other numeric types. It ignores Range. To me that feels incomplete and a little bit trappy.

So yes, Range needs more thought but I don't consider that (1..2) + 1 is 2..3 to be wrong.

2colours commented 8 months ago

Granted, "pragmatic" is much closer "practical" then "pure".

There is a false dichotomy here. If by "pure" we mean "not setting up traps for the users" then being "pure" is very much a pre-requisite of "pragmatic". Anyway, there is no apparent contradiction, and I'm not sure where this narrative shows up and why.

is a shortcut that saves us a method like Range.shift-by-numeric

why is this a "save"? I would have suggested a name like offset but anyway, there is an operation that only makes sense in the context of Ranges and a method wouldn't pollute the global namespace and a common interface like the so-called numeric addition. Why was it worth it?
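For illustration, such a method could even be prototyped today via augmentation (the name offset is just my suggestion above, not core API; endpoint exclusivity is ignored for brevity):

    use MONKEY-TYPING;
    augment class Range {
        # Shift both endpoints; a real version would preserve ^ exclusivity.
        method offset(Real $by) { ($.min + $by) .. ($.max + $by) }
    }
    say (0..10).offset(1);  # 1..11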

> ended up with agreement about the importance of laziness for Ranges

This seems unrelated; the question is the interface.

Please, to any of you, the question remains: what is your problem with either a dedicated operation, or with creating the appropriate generic, non-coercive operators - a problem that trumps the current unpredictability of these semi-overloadings?

vrurg commented 8 months ago

> I wouldn't expect something called "a range" to be an Iterable to begin with. But I don't think we have to go there - in any case, there is a discrepancy, and that's more or less the point.

One of the few things I could agree to. Perhaps it would make more sense for typed ranges where, say, integer and string ones are iterable whereas rational and floating-point ones are not. But going this way may bring us to the complex areas of custom user types, where we wouldn't have an easy way to guess the iterability/countability of a type.

Otherwise I consider the Wikipedia page one of the best answers to most of the questions here. The problem is not the overloaded operators doing something wrong; it is the too-limited support for interval mathematics when it comes down to Range.

librasteve commented 8 months ago

> since Range is a linear series: What does that mean? Ranges don't contain any information about the "internal" values. And anyway, if you wanted to handle the Range * Range or Range / Range cases, the linearity would fall apart

I stand corrected on this point and have edited my post accordingly.

> say 2.6 > 2 ± 0.5; # True is a happy accident

well yes, but the documentation is clear that '~~' is the operator to check for inclusion. I would be happy to see these operator overrides (plus Range op Range) in a module, with the idea that it could graduate to core in (?) 6.f

2colours commented 8 months ago

> The problem is not the overloaded operators doing something wrong

It's not about the behavior, it's about the interface. The overloading itself is wrong. If Raku had used arithmetic operators the way Python does - no fallback, no coercions, only overloads - I wouldn't bring it up at all.

I think there are contradictions between the proposals as well, like here with Iterable versus the concept of an interval; that's why I've been trying to "aim for the moon" for a long time: what do you all have against generic, "no fallback, no coercion, only overload" arithmetic operators? The behavior could be preserved, and the advantages of dedicated numeric operators wouldn't be thrown out of the window.

librasteve commented 8 months ago

Proposal

I put this strawman forward for comment...

We split the use cases of today's Range type as follows:

  1. class Range: $x .. $y where $x & $y ~~ Int|Str (anything else?), to generate lists of consecutive numbers or strings, and to act as a matcher to check if an integer or string is within a certain range

    • endpoints are Int with/out cats ears
    • does Positional, does Iterable
    • arithmetic +-*/ operators with scalars are distributed to endpoints like Junction with 2 elems, then each endpoint coerced to .Int
    • prefix '+' special cased to .elems
    • operator '~~' special cased to 'is contained by'
    • coercer like .Num returns an Interval
    • .Interval coerces endpoints to .Rat returns an Interval
  2. class Interval: $x .. $y where $x & $y ~~ Real and $x | $y !~~ Int, to act as a matcher to check if a Numeric is within a certain range

    • endpoints are Real, no cats ears
    • endpoints may be or become Int, the constraint that either is Real is applied at construction
    • not Iterable nor Positional
    • arithmetic +-*/ operators with scalars are distributed to endpoints like Junction with 2 elems
    • Range op Range arithmetic +-*/ operators implemented
    • Range ** N operator implemented
    • prefix '+' special cased to .elems
    • operator '~~' special cased to 'is contained by'
    • numeric cmp (<, <=, ==, !=, >, >=) work for Scalar op Range, Range op Scalar, and Range op Range; for Range op Range, we check both endpoints to detect overlap in legal values, so 1.0..2.0 < 2.0..3.0 is False and 1.0..2.0 <= 2.0..3.0 is True (see the sketch after this list)
    • coercer like .Int or .Str returns a Range
    • .Range coerces endpoints to .Int then returns a Range
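A user-space sketch of the overlap-aware comparisons from the list above (purely illustrative, not current core behavior):

    multi sub infix:«<» (Range \a, Range \b) { a.max <  b.min }
    multi sub infix:«<=»(Range \a, Range \b) { a.max <= b.min }

    say (1.0..2.0) <  (2.0..3.0);  # False - the ranges touch at 2.0
    say (1.0..2.0) <= (2.0..3.0);  # True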

This is a breaking change so would be done in (?) 6.f. Interval could be exiled to a module in the meantime.

Backward compatibility would only break for Range.new (i.e. $x .. $y where $x | $y !~~ Int|Str), which could warn with "class Range is deprecated for non-Int|Str endpoints, class Interval will be substituted" (or maybe a full-on error would be better?)

Backward compatibility could be facilitated with a new role Rangy (maybe) that, when punned, provides the combination of behaviours that the old Range supports. In that case class Range does Rangy and class Interval does Rangy.

2colours commented 8 months ago

It's a whole different topic from what I opened the issue for, moreover it doesn't even address it. I'm considering closing this issue now; if you want to proceed with the unrelated topics, please find a place for those discussions.

raiph commented 8 months ago

> Mixing of weak typing and duck-typing is still prevalent for certain operators

This comment responds to your first two comments.

My summary:


Conceptually, both duck typing and weak typing are about allowing an argument corresponding to a particular parameter of a particular function (or operator) signature to be something other than a particular type, provided the argument is acceptable for other reasons.

I think duck typing in Raku is best understood as a function/operator using an argument corresponding to a parameter by assuming it will behave as needed by the function/operator, perhaps checking it first using Raku's .can method. If an argument can do what is asked of it by a function/operator, then all is well that ends well. The check of the claim that an argument .can do this or that may be in a parameter in the signature of the function/operator, or within the body.

Weak typing in Raku means an argument corresponding to some parameter supports a coercion into a target type. This will almost always be checked in the parameter declaration in the signature of the function/operator; the body only gets executed if the signature successfully binds all required parameters to arguments.
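A minimal sketch of the two mechanisms as they appear in Raku code (sub and method names are hypothetical):

    # Duck typing: ask whether the argument can do what we need, then trust it.
    sub make-it-hop($critter) {
        die "cannot hop" unless $critter.can('hop');
        $critter.hop;
    }

    # Weak typing: a coercion type in the signature converts the argument.
    sub add-one(Numeric() $n) { $n + 1 }
    say add-one("41");  # 42 - the Str "41" is coerced to Numeric first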

I would guess there's relatively little need to combine them for any given single parameter of a single function/operator signature.

That said, if they have been so combined for some parameter, and it introduces a nice DWIM, then I see no reason to stop that unless known corresponding WAT dogs bite, not merely yap.

Likewise, if there are multiple routines (via inheritance or overload resolution) for any given function/operator, I don't currently see a reason to change overloads that introduce a nice DWIM (unless corresponding WAT dogs bite, not merely yap).

> Mixing the two principles makes code fragile

You can say so, but there needs to be adequate evidence for such a claim. (And to be clear, I mean a lot more than 1 example, though it seems 1 will have to do as our start.)

> A question on IRC highlighted that an operation like (0..10) + 1 can return 1..11.

Having looked it up, I see a classy succinct DWIM. Thank you Larry. :)

> The problem with this is that it contradicts the concept of "numeric addition"

I don't agree. It numerically adds 1 to each of the two numbers visible in the range gist, the 0 to make 1 and the 10 to make 11.

> neither the way the operands are treated, nor the return value fits that picture.

I disagree.

> Actually, it's really easy to bust it: (0..10) + (1..1) is suddenly 12, even though the intuition is pretty similar to (0..10) + 1.

I think your use of words is misleading. Definitely for the general case of "intuition" and "suddenly".

First, intuition is personal, knowledge based, and tends to be approximate, leading to mistakes, more of them, and bigger ones, when we know less, and then fewer and smaller ones as we know more.

Second, results that don't match one's intuition, especially if they are pretty dramatic and initially difficult to explain, are ideal initiators of learning and opportunities for effective teaching. A Google search for "role of surprise in learning" provides rich information from academic brain science papers and articles about this (perhaps surprising!) aspect of learning. Even good old Socrates and Aristotle are said to have said something to the effect that “surprise combined with astonishment is the beginning of knowledge”.

As you'll find out if you read the relevant material, if one enjoys updating one's knowledge, and especially if the learning material is good, and double especially if the new thing learned feels good, then the outcome of the surprise will result in optimal learning, optimally priming later intuition.

And even if it's not enjoyable, or the learning material is not clear, or the new thing learned doesn't seem useful, then assuming the update still happens, and is correct to the degree the learning material is, and still lodges in the brain (eg you're not super tired), then the outcome will still at least improve/correct future performance, improving later intuition.

> It's hard to reason about this kind of code.

Imo it's easy provided you base your reasoning on knowledge gleaned from sources such as good documentation, and remaining approximately aware of where you lack knowledge. (I think this is so especially if you are curious and enjoy learning; if you don't it's possible to react negatively to surprises, learn less, and then struggle.)

> Taking a look at the docs, there aren't all that many (documented) "hybrids".

Too few surprises means poor learning efficiency. Too many also does. The sweet spot is in the middle.

There are parts of Raku that imo have far too many surprises, but this + example, and what else is documented (dates and times), strike me as well within (on the low side of) the sweet spot, er, range.

> Range has the 4 basic arithmetic operations (not ** though, interestingly...)

What would that operation mean? Did you research applying operations like these (shifting and scaling ranges)? If you did, did you draw any conclusions about scaling exponentially?

> I can't see a strong reason why it has to be like this

It doesn't have to be like this.

But the reason is for it to be enjoyable and intuitive for those who know it and want it.

> it actually makes code harder to reason about in any nontrivial situation because of the conflict between the two typing paradigms

Neither $date + 1 nor $range + 1 involves weak typing or duck typing of the date, range, or number. They're just sensible friendly DWIM overloads.

If these overloads do make code harder to reason about -- which I'm not yet agreeing or disagreeing with -- it's nothing to do with weak typing or duck typing.

I am also not convinced these overloads make code harder to reason about overall. Yes, one can't presume that ranges numerify to their length for all numeric operators. And generalizing to code using the plus/minus/multiply/divide ops with an untyped variable on one side and a numeric scalar on the other, you can't assume the result will be a number. But how often have you written code like that and actually wanted ranges to numerify to their length rather than that being an error?


Thus far your narrative suggests to me that:


> I'd like to point out that there are two possible resolutions: one can go for either duck-typing or weak typing. The historically prevalent choice has been weak typing, so it's probably easier (and more "according to the plans") to resolve this in favor of eliminating the overloads.

I'm confused by your mention of overloads. Overloads are orthogonal to duck typing, which is orthogonal to weak typing. I guess you're thinking any particular overload does either coercion or duck typing? It would surprise me if many did both, but it would also surprise me if your recipe is a helpful way to process things. (That said, surprise is the spice of life...)

> In any case, if "generic arithmetic operators" that look like numeric operators but can really do whatever are deemed useful, it's also an option to eventually introduce add, sub, mul or similar generic operators (actually, this was the original intention for div as well).

Do you mean having alphabetic name aliases for the symbols? What's that got to do with this issue?

> Please keep in mind that the focus of this issue isn't to competitively compare the overload-driven and the coercion-driven approach,

What happened to duck typing?

> but simply to resolve an inconsistency that has the worst of both worlds.

As noted it involves neither duck typing nor weak typing, so if it's the worst of any worlds, they're different worlds.

> Should such a discussion take place, I think it would be worth considering making the overload-driven approach the primary one, but at the end of the day many languages live without operator overloading entirely.

Overloads are everywhere in Raku. They're a great thing and they're here to stay, with operators being an exemplar of their use. So are coercions. And while duck typing has seen very little use, it's also a good thing that it's available, and here to stay, even if I can't think of any operator that uses it.

But anyway, each of these things is orthogonal to the others.


I think I've read some other comments in this thread but I need to stop for now.

It sounds like this issue may be closed. If so, and someone (@2colours or anyone else) decides to open another issue, please do make sure it has taken what has been written here into account. (I haven't read all of it, but it would be really upsetting if a new issue simply ignored what was written in this one, even if it's a painful process to extract salient contributions before writing a new issue.)

librasteve commented 8 months ago

In the light of my strawman proposal (^^^), I did some research off the excellent Wikipedia Interval Arithmetic page - this showed me that (i) interval arithmetic is a large and complex area and (ii) there are standards and libraries that do this deeply and, I assume, correctly.

I also found that the '..' Range literal "operator" is not easily overloadable within a module (at least until I get a real handle on RakuAST), and this makes the premise of hooking into the Range constructor invalid.

So, to absorb my Interval Arithmetic energy, and to act as a proving ground for collaborating on how to extend raku in this area and for the possible inclusion of this concept into the Physics::Error module (I am the author), I have written a new (rather noddy) Math::Interval module. Currently this does the Range op Range operations (+-*/) that @2colours mentioned, and I have some TODOs in there to add Comparison and Set ops.
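For flavor, the Range * Range rule from interval arithmetic looks like this (a sketch following the Wikipedia rules; Math::Interval's actual API may differ):

    # [a1,a2] x [b1,b2] = [min of all endpoint products, max of them]
    sub interval-mul(Range \a, Range \b --> Range) {
        my @p = a.min * b.min, a.min * b.max, a.max * b.min, a.max * b.max;
        @p.min .. @p.max
    }
    say interval-mul(-1..2, 3..4);  # -4..8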

So, given this and mindful of the priorities of the core team, I would say that this new module supersedes and embodies my strawman. Even long term, I cannot see a real Interval Arithmetic capability in raku core (although Julia has something like this), and I would expect a deep implementation that uses standard libraries to live in a raku module. Meantime, this is about playing with APIs to the concept in a shallow and hopefully convergent way.

Please do feel free to make any contributions you have via a PR... it may take an hour or so to appear on raku.land.

vrurg commented 8 months ago

@librasteve I'd appreciate it if you could translate this into a PR for Range in Raku core.

librasteve commented 8 months ago

@vrurg some questions:

  1. does that need NQP skills? (i have none)
  2. do you want the todos for comparison and set operators done first?
  3. is there a “near core” module route (eg underscore)?

i’m asking the latter because we may want to evolve to include more operations over time, like log, exp, trig, powers, complex - some of which can be quite quirky in Interval Arithmetic. I'm probably more at ease with just incremental change in module land

vrurg commented 8 months ago

@librasteve

> does that need NQP skills?

No.

> do you want the todos for comparison and set operators done first?

It can be a problem-solving issue, explaining the concepts. It would then be closed with the PR merge when all is settled down.

> is there a “near core” module route (eg underscore)?

Just have a look into src/core.c – that's all you'd need for now. And it is not much different from developing a module.

The only problem I see is that we're about to prohibit extending older language versions with new features. But I'd rather consider this case a bug fix.

2colours commented 8 months ago

> First, intuition is personal, knowledge based, and tends to be approximate, leading to mistakes, more of them, and bigger ones, when we know less, and then fewer and smaller ones as we know more.

At the point you said this, you had claimed DWIM a couple of times, which is the same thing in a different package. If you rely on subjective feels, you can't discredit an argument by addressing its subjectivity.

> I don't agree. It numerically adds 1 to each of the two numbers visible in the range gist, the 0 to make 1 and the 10 to make 11.

I think your use of words is misleading. Numeric addition means adding numbers. Nothing more, nothing less. Ranges are not numbers.

> But the reason is for it to be enjoyable and intuitive for those who know it and want it.

This is a harsh claim that would actually require justification.

> Neither $date + 1 nor $range + 1 involves weak typing or duck typing of the date, range, or number. They're just sensible friendly DWIM overloads.

This is both vague (what makes something sensible or DWIM?) and not a technical categorization. And yes, they do involve a half-heartedly applied principle of duck typing; it's just harder to say whether it's the Date that de facto provides the "to-integer addability" interface or the Int that provides the "to-date addability" interface. It's a mutual thing. But in any case, with the overloads on the exact types, it's the types that provide this operation, and importantly not the other way around: it's not the operation taking care of the appropriate types.

> And generalizing to code using the plus/minus/multiply/divide ops with an untyped variable on one side and a numeric scalar on the other, you can't assume the result will be a number. But how often have you written code like that

So you do notice that an argument has been made, even if you are reluctant to consider it relevant. I'm grateful for that anyway.

Honestly, I think it's legitimate to even write code that assumes that adding anything (Junctions not included; those are quite literally not in Any and are more or less transparent to most code) with the so-called numeric addition will result in a number, and to use that as a number. This clearly makes a difference in reasonability if you have to consume values that you don't produce; not just for numeric operators but for all operators that are defined to coerce.

Also, I think it's much easier to learn or promote the language if you can count on the same behavior - or principles, at the very least! - applying all the time, rather than requiring a handbook for any type you may encounter in a huge language.

And yes, I see no comparable advantage of the current behavior. Let me ask: how often do you even want to offset a Range? Out of all those situations, how often do you want the operation to resemble good old coercive numeric addition, for a specific interface reason, not for feeling clever?

> and actually wanted ranges to numerify to their length rather than that being an error?

Wait, wait. This is an unrelated topic. If the numeric coercion of Ranges is inappropriate (I think it might be), then that should be addressed, as something not sufficiently DWIM, right?

Anyway, as we have come this far, one would ask at the very least: why do Lists add as their number of elements, really? Wouldn't concatenation be much more DWIM and useful overall? Ignoring the consequences, as we are kind of ignoring the consequences with the overloads on Ranges and DateTimes and all this.

>   • Beginner doc material should emphasize that it often doesn't make sense to add to, subtract from, multiply, or divide, a range's length, nor, conversely, always numerify to its length, and that this insight led to +/-/*// doing endpoint shifting/scaling instead, but that means beware assuming all Iterables/Positionals will numerify with those ops, because Ranges won't.

This is where I think the deep lying social/community problems start. If this is what you think is appropriate for beginner doc material, rather than being an issue of teachability itself, I don't think we can have any meaningful conversation about this topic.

> I'm confused by your mention of overloads. Overloads are orthogonal to duck typing, which is orthogonal to weak typing. I guess you're thinking any particular overload does either coercion or duck typing? It would surprise me if many did both, but it would also surprise me if your recipe is a helpful way to process things. (That said, surprise is the spice of life...)

I'm not sure about your confusion. It was a very soft and "innocent" mention of overloads, namely that there should be no overloads.

Anyway, I was thinking of coherent architectures, which should be sound and complete. It's not that a single overload does this or that. Also, for lack of better words, I specifically started the issue by explaining the two ways I can identify, and I have never come across a third way. (No, arbitrarily mixing these two is not a third way...)

If you want the types of the operands to govern the behavior, you will achieve that by overloads as far as we are concerned, and you will refrain from coercions because coercions would suddenly mean you can't just look up the interface of your type. (Which, again, could work but worse than refraining from such thing.)

If you want the chosen operator/operation to govern the behavior, you are free to allow coercions (as the behavior is still very straightforward and generic) but you won't do overloads because that would break the principle that you can just look up the operator/operation and reason about it.

Both of these approaches can make sense and be useful, but apparently Raku does neither. You need to look up the overloads on the operands (both/all of them, quite possibly), and if you reach a dead end, you have to consider where it may fall back and what coercions might be applicable. This is much more tedious with any given types (I presume this is why you had to propose that complicated chunk of "beginner documentation"), and unmaintainable if you have few presumptions about the data you need to perform operations on. In which case, ironically, your best bet is to write Pythonic code with explicit coercions, hoping to narrow down the candidates to something you are at least aware of.

> Do you mean having alphabetic name aliases for the symbols? What's that got to do with this issue?

That the current hybrid operator could finally be restored to being just a "numeric operator", as the intention was, while the benefits of type-dependent behavior could be kept for something else, which could also be pure.

> What happened to duck typing?

Nothing, it refers to the same phenomenon. (And here I thought it was a plus that I reiterated the same point using different words.)

> As noted it involves neither duck typing nor weak typing, so if it's the worst of any worlds, they're different worlds.

As noted it involves both.

librasteve commented 8 months ago

@vrurg - yeah that looks doable - BUT we are going to have to draw the line somewhere

I get the feeling that Larry wanted to show us where to plug in real Interval Arithmetic but then avoided adding a deep implementation a la Julia into core and placing it in the critical path and slowing things down (!)

librasteve commented 8 months ago

> Anyway, as we have come this far, one would ask at the very least: why do Lists add as their number of elements, really?

one of the raku superpowers is hypers (>> <<) and operator prefixes, so in Python you may do:

    a = [1,2]
    b = [3,4]
    a + b     # [1, 2, 3, 4]  where I guess '+' is a concat op

in Raku, you can do

    my \a = 1,2;
    my \b = 3,4;

    a + b              # 4
    a >>+<< b          # (4 6)
    a X+ b             # (4 5 5 6)
    |a, |b             # (1 2 3 4)

Thus it makes sense generally to have .Numeric (aka prefix:<+>) return .elems on Iterables as the simplest, widest common result, since there is a smorgasbord of variations for you to reach for if this is not what you want.

This also meets the expectation of perl coders.

This general rule is sometimes overridden, e.g. for Date, DateTime, Range and Junction. These exceptions are documented and need to be learned.

I think that Larry likened perl to English on several occasions, in that sometimes you have to burden the language with nuances, richness and special exceptions in order to get the richest user experience. (As opposed to, e.g., German.)

2colours commented 8 months ago

It still puzzles me: if somebody wasn't content with just adding Ranges, or with coming up with a way to offset and scale them, why would |a, |b be so convincing when array concatenation is a common operation; overwhelmingly more common than "let's add the lengths".

> This also meets the expectation of perl coders.

I don't know if Perl coders have expectations other than being ready for a surprise at any moment; it's really Perl versus the rest of the languages most of the time. Anyway, as things went down, there is no particular reason to make an appeal to Perl folks. They refused this language on several occasions, and there are fewer and fewer of them by the day to win over anyway.

> I think that Larry likened perl to English on several occasions, in that sometimes you have to burden the language with nuances, richness and special exceptions in order to get the richest user experience.

The problem with these comparisons is that they contain so little substance. What does "the richest experience" mean in English, and what does it mean for Raku? In what exact quality are they alike?

If I had to compare English to a programming language, I would think of languages like old-school JavaScript, C, or outright Go. I would never have enjoyed writing Raku code if it was like that. Uninspired and uninspiring, wordy, minimal syntax that doesn't really support abstractions, ambiguity and so on. As a matter of fact, I would never have chosen to learn English if it didn't have an insane payoff in real life, but it's basically unavoidable. Again, kind of like the bad parts of C and JavaScript added up...

I was trying to illustrate that the expressiveness doesn't depend on these "clever hacks" a single bit. It makes sense that one would want to overload many operators every now and then, and it equally makes sense that one wouldn't want to mix that with coercions to create code that is harder to reason about. In which case it's obvious (and not very difficult) to set up operators that don't coerce and are encouraged to be overloaded. The end result would be equally "rich", "expressive" or whatever. Actually, I'd bet my life that if that was how it got added originally (and there clearly weren't many discussions about adding the Range overload, for one), nobody would even question that it's the right approach.

Yet roughly half of all feedback sounded like some core value was being "under attack". As if just by staying consistent, when it costs basically nothing, one would give up on the terseness, rich syntax and "batteries included" approach, or anything one can at least argue for. I can't recall any feedback regarding any of the proposed alternatives, and the most approachable, relatable feedback on my concerns was basically "it doesn't matter all that much in reality". Besides that, I can recall sentiments like "Larry must have had a good reason", "it does what I mean", "the behavior is useful", "weak typing doesn't have the benefits you assign to it anyway" (this one doubles down on the concerns about the design, lol) and the almost expected "the design is great, people just need more education on Raku".

I didn't want to close the issue in a ragequit fashion but I haven't forgotten about it. Probably I will close it after a good night's sleep.

niner commented 8 months ago

I think @2colours's premise for this issue is very much on point. Perl did one thing better than all the other dynamically typed languages: It retained clarity of what an operation does by using different operators for different operations instead of relying on static typing to make it clear which operation was expected. In Perl, + is a coercive numeric addition. No matter what original types the operands are, they will be treated like a number. This way there can be no surprises like "1" + "1" being "11".
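Raku keeps that property for the string/number split; the operator, not the operand types, names the operation:

    say "1" + "1";  # 2  - infix:<+> coerces both strings numerically
    say "1" ~ "1";  # 11 - concatenation has its own operator, infix:<~>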

This great design feature, however, does not square well with object orientation (which came to Perl only later). In OO it's ultimately the object that decides on the operation. In jnthn's words, you need to speak the object's language. Raku, being very much object oriented, therefore ends up with an impedance mismatch. Being multi-paradigm unfortunately makes this unavoidable.

I think it's important for the discussion to acknowledge that this problem does indeed exist. It makes language design hard. As this lengthy thread shows, there are often no clear answers.

In such a case it's helpful to have a visible list of commonly agreed upon goals or design principles that any solution must satisfy. Otherwise the discussion will run around in circles and it's personal preference against theoretical musings. There'd be no win in such a situation. A fact that's quite evident when reading this thread.

So, do we agree that the problem exists and that the solution will have to be a compromise? If yes, what are the must-have goals?

librasteve commented 8 months ago

I thought I would dig out a reference to Larry's linguistic take on perl (and subsequently raku, I believe): http://www.wall.org/~larry/natural.html

The piece that I was trying to channel is under the heading "No Theoretical Axes to Grind"

Since English is my mother tongue (sorry), I have a less-than-objective take. But I think that it is packed with special cases and exceptions. And yet, when you are very good at it (I am thinking of Shakespeare, not me), I think that these corners and nuances help to improve the product that you can make using it.

Anyway, that's my case why it's OK to be pragmatic (practical) and not force these useful (for some) exceptions into a set of principles.

librasteve commented 8 months ago

Hi all, I have opened a new problem solving Issue #392 to debate the merits of: Extending Range class to Interval Arithmetic

2colours commented 8 months ago

Then I think it's really time to postpone this issue to the indefinite future; maybe until there is more consensus or momentum behind considering it an issue in the first place.