Raku / problem-solving

🦋 Problem Solving, a repo for handling problems that require review, deliberation and possibly debate
Artistic License 2.0

Allomorphs are an interface drug #404

Open 0rir opened 11 months ago

0rir commented 11 months ago

%*SUB-MAIN-OPTS<coerce-allomorphs-to> seems to address part of a problem. That problem is that allomorphs are only great when you need them, which is likely an external interface. When writing strictly typed OO they can be an unwanted type.

no allomorphs, use coerce-allomorphs numerically, and use coerce-allomorphs Str are hints to possible changes, while use allomorphs and no coerce-allomorphs could restate current behavior.

Just one scenario:

Given a bunch of multi X( Int $a) {...}; multi X( Str $a) {...}, a pragma around them, and/or upstream in the data flow, could be a concise fix for multiple ambiguous calls.
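A minimal sketch of the ambiguity described above. The sub name describe and the variable names are hypothetical, chosen only for illustration; the point is that an IntStr satisfies both the Int and the Str candidate, so dispatch cannot pick one until the caller coerces explicitly.

```raku
# Hypothetical multi candidates, one per "side" of an allomorph.
multi describe(Int $a) { "Int: $a" }
multi describe(Str $a) { "Str: $a" }

my $allo = <42>;            # an angle-bracket literal yields an IntStr
say $allo.^name;            # IntStr

# Both candidates accept an IntStr, so the call is ambiguous.
try describe($allo);
my $err = $!;
say $err.^name;             # X::Multi::Ambiguous

# Coercing at the call site picks one side explicitly.
say describe($allo.Int);    # Int: 42
say describe($allo.Str);    # Str: 42
```

The pragma proposed here would, in effect, apply one of those two coercions for a whole lexical scope instead of at each call site.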

In simple cases, a pragma might alter my @a = < 1 2.1 3> to either side. It is a coarsely grained approach to the generalized task.
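For illustration, this is roughly what "altering to either side" looks like when done by hand today, without any pragma. The variable names @nums and @strs are assumptions made for this sketch.

```raku
my @a = < 1 2.1 3>;
say @a.map(*.^name);        # (IntStr RatStr IntStr)

# Numeric side: each allomorph collapses to its plain numeric type.
my @nums = @a.map(*.Numeric);
say @nums.map(*.^name);     # (Int Rat Int)

# String side: each allomorph collapses to a plain Str.
my @strs = @a.map(*.Str);
say @strs.map(*.^name);     # (Str Str Str)
```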

librasteve commented 11 months ago

yeah, allomorphs are a shady area, I agree

imo, you have this situation:

since we want to adhere to the Liskov Substitution Principle (LSP), then an IntStr Allomorph is Int and is Str since it is formally a child of Int and Str ... so actually it is well behaved in a strict type system

the best I can think of to weed out Allomorphs (if you want to suppress this flexibility) is this:

multi frob( IntStr $a) { nextwith $a.Int };

or maybe

subset NAllo of Int where * !~~ Allomorph;
multi frob(NAllo $a) { dd $a }

I do get that a coercion in the form Int(Allomorph) or maybe Int() would be nicer instead of a subset (or where clause), but I also think this would break the LSP (which is a facet of strict typing / SOLID principles after all) so I can understand that raku is compliant to that model

PS. fwiw X is not a good name for a sub

2colours commented 11 months ago

since we want to adhere to the Liskov Substitution Principle (LSP), then an IntStr Allomorph is Int and is Str since it is formally a child of Int and Str ... so actually it is well behaved in a strict type system

Could be that I'm misunderstanding LSP but I think Allomorphs actually violate it, quite inevitably. <0e10> represents a NumStr which is a Str. That means I can substitute it in for any part of the Str "interface", right? So I perform an empty check by calling .so and (wrongly!) conclude that it's empty. Then for good measure I call .chars and suddenly get 4. How can a string both look empty and have 4 characters?

It goes both ways: 0e10.chars would report a textual length of 1 so I could assume that something that is a number and numerically equal to 0e10 has the same length - but as we have seen, <0e10>.chars has a length of 4.
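The behaviour described in the two paragraphs above can be reproduced with a short sketch (the variable name $allo is just for illustration):

```raku
my $allo = <0e10>;
say $allo.^name;      # NumStr
say $allo ~~ Str;     # True: a NumStr is a Str
say $allo.so;         # False: truthiness follows the numeric value
say $allo.chars;      # 4: length follows the string value

# The plain Str and the plain Num, for comparison.
say '0e10'.so;        # True: a non-empty Str
say 0e10.chars;       # 1: the Num stringifies as "0"
```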

You could say that this is okay because the methods exist at least; I would say it's actually worse than not inheriting because it seems like you have a compatible interface and then it does something else.

Anyway, I'm not sure if the issue is about this.

jubilatious1 commented 11 months ago

@0rir wrote:

That problem is that allomorphs are only great when you need them, which is likely an external interface.

So Allomorphs are really heteromorphs?

😊

librasteve commented 11 months ago

.so is not a check for "empty" it is a check for truthiness and surely the number zero (however you write it) is not True in raku

in your example, the Str interface is not violated (imo) since .so and .chars return valid and correct results, but it is honoured in a way that represents the weight of meaning in what you have

I agree that Allomorphs can be inconvenient as the OP says and so proposed a couple of ways that your raku API can reject Allomorph types, but that is somewhat of an uphill struggle (as it should be)

2colours commented 11 months ago

.so is not a check for "empty" it is a check for truthiness

for strings, yes, it absolutely is.

0rir commented 11 months ago

@librasteve says:

[rejecting] Allomorph types ... is somewhat of an uphill struggle (as it should be)

Why should it be a struggle? I find that at odds with explicit foundational ideas of the sisters, e.g. make the difficult easy.

Incremental typing is one of the features that attracted me to Raku. In my learning exercises, I have bumped into the problem of allomorphs in literal composite structures interfering with the typing I had designated. Understanding the basic problem, it is easiest to loosen the typing. But that delays interacting with more subtle problems with typing. That aspect can be improved by other strategies than I have offered here.

I appreciate the convenience of Raku's allomorphs; this is the first time I've written my $fu = IntStr( 1, "" ).

With further reflection, I think my OP "coerce" suggestions to be poor. I still see the 'allomorph' pragma as useful.

jubilatious1 commented 11 months ago

Nota bene: R-programming language does the following (coercion rules) ...

  • Logical values are converted to numbers: TRUE is converted to 1 and FALSE to 0.
  • Values are converted to the simplest type required to represent all information.
  • The ordering is roughly logical < integer < numeric < complex < character < list.
  • Objects of type raw are not converted to other types.
  • Object attributes are dropped when an object ... .

https://www.oreilly.com/library/view/r-in-a/9781449358204/ch05s08.html

Maybe useful?

librasteve commented 11 months ago

@2colours - I stand corrected that .so tests a Str for emptiness - til

thus we have:

> ''.so      #False   (Str)
> '0'.so     #True    (Str)
> 0.so       #False   (Int)
> 0e10.so    #False   (Num)

so if you load an Allomorph with a corner case, then what is the "right" thing to do when you apply .so?

should it do the String thing or the Numeric thing?

thus, as you pointed out ...

<0e10>.so #False

I think that the main points of my case remain intact: (i) the Allomorph class in raku is correct to emphasize the numeric properties, since it can be imagined that the number is the active value and the string is a record of the literal that was parsed originally (this is consistent with the behaviour of ACCEPTS); (ii) provided .so returns a Bool, the LSP is not violated, since this is what the API defines; (iii) this is properly documented: "Returns False if the invocant is numerically 0, otherwise returns True. The Str value of the invocant is not considered." https://docs.raku.org/type/Allomorph#method_Bool

librasteve commented 11 months ago

@jubilatious1 - in raku True == 1 and False == 0, and even True ~~ Int #True so I see the parallels with R (and perl)

in a similar vein, the Type Graph here https://docs.raku.org/type/Numeric#typegraphrelations shows how eg. Num ~~ Real #True and Num ~~ Complex #False which is a somewhat different mapping based more on common sense such as 'a Real number is not a Complex number'

now I guess that one can argue that a Real number is a Complex number (i.e. when the factor of i is 0) and one can argue that if it is Real it is not Complex ... so raku has taken a more operational approach like "if it is Complex then I have to calculate two factors when I do math operations and I don't want to encumber Real math with this overhead"

that's what I call a language design decision and I'm happy to live with Larry's judgement on that

librasteve commented 11 months ago

@0rir - you ask "Why should it be a struggle?"

One of the stated design values that raku has generally is that it nudges us to code in a better way[*]. This is a sort of strict interpretation of "make the easy things easy and the hard things possible".

So when you have a class inheritance, the best practice is easy:

class Parent {}
class Child is Parent {}

sub fn(Parent $x) { ... }  
Child.new.&fn;   #accepts Child since Child ~~ Parent

So the easy path is that the LSP is honoured. Alternatively, it is (justifiably) more of a struggle to selectively reject some and not others like this:

class Parent {}
class Child is Parent {}

subset AdultsOnly of Parent where * !~~ Child;   # type check, not the numeric !==
sub fn(AdultsOnly $x) { ... }
Child.new.&fn;   #rejects Child

Why justifiably?

Since tbh you may want to consider using role composition instead of inheritance if you want some of your children to be selectively rejected.

[*] Another example of this is the overhead needed to write a Proxy to get/set a private attribute (since it is better style to go via a setter method).

jubilatious1 commented 11 months ago

@librasteve

in raku True == 1 and False == 0, and even True ~~ Int #True so I see the parallels with R (and perl)

in a similar vein, the Type Graph here https://docs.raku.org/type/Numeric#typegraphrelations shows how eg. Num ~~ Real #True and Num ~~ Complex #False which is a somewhat different mapping based more on common sense such as 'a Real number is not a Complex number'

So the R programming language is a vector-based language. There are no scalars. Anything you might think of as a scalar is actually a vector of length == 1. What I posted above (the O'Reilly link to R in a Nutshell, 2nd Edition) are coercion rules. Sorry if I wasn't clear.

@0rir specifically started this Issue (I think) to discuss coercion. I'm not even sure my Rakudo is up-to-date on the latest coercion pragmas. But this is what the R-programming language does (in the R-Console i.e. REPL), using the ubiquitous c(...) combine function (coercer):

> c(FALSE,TRUE,NA)
[1] FALSE  TRUE    NA
> c(0,1,NA)
[1]  0  1 NA
> c(0,TRUE,NA)
[1]  0  1 NA
> c(0,1.1,NA)
[1] 0.0 1.1  NA
> c(0,"1.1",NA)
[1] "0"   "1.1" NA   
> 

"This is a generic function which combines its arguments. The default method combines its arguments to form a vector. All arguments are coerced to a common type which is the type of the returned value, and all attributes except names are removed." https://search.r-project.org/R/refmans/base/html/c.html

R decides that every vector can only be of one type, generally coerced with c(...). Above are the default coercions. You can do specific coercions such as as.numeric(...) but you risk losing data:

> as.numeric(c("-.1"," 2.7 ","B"))
[1] -0.1  2.7   NA
Warning message:
NAs introduced by coercion 
> 

Raku has obviously made different design choices (e.g. multiple Types within the same List/Array), which one might consider insanely brilliant. But it does raise the question of how to coerce properly.

librasteve commented 11 months ago

cool - I guess we can Close this issue now... (I can't do that)

jubilatious1 commented 11 months ago

@librasteve Sorry if I wasn't clear, I posted R code.

Seems like discussion on Raku Allomorphs should continue?

librasteve commented 11 months ago

@jubilatious1 - I understand the OP to be requesting some pragmas to squish Allomorph behaviour in raku body code

I am fundamentally against pragmas ("the best part is no part")

So I have convincingly made the case that the current Allomorph behaviour is the best design balance between (i) having Allomorphs at all, (ii) having pragmas to turn them on and off, (iii) having them as they are now, where a (small) struggle is needed to suppress LSP if you don't like them

It has been an interesting discussion - and thank you for the R references

According to the OP: "With further reflection, I think my OP "coerce" suggestions to be poor. I still see the 'allomorph' pragma as useful."

So since there are 200 open Issues here, I propose that we agree (or at least agree to disagree) and close this one.

2colours commented 11 months ago

so if you load an Allomorph with a corner case, then what is the "right" thing to do when you apply .so?

should it do the String thing or the Numeric thing?

I think diamond inheritance (where you really inherit different behavior for the same apparent interface) just violates the Liskov Substitution Principle. It cannot be helped. There is a reason why other languages aren't so fond of something like Allomorphs in Raku.

When you refer to the documentation of .so for Allomorph, you implicitly acknowledge that the documentation of Str (which NumStr is) is not met.

(ii) provided .so returns a Bool, the LSP is not violated since this is what the API defines

If all your API is "give me a value, and I give you a Bool", that's not going to work. You need to know what meaning the return value has. If you don't know that, you shouldn't call it. By the way, if LSP was really just about the return type, that would mean that in a statically typed, compiled language like Java you couldn't ever violate LSP, even if you tried... I hope we can see that this is completely untenable.

2colours commented 11 months ago

So I have convincingly made the case that the current Allomorph behaviour is the best design balance between (i) having Allomorphs at all

Well, this is the thing. "Having Allomorphs at all" is not a noble goal at all, it is a trap. The less we have them, the better. The compromise should be at least understood from this angle.

Give people the chance to get rid of Allomorphs.

I don't get this urge that you want to one-sidedly close an issue when no agreement has even been reached with the person who opened it in the first place. And as an issue, it will clearly stand even if you get them to agree. The violation of the LSP and the practical implications (that have bit me at least twice) are factual.

librasteve commented 11 months ago

As a process, I don't get that there is an appetite for ongoing long-winded back and forth that serves as a pure talking shop. I would like a world where we have an open exchange of views, and where consensus (at best) or a majority of the elders (at worst) makes a timely assessment and adopts or rejects.

Thus my (admittedly somewhat crass) push to close this thread. btw, I am very specifically NOT an elder in this.

On the subject matter, there is no chance that those who want to drop Allomorphs from raku will succeed. OK the arguments have been aired. But they only have minority support.

In my opinion, which I hope is shared, Allomorphs are one of the peak achievements of raku. They are the result of Larry's asking of the question "how do I get the same flexibility of perl (or untyped raku) Str<->Int behaviours when types are fully applied?"

This is akin to "can I invent a type system that can apply full type controls without killing the ability to have these complex Str<->Int behaviours?" Allomorphs square this circle and evidence this non-trivial achievement of raku.

As you know, the raku MAIN CLI syntax maps to Allomorph literals like <1 John 5/7> ... so there is a combining of this Allomorph concept with a new solution to CLI. Now I can have raku Num literals like 0e10 in my CLI args - can you so easily feed a float to a CLI in other languages?

More widely, Allomorphs themselves are built from the core raku class system ... see the TypeGraph link above ... so they are an exemplar of how you can build your own mixed types with role composition, inheritance, mix-in and hybrid behaviours.

I agree that Allomorphs come at a cost. They have some quirky angles that may cause a wtf moment for the Python/Java crowd ... and therefore it takes a while for a raku newcomer to grok what is going on. But raku is not that, it's a (big) step up.

0rir commented 11 months ago

@jubilatious1:

@0rir specifically started this Issue (I think) to discuss coercion.

No, and no allomorphs was my first proposal. %*SUB-MAIN-OPTS<coerce-allomorphs-to> was referenced only to show there has been prior art along this line.

2colours commented 11 months ago

I think majority decision is overrated when there is factual evidence about a certain problem that does arise in code - however, we aren't talking about majority decision here. There is one person claiming majority and insisting that no change can/should be done. It seems rather ironic when this isn't even a majority opinion in this thread at the moment.

It has clearly been demonstrated that Allomorphs should be used sparingly - e.g. because we would have to either accept that these SomethingStr types are in fact not Str, or basically have a useless Str type that can have whatever behavior through its interfaces.

Also, I think it has been pointed out in a couple of discussions that allomorphs mostly solve a nonexistent problem because Raku supports a lot of operations of given types via coercions. Keeping input as string and then using it in "numeric context" is just fine, in the vast majority of cases you wouldn't even need to write an explicit coercion.
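A small sketch of what "numeric context" means here: a plain Str is coerced by the numeric operators themselves, and the variable keeps its Str type. The variable name $input is hypothetical.

```raku
my $input = '42';           # a plain Str, e.g. read from a file or prompt
say $input.^name;           # Str

# Numeric operators coerce their operands; no explicit conversion needed.
say $input + 1;             # 43
say $input * 2;             # 84
say ($input + 1).^name;     # Int

# The string itself is untouched and still behaves as a Str.
say $input.chars;           # 2
```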

All in all, it's not just that it gets newcomers confused (although that should be alarming in itself) but that by the time they notice there is anything non-trivial happening, they might have already written code that is sabotaged by something they never asked for. They might already have faulty cache lookups to values that did look like integers, or they might have bool checks with alternating semantics. There seems to be no just reason for it, or no precedent that could be used for a sanity check.
To just say "no problem, you just need to get used to it" doesn't feel useful or helpful in this case. It would be more appropriate to make it opt-in, or at least create an opt-out option that can be suggested to people who just want to feel safer, not paying for something they don't even want to use.
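One way the "faulty lookup" case mentioned above can show up is set membership, where identity rather than numeric equality is compared. This is a sketch under that assumption; the names @ids and $seen are hypothetical.

```raku
my @ids  = <1 2 3>;               # word-quoted literals: IntStr elements
my $seen = @ids.Set;

# Numerically the first element equals 1 ...
say @ids[0] == 1;                 # True

# ... but membership compares identity, and an Int is not an IntStr.
say 1 (elem) $seen;               # False

# Coercing the elements first restores the expected lookup.
say 1 (elem) @ids.map(*.Int).Set; # True
```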

librasteve commented 11 months ago

On reflection, where there is a request to disable an important feature of raku, I would ask for a super majority (let's say 80% of the core devs) where any of the following is true: (i) it has been a core feature of the language for many years, (ii) it is clearly defined in the language design docs, (iii) it is well documented, (iv) it is widely used in user / module code and (v) it is a large effort to change the raku source.

Please consider this code:

role NotSo {
    method so { False }
}
my $nss = 'a' but NotSo;

sub check-empty(Str $s) {
     say "This Str, $s, is empty!" unless $s.so 
}
check-empty($nss);

You will note that no Allomorphs are involved. Yet the OP and subsequent concerns that the Str check can be defeated and the '.so' API is not honoured exist here, no?

The point is that it is easy for external raku calling code to throw something that is not quite a Str at your functions.

And, while the easy path is to take anything that is self declared a 'Str', it is not too hard to control this like this:

subset TightStr of Str where ~*.^name eq 'Str';
say $nss ~~ TightStr;     #False

I can think of many problems that can be solved with Allomorphs (for example keeping the original text when you parse a csv for numeric info) - and even in your use case, keeping all Numerics as Str and then coercing them when needed means that you cannot then use types to control eg. a column of numbers and a column of names.

I would say that Allomorphs do raise advanced issues such as inheritance or mixins, and I have already agreed that newcomers should be helped with better materials. But I do not agree that these techniques should be banned, or that Allomorphs should be turned off.

2colours commented 11 months ago

On reflection, where there is a request to disable an important feature of raku, I would ask for a super majority (let's say 80% of the core devs) where any of the following is true: (i) it has been a core feature of the language for many years, (ii) it is clearly defined in the language design docs, (iii) it is well documented, (iv) it is widely used in user / module code and (v) it is a large effort to change the raku source.

This is your opinion, and I heavily disagree with this opinion but the fact is, there is no such process at the moment.

Please consider this code:

That code seems like DIHWIDT. Raku allows you to do a lot of things that you wouldn't do if you want to live safe. However, opting into unsafe, mind-boggling code is a bit different than being forced to use them and not even having a clear way to eliminate them.

And, while the easy path is to take anything that is self declared a 'Str', it is not too hard to control this like this:

Yes... basically acknowledging that LSP is not kept and inheritance has no semantic meaning.

I can think of many problems that can be solved with Allomorphs (for example keeping the original text when you parse a csv for numeric info)

In these vague terms, just keeping them as strings and relying on numeric operators seems perfectly sufficient.

and even in your use case, keeping all Numerics as Str and then coercing them when needed means that you cannot then use types to control eg. a column of numbers and a column of names.

The elephant in the room: who is going to guarantee that 1. a column will only contain numbers 2. a column will never contain something that can be parsed as a number?

If these answers are sufficiently clarified then yes, it can be easily detected whether a coercion succeeds or not. Or you can just parse the columns as appropriate, even.
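As a sketch of how a failed coercion can be detected on a plain Str field: the helper name numeric-or-nil is hypothetical and not part of Raku; it relies on prefix + returning a Failure for non-numeric strings.

```raku
# Hypothetical helper: the numeric value of a field, or Nil if it has none.
sub numeric-or-nil(Str $field) {
    my $n = +$field;            # a Failure if the string is not numeric
    $n.defined ?? $n !! Nil;    # testing definedness defuses the Failure
}

say numeric-or-nil('42');         # 42
say numeric-or-nil('2.5');        # 2.5
say numeric-or-nil('forty-two');  # Nil
```

Note that an empty string numifies to 0 in Raku, so a real column parser would probably want to treat empty fields separately.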

0rir commented 11 months ago

If you get "well formed" IntStrs coming to you in typed data for massage, should you preserve the typing or assume it is just a convenience subtype of Int?

librasteve commented 11 months ago

dear @0rir , i suggest you take a glance here at about 11:30 … my personal approach would depend on the use case for the script … if it’s a throwaway, who cares (just do the untyped raku defaults and ignore the subtleties of IntStr). If you want to bring in types, then I would say Str is the best for an initial scan of data-in-the-wild (think of quite poorly defined csv columns from eg excel or maybe screen scraped). Then you can try to coerce to IntStr (or NumStr) eg a numeric column to … this lets you detect anomalies without a hard reject and (if this is important to you) then you can handle these exceptions as you go. And finally a hard coerce to eg. Num before you do real math such as with Dan where you can choose a NaN maybe where you fail to convert (eg an excel formula).

0rir commented 10 months ago

Sorry to waste our time. I was not clear that Raku data is to be massaged and forwarded onward. By well-formed, I meant that no custom built IntStrs like IntStr.new( 1_000_000, "1000") are included. The types to be formed contain Ints. Should I say IntStrs are Ints and pass them through?

When do you use IntStrs? Do you ever care that they are larger than Ints?

After you have filtered your input and have your datasets purged of XxxStrs, why do you need those classes--when an unknown type error would be sufficient?

2colours commented 10 months ago

Yes, I think we are very much on the same page... the problem with something like IntStr is that the promise is that you don't get a third, brand new thing. You are supposed to get something that is absolutely an Int if you want to, and absolutely a Str if you want to. However, since the interfaces of Int and Str contradict each other, this is just not possible and you end up casting anyway. Or worse, you don't cast and are stuck with data that isn't fully compatible with either of the two.