eclipse-archived / ceylon

The Ceylon compiler, language module, and command line tools

http://ceylon-lang.org

Apache License 2.0

Null elements in collections #5009

Closed CeylonMigrationBot closed 8 years ago

CeylonMigrationBot commented 12 years ago

[@ikasiuk] When designing Ceylon collections we implicitly made a decision that might be worth discussing explicitly. null values in collections are not completely unproblematic. In particular there are two problems:

If first or last return null then that's ambiguous: it can either mean that the collection is empty or that it contains a null element. I've actually seen, and admittedly even written, code that erroneously assumes that !exists lst.first means that the list is empty. Similarly, item returning null does not necessarily mean that the index was out of bounds.
The definition of equality for collections with null elements is somewhat tricky because we can't compare something to null but we can compare it to {null}. That was actually one of the reasons why equals was initially not defined in Object.
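
For comparison, the first problem can be reproduced with Java's collections, which do allow null elements. This is only an illustrative sketch: the class name NullAmbiguityDemo and the helper first are made up here and stand in for the Ceylon attributes discussed above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NullAmbiguityDemo {
    // Hypothetical helper: returns the first element, or null if the list is empty.
    static <T> T first(List<T> list) {
        return list.isEmpty() ? null : list.get(0);
    }

    public static void main(String[] args) {
        List<String> empty = new ArrayList<>();
        List<String> holdsNull = new ArrayList<>();
        holdsNull.add(null);

        // Both results are null, so the result alone cannot tell the two lists apart.
        System.out.println(first(empty) == null);
        System.out.println(first(holdsNull) == null);

        Map<String, String> map = new HashMap<>();
        map.put("present", null);

        // get() returns null for a missing key and for a key mapped to null;
        // only containsKey() distinguishes the two cases.
        System.out.println(map.get("present") == null);
        System.out.println(map.get("absent") == null);
        System.out.println(map.containsKey("present") + " / " + map.containsKey("absent"));
    }
}
```

A caller that checks only for null has to issue a second query (isEmpty or containsKey) to find out which case applies.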

There are two solutions for this. The first one is the one we took:

Allow null elements in collections.
Just live with the ambiguity of first, last and item but introduce a special object (exhausted) to make the value returned by Iterator.next() unambiguous.
Define that {null}=={null}.

And this is the second one:

Disallow null elements in collections but introduce a special nullElement object that represents null elements in collections.
first, last and item are automatically unambiguous: !exists lst.first always means that the list is empty. Iterator.next() returns Element?, no special object required.
Equality for collections is consistent and straight-forward (because nullElement is a normal object).
The Array class probably needs to map between null (on Java/JS side) and nullElement (on Ceylon side). I used a similar technique when implementing a Ceylon wrapper for Java maps because Java maps can have null values, in contrast to Ceylon maps. It seems to work quite well.
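
The mapping between null and nullElement could look roughly like the following Java sketch. Everything in it is an assumption made for illustration (the SentinelMap class, its method names, and the NULL_ELEMENT constant); it is not the actual wrapper mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper: null values are stored as a sentinel, so a lookup
// that returns null can only mean "no entry for this key".
class SentinelMap<K> {
    // Stand-in for the proposed nullElement object (the name is illustrative).
    static final Object NULL_ELEMENT = new Object() {
        @Override
        public String toString() {
            return "nullElement";
        }
    };

    private final Map<K, Object> entries = new HashMap<>();

    // Import side: translate values arriving from null-permitting code.
    void putFromJava(K key, Object valueOrNull) {
        entries.put(key, valueOrNull == null ? NULL_ELEMENT : valueOrNull);
    }

    // Returns null only when the key has no entry at all.
    Object item(K key) {
        return entries.get(key);
    }

    // Export side: translate the sentinel back to null.
    static Object toJava(Object item) {
        return item == NULL_ELEMENT ? null : item;
    }

    public static void main(String[] args) {
        SentinelMap<String> map = new SentinelMap<>();
        map.putFromJava("a", null);
        System.out.println(map.item("a")); // the sentinel, printed via its toString()
        System.out.println(map.item("b")); // no entry for "b"
    }
}
```

Because the sentinel is an ordinary object, it also takes part in equality and hashing like any other element.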

I'm not sure which of the two solutions is better. They both have advantages as well as downsides. So I'm not really suggesting to change anything, but it would be nice to hear some opinions.

[Migrated from ceylon/ceylon.language#131] [Closed at 2013-09-28 01:02:36]

CeylonMigrationBot commented 12 years ago

[@gavinking] > I disagree completely. Any implementation of equals that has optional attributes will end up doing just that.

That's a completely fucking broken Java-developer-concept of what equality means. This comes down to the whole "unique key" debate. Christian Bauer and I had to write a whole book chapter explaining this to Java folks with their totally impoverished conceptual framework for reasoning about this stuff. "Equality" does absolutely not mean that the two objects have all the same field values! A correct implementation of equals should compare the required unique keys of the object. For example, for Persons:

Correct definition of equality:

Two Persons are equal iff they have the same ssn.

Totally fucking broken and bogus definition:

Two Persons are equal iff they have the same name, address, ssn, birthday, pets, and bankBalance (where "same" means equal or both null).

The data modelling community understands all this stuff since ooooh, I guess like the 1970s. In 2012 the Java community is still deeply confused about it.
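
A minimal Java sketch of the key-based definition, assuming ssn is the only required unique key. The class name PersonByKey and the optional address field are illustrative, not taken from any real code base.

```java
import java.util.Objects;

class PersonByKey {
    final String ssn;     // required unique key
    final String address; // optional attribute, may be null

    PersonByKey(String ssn, String address) {
        this.ssn = Objects.requireNonNull(ssn, "ssn is required");
        this.address = address;
    }

    // Equality looks only at the required key; optional attributes are ignored.
    @Override
    public boolean equals(Object other) {
        return other instanceof PersonByKey
                && ssn.equals(((PersonByKey) other).ssn);
    }

    @Override
    public int hashCode() {
        return ssn.hashCode();
    }

    public static void main(String[] args) {
        PersonByKey known = new PersonByKey("123-45-6789", "1 Main St");
        PersonByKey unknownAddress = new PersonByKey("123-45-6789", null);
        System.out.println(known.equals(unknownAddress));
    }
}
```

With this definition equals never has to compare two possibly-null values, which is the point being argued.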

CeylonMigrationBot commented 12 years ago

[@ikasiuk] sounds like a topic for the FAQ

CeylonMigrationBot commented 12 years ago

[@RossTate] Wow, I missed out on a lot while I moved to Cornell. Lemme try to catch up:

Here are two practical uses of null in maps that I can think of:

Map<String,String?> parseArgs(String argString)

parseArgs takes a string like "width=5in,keepaspectratio" (which you may recognize if you use LaTeX) and turns it into ["width" -> "5in", "keepaspectratio" -> null]. Mapping to null is different than mapping to, say, the empty string. It indicates to enable that option and (if it has a value) use that default value.

As for an application of Map<A?,B>, this can be useful if you want a default value for keys. That is, if a has no mapping, then use whatever value null maps to. Sure there are better ways to do this, but those might require a lot of work (like a variant of Map that has a settable default and never returns null), so if I were prototyping or simply trying to write a quick simple program, Map<A?,B> is what I'd want to use.

We should be able to accommodate such applications. If we don't do it with null, then we should provide an Option or Maybe type and give it the suitable definition for equality.

> I mean, this is the whole reason SQL has ternary logic. Because they understand this stuff slightly better than most programmer-type folks. (Ternary logic has its own problems, of course.)

Yet we're not using ternary logic, so it doesn't necessarily make sense to adopt a solution from a ternary-logic-based system. Besides, the use of null in databases has more to do with indicating the absence of a value of a field (i.e. encoding subrecords). A join, then, only matches rows that actually have a value of the appropriate column (i.e. not null) and have the same value.

This makes me think that the big issue going on is that, in Java, there are two uses of null: one to indicate the absence of a field or value, the other to intentionally add an extra value to the type. @gavinking and @ikasiuk seem to only consider the former to be valid, whereas @FroMage and I consider both uses to be valid. Honestly, in a language that doesn't make sum types easy, both uses are valid. So, @gavinking and @ikasiuk, I get your perspective, but to support it you really need to provide a means to encode the other use of null that @FroMage and I are taking advantage of, i.e. add an Option or Maybe type and make it clear when these should be used instead of null.

> what if Nothing refines Boolean equals(Object? other) { return !exists other; }? then you can do == as long as you have an optional type on the LHS (the RHS can have an optional or non-optional type)

I imagine this would cause problems with Java compatibility, since null cannot have any methods. That's just off-the-top-of-my-head speculation though.

> But how should you access a value of type T? without using if(exists)? So I guess we can't warn about that, simply because there's no better choice.

You require the programmer to impose some constraint on T that indicates null cannot inhabit T. If the programmer doesn't like having that constraint, then they should use Option<T> or Maybe<T> instead. Otherwise there's almost certainly a bug in the code.

CeylonMigrationBot commented 12 years ago

[@quintesse] (Posts just crossed: this is a reply to Gavin/Ivo above) Well, that's not entirely true of course, the data modeling community just delegates the responsibility of determining equality to some other part of the system, because who decides that a new ssn is needed? Exactly, somebody analyzing a certain set of attributes to determine the uniqueness of new / duplicate incoming data.

CeylonMigrationBot commented 12 years ago

[@gavinking] > Exactly, somebody analyzing a certain set of attributes to determine the uniqueness of new / duplicate incoming data.

Huh? This is surely the role of the data modeller, no?

CeylonMigrationBot commented 12 years ago

[@gavinking] > parseArgs takes a string like "width=5in,keepaspectratio" (which you may recognize if you use LaTeX) and turns it into ["width" -> "5in", "keepaspectratio" -> null]. Mapping to null is different than mapping to, say, the empty string. It indicates to enable that option and (if it has a value) use that default value.

But this to me is really a pretty confusing API. I would never use null to represent that an option is enabled. And if I saw that in an API I would be initially very confused by it. The problem here is your signature:

Map<String,String?> parseArgs(String argString)

In Java this would be even much worse because I would see Map<String,String> and so nothing in the signature would warn me to expect to get null values from the Map. (In Ceylon it would be a little better because the type argument String? would give you fair warning to expect it.)

Now, sure, of course I get why you would arrive at that signature. Sure, sure, the argument has no value, so its value is null. That's perfectly reasonable. And it's certainly perfectly OK to say that if all the Map is representing is a map of argument name to argument value. But what's going on here is that you're trying to cram a little extra information into the Map: the set of arguments that don't have values. But nothing about the notion of a "map" implies to me that it is a priori a necessary part of its responsibility. Some Map APIs might be capable of representing this: Java's can, but badly, as we've seen, and Ceylon's can't.

So why not just use the much clearer signature:

Map<String,ArgValue> parseArgs(String argString)

With class ArgValue() of StringValueArg|EnabledArg. Or if you're being really lazy just Map<String,String|Enabled>. What's the downside? None as far as I can see, and it's certainly much clearer to the user of your API.
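
The same shape can be sketched in Java, which has no union types, with a closed class hierarchy. The names StringValue, Enabled and ArgParser are illustrative assumptions, and the parsing is deliberately naive (comma-separated, split on the first '=').

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Closed hierarchy standing in for the union StringValueArg|EnabledArg.
abstract class ArgValue {
    private ArgValue() {}

    static final class StringValue extends ArgValue {
        final String text;

        StringValue(String text) {
            this.text = text;
        }
    }

    static final class Enabled extends ArgValue {
        static final Enabled INSTANCE = new Enabled();

        private Enabled() {}
    }
}

class ArgParser {
    // Parses "width=5in,keepaspectratio" into width -> StringValue("5in"),
    // keepaspectratio -> Enabled. No value in the map is ever null.
    static Map<String, ArgValue> parseArgs(String argString) {
        Map<String, ArgValue> result = new LinkedHashMap<>();
        if (argString.isEmpty()) {
            return result;
        }
        for (String part : argString.split(",")) {
            int eq = part.indexOf('=');
            if (eq < 0) {
                result.put(part.trim(), ArgValue.Enabled.INSTANCE);
            } else {
                String name = part.substring(0, eq).trim();
                String value = part.substring(eq + 1).trim();
                result.put(name, new ArgValue.StringValue(value));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, ArgValue> parsed = parseArgs("width=5in,keepaspectratio");
        System.out.println(parsed.keySet());
        // A null result now has exactly one meaning: the option was not given.
        System.out.println(parsed.get("height") == null);
    }
}
```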

> As for an application of Map<A?,B>, this can be useful if you want a default value for keys. That is, if a has no mapping, then use whatever value null maps to. Sure there are better ways to do this

Right. Much better ways. It's totally trivial to create a Map<Default|String,String>. Remember, there's nothing magical about null in Ceylon, it's just an object.

Check out this bit of total nonsense equivocation from the contract of java.util.Map:

Some map implementations have restrictions on the keys and values they may contain. For example, some implementations prohibit null keys and values, and some have restrictions on the types of their keys. Attempting to insert an ineligible key or value throws an unchecked exception, typically NullPointerException or ClassCastException. Attempting to query the presence of an ineligible key or value may throw an exception, or it may simply return false; some implementations will exhibit the former behavior and some will exhibit the latter.

Now, again, the situation would be better in Ceylon because you would get fair warning when you receive a Map<String?,String>, and because implementations which don't accept null keys would be able to be declared given Key satisfies Object. But the point is that whatever we do here is going to be better than what Java does.

> We should be able to accommodate such applications.

We can, of course. Trivially.

> If we don't do it with null, then we should provide an Option or Maybe type and give it the suitable definition for equality.

What for? To save you from writing the following bit of code:

interface Default of default {} object default satisfies Default {}

Really? One line of code?

Note that for default, unlike for null, there's just no question of what is a well-defined notion of equality. Of course default is equal to itself! So the definition of equals() is just the one it implicitly inherits from Identifiable.

CeylonMigrationBot commented 12 years ago

[@gavinking] > Yet we're not using ternary logic, so it doesn't necessarily make sense to adopt a solution from a ternary-logic-based system.

We're not adopting the ternary-logic solution. What I'm saying is that any type system with a null has to decide what null==null evaluates to:

1. Most programming languages say it evaluates to true, even though that is clearly nonsense, as my example with comparing possibly-null addresses demonstrates. This solution claims to be convenient but some of that convenience is an illusion. I've very often needed to write x!=null && y!=null && x==y in Java, and unfortunately it's not something the compiler warns me about.
2. Systems with ternary logic say it evaluates to null or unknown. This is much more correct, but unfortunately ternary logic has some nasty consequences where some very intuitive rules of logical reasoning don't hold.
3. You could imagine a system where null==null evaluates to false, but this would have essentially the same problems as (1), but without its convenience.
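
The options above can be put side by side in a small Java sketch. The class and method names are invented for illustration; only java.util.Objects.equals is a real library call.

```java
import java.util.Objects;

class NullEqualityDemo {
    // Option (1): null is treated as equal to null.
    static boolean lenientEquals(Object x, Object y) {
        return Objects.equals(x, y);
    }

    // The hand-written check from the comment: both values must be
    // present before they can be considered equal.
    static boolean bothKnownAndEqual(Object x, Object y) {
        return x != null && y != null && x.equals(y);
    }

    // Option (2), ternary logic: a null result stands for "unknown".
    static Boolean ternaryEquals(Object x, Object y) {
        if (x == null || y == null) {
            return null;
        }
        return x.equals(y);
    }

    public static void main(String[] args) {
        String addressOfAlice = null; // unknown
        String addressOfBob = null;   // unknown

        System.out.println(lenientEquals(addressOfAlice, addressOfBob));
        System.out.println(bothKnownAndEqual(addressOfAlice, addressOfBob));
        System.out.println(ternaryEquals(addressOfAlice, addressOfBob));
    }
}
```

Under option (1) two people with unknown addresses appear to live at the same address, which is the behaviour being criticised.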

Which path to go down?

Well, Ceylon sidesteps the whole issue via the brilliant idea of just saying that null==null is a compilation error. Now, sure, this is a bit of a shock to most people, since I don't know of it having ever been done before. In plenty of cases it's going to be a little less convenient than solution (1), but it has the tiny little advantage of being actually correct, and of stopping you from writing certain things that are probably wrong.

> This makes me think that the big issue going on is that, in Java, there are two uses of null: one to indicate the absence of a field or value, the other to intentionally add an extra value to the type.

I agree with this characterization. @FroMage wants to use null to add one extra value to the type String. I don't see that as part of the role of Ceylon's null.

> @gavinking and @ikasiuk seem to only consider the former to be valid, whereas @FroMage and I consider both uses to be valid.

The question is not "which use is valid" a priori. It's "what is the semantics of null in this language?" In this language, null has the first semantic, and not the second.

> Honestly, in a language that doesn't make sum types easy, both uses are valid.

Right, and Ceylon arguably makes sum types easier than any other language. Imagine: the compiler is able to reason that the type Integer|String is a sum type (i.e. disjoint) even though the declarations of Integer and String do not in any way refer to each other! I don't know of any other language capable of that.

Precisely because sum types are so easy in Ceylon, the need to have null as a convenient way to add an extra value to a type just goes away.

CeylonMigrationBot commented 12 years ago

[@RossTate] > because implementations which don't accept null keys would be able to be declared given Key satisfies Object

This doesn't seem to be true at all. At least, so far we have provided no means for programmers to be informed that their implementation won't work for null keys. You keep saying it's the programmer's fault for using null keys, but how are they to know it's problematic? Even your solutions above, say with default, seem isomorphic to using null, yet somehow they're less buggy even though nothing in the type system says so. There definitely seems to be a loss of implementation abstraction.

Regarding Option or Maybe, it helps to have standards (do I need to defend that?), and composable ones at that. null is not composable, which is where the issues we're raising are coming from. You get different behavior for T? depending on what T stands for. However, Option<T> will produce the same behavior regardless of what T is. Also, equality and hashing can be defined on it, making it usable as keys for Map.
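
The composability argument can be illustrated with Java's java.util.Optional as a stand-in for the proposed Option type. The lookup helper and the class name below are assumptions made for this sketch, and the helper assumes that the map itself holds no null values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class OptionComposabilityDemo {
    // Generic lookup: the outer Optional only reports whether an entry exists,
    // whatever V is. Assumes the map never stores null values.
    static <K, V> Optional<V> lookup(Map<K, V> map, K key) {
        return Optional.ofNullable(map.get(key));
    }

    public static void main(String[] args) {
        Map<String, Optional<String>> options = new HashMap<>();
        options.put("width", Optional.of("5in"));
        options.put("keepaspectratio", Optional.empty());

        // Three distinguishable outcomes, with no extra query needed.
        System.out.println(lookup(options, "width"));
        System.out.println(lookup(options, "keepaspectratio"));
        System.out.println(lookup(options, "height"));

        // Optional defines equals and hashCode, so it can serve as a key.
        Map<Optional<String>, String> withDefault = new HashMap<>();
        withDefault.put(Optional.empty(), "fallback");
        System.out.println(withDefault.get(Optional.empty()));
    }
}
```

Here V is itself an Optional, and the nesting keeps "entry without a value" apart from "no entry", which a nullable type cannot do once T already admits null.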

> Right, and Ceylon arguably makes sum types easier than any other language. Imagine: the compiler is able to reason that the type Integer|String is a sum type (i.e. disjoint) even though the declarations of Integer and String do not in any way refer to each other! I don't know of any other language capable of that.

Remember, | does not form sum types. It works fine if you're not writing polymorphic code. The issue is we are writing polymorphic code, so we need something that works for that. Hence Option or Maybe.

CeylonMigrationBot commented 12 years ago

[@gavinking] > That represents a logic bug then, and I don't think it's related to the language at all or to whether two null values should be equal.

The goal of a type system is to prevent "logic bugs". Which kinds of logic bug a type system targets varies from language to language. For example, a language with dependent types attempts to prevent "logic bugs" related to out-of-bounds indexing. Ceylon's type system doesn't. Ceylon's type system attempts to prevent "logic bugs" that relate to use of uninitialized values. Java's doesn't.

Of course, the question arises as to whether it is actually worth it to target a certain class of bugs. You'll always get false positives, where the compiler rejects certain perfectly correct programs. According to the dynamic language community, that means we should just let basically anything through, to eliminate the false positives (along with all the real positives). I think it's clear that practical experience shows that this winds up being very harmful once a program grows beyond a certain level of complexity. Undoubtedly, Ceylon's decision to target "uninitialized value" bugs has a cost in terms of false positives. (Consider the problems we have dealing with circular references in initializers.) It's a totally open question what practical experience with this language will wind up teaching us. So far I think all the fussiness mainly has the effect of pushing us toward clearer, more self-documenting, more robust code. It seems very clear to me that forcing Stef and Ross to write Map<String,String|Enabled> instead of Map<String,String> like in Java is going to make their code more understandable and more maintainable.

CeylonMigrationBot commented 12 years ago

[@gavinking] > This doesn't seem to be true at all. At least, so far we have provided no means for programmers to be informed that their implementation won't work for null keys. You keep saying it's the programmer's fault for using null keys, but how are they to know it's problematic?

I don't understand what you're saying. Check the type constraints on Map. They are very explicit that neither null keys nor null items are allowed.

> Even your solutions above, say with default, seem isomorphic to using null, yet somehow they're less buggy even though nothing in the type system says so.

Wrong. They're definitely not isomorphic. That's what makes the design so damn brilliant.

null does not support == whereas default does.
map[nonExistingKey] might return either null or default, and there's absolutely no ambiguity about what that return value means. This is not the case where you mis-use null as an entry value.

Yes, I understand that virtually nobody in the PL community understands this stuff so it comes as a shock. That's why all programmers should learn relational data modelling. Those guys have a far more robust conceptual framework for reasoning about this stuff.

CeylonMigrationBot commented 12 years ago

[@gavinking] > Remember, | does not form sum types.

It does. X|Y is a sum type if X and Y are disjoint classes or cases of an enumerated type.

> The issue is we are writing polymorphic code, so we need something that works for that.

On the contrary we're not. When I'm writing polymorphic code it's absolutely no problem that T actually has the type argument String|Default. I don't even see that. The "problem"—if it's even a problem and I don't agree that it is—only arises when you're writing non-generic code and are too damn lazy to want to write the class Default. All the examples of the "problem" so far are of non-generic code.

CeylonMigrationBot commented 12 years ago

[@RossTate] > I don't understand what you're saying. Check the type constraints on Map. They are very explicit that neither null keys nor null items are allowed.

I'm saying you've artificially imposed those constraints. One could remove those constraints and write an implementation of Map that doesn't work for nulls, and Ceylon won't say a thing.

> It does. X|Y is a sum type if X and Y are classes or cases of an enumerated type.

As in neither of them is a type variable, so you're not using polymorphism.

> On the contrary we're not. When I'm writing polymorphic code it's absolutely no problem that T actually has the type argument String|Default.

I'm talking about being polymorphic with respect to String. The issue is that T|Default is not a sum type.

CeylonMigrationBot commented 12 years ago

[@gavinking] > One could remove those constraints and write an implementation of Map that doesn't work for nulls, and Ceylon won't say a thing.

Certainly. Just like Stef can write his precious eq() method and the Ceylon compiler will let him. I have not yet got round to adding the necessary code to the typechecker that analyzes the logic of a method and determines if it is actually an eq() method and erroring haha ;-)

But the question here is not about whether wrong things are expressible within Ceylon's type system—clearly they are. The question is whether or not Map in ceylon.language should be wrong.

> The issue is that T|Default is not a sum type.

But I don't think that is the issue. I've not seen anyone trying to abstract over maps of form Map<X,Y|Default>. Certainly if that were the problem, then Maybe<Y> would be potentially useful. But I don't think that's the problem we're trying to solve, is it?

CeylonMigrationBot commented 12 years ago

[@FroMage] > Stef can write his precious eq() method

We've already established that you also write them, more than once, and with bugs ;)

> Which kinds of logic bug a type system targets varies from language to language.

Sure, I agree, and I think in this case we're going too far.

> It seems very clear to me that forcing Stef and Ross to write Map<String,String|Enabled> instead of Map<String,String>

Actually we're trying to use Map<String,String|Nothing> and argue that null should have the same implementation as your Default or Enabled and support equals.

Note that I predict that while the #4879 feature request will be eq, the #4880 will be to have that Default value be in the SDK. People will just replace Nothing with it since it defines equality, thereby defeating the purpose of all the nice things we have about Nothing.

> @FroMage wants to use null to add one extra value to the type String

Wrong, I'm trying to add useful (and expected) behaviour to Nothing. You're right that this is going to be a surprise (a shock you said) to most people, and I don't see why we do that to them (and us).

> That's a completely fucking broken Java-developer-concept of what equality means.

Oh come on, now this is just either a strawman argument or insulting. You give us an example of a DB record, and tell us two objects should be equal if they have the same primary key. Forget for a minute that this would instantly lead to a ton of errors in Hibernate code since detached objects or even over-the-wire objects are not physically equal to DB objects, even though they have the same primary key (a quite frequent cause of error in Hibernate usage).

How the hell would you represent equality for such a type then?

class Foo(){
 shared String? a;
 shared String? b;
}

For me equality is (almost) trivial:

class Foo(){
 shared String? a;
 shared String? b;
 shared actual Boolean equals(Object o){
  if(is Foo o){
   return eq(a, o.a) && eq(b, o.b);
  }
  return false;
 }
}

Forget about the database world for a minute, because there are many real-world examples that don't have the same behaviour.

Look, frankly I'm so convinced that I'm right with this that I've let it slide from the beginning and I'm ready to let it slide further and wait for the community to tell us why this is wrong and why we need to change it. I've no doubt that we'll have to change our minds in the end. I just hope it won't be too late.

CeylonMigrationBot commented 12 years ago

[@chochos] Or we can wait until Ceylon 7.0 to add eq just like Java did with Objects.equals(a,b)
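
For reference, this is roughly what the Foo example from the previous comment looks like in Java with java.util.Objects.equals (available since Java 7). The class name FooValue is illustrative; this is a sketch of the Java idiom, not a proposal for Ceylon.

```java
import java.util.Objects;

class FooValue {
    final String a; // may be null
    final String b; // may be null

    FooValue(String a, String b) {
        this.a = a;
        this.b = b;
    }

    // Field-by-field equality where "same" means equal or both null.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FooValue)) {
            return false;
        }
        FooValue other = (FooValue) o;
        return Objects.equals(a, other.a) && Objects.equals(b, other.b);
    }

    // Objects.hash tolerates null fields, keeping hashCode consistent with equals.
    @Override
    public int hashCode() {
        return Objects.hash(a, b);
    }

    public static void main(String[] args) {
        System.out.println(new FooValue(null, "x").equals(new FooValue(null, "x")));
        System.out.println(new FooValue("y", "x").equals(new FooValue(null, "x")));
    }
}
```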

CeylonMigrationBot commented 11 years ago

[@quintesse] Moving to M5

CeylonMigrationBot commented 11 years ago

[@tombentley] M6

CeylonMigrationBot commented 11 years ago

[@gavinking] This was an awesome discussion, but we have not gone down the second path suggested by @ikasiuk, though it is, clearly, an internally consistent approach.

P.S. I endorse @chochos' suggestion that we wait until Ceylon 7 to introduce eq() ;-)