Ada-Rapporteur-Group / User-Community-Input

Ada User Community Input Working Group - Github Mirror Prototype
26 stars 1 forks source link

Full relational operators for ordered containers #83

Open Richard-Wai opened 5 months ago

Richard-Wai commented 5 months ago

Interestingly, the ordered container packages define "=" (implicit), ">", and "<" operators for the private Cursor type, but do not define ">=" or "<=". Clearly these could be useful, and the definition seems obvious.

Adding these should be generally painless. I remain curious as to why this was not done originally, and I can't seem to find this mentioned in a relevant AI.

Jeff-Cousins commented 5 months ago

I hadn't queried this as I had assumed, almost certainly wrongly, that they would be implicitly created, the same as for /=.

Jeff


From: Richard Wai @.> Sent: 28 January 2024 9:40 PM To: Ada-Rapporteur-Group/User-Community-Input @.> Cc: Subscribed @.***> Subject: [Ada-Rapporteur-Group/User-Community-Input] Full relational operators for ordered containers (Issue #83)

Interestingly, the ordered container packages define "=" (implicit), ">", and "<" operators for the private Cursor type, but do not define ">=" or "<=". Clearly these could be useful, and the definition seems obvious.

Adding these should be generally painless. I remain curious as to why this was not done originally, and I can't seem to find this mentioned in a relevant AI.

— Reply to this email directly, view it on GitHubhttps://github.com/Ada-Rapporteur-Group/User-Community-Input/issues/83, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ATWHQ3XZLDYFCCEXKAAIO2DYQ3AUTAVCNFSM6AAAAABCOO67KSVHI2DSMVQWIX3LMV43ASLTON2WKOZSGEYDIMZVHA3TMOA. You are receiving this because you are subscribed to this thread.Message ID: @.***>

Richard-Wai commented 5 months ago

I hadn't queried this as I had assumed, almost certainly wrongly, that they would be implicitly created, the same as for /=. Jeff

I had assumed the same thing, until I was surprised to find out they weren't there when I tried to use them!

The standard only provides implicit equality for private types, but leaves you on your own for everything else. I'm not sure if this was deliberate or not.. Maybe it was was decided against due to some things potentially strange relationship that might exist, though I can't think of any.. Maybe it was just an oversight..

If it was an oversight, maybe the real solution here would be to add exactly that implicit declaration higher up in the standard.

ARG-Editor commented 5 months ago

Richard writes:

The standard only provides implicit equality for private types, but leaves you on your own for everything else. I'm not sure if this was deliberate or not.. Maybe it was was decided against due to some things potentially strange relationship that might exist, though I can't think of any.. Maybe it was just an oversight..

Huh???

The standard only defines "=" for private types because only "=" makes sense for all types. Most types do not have meaningful ordering operators. No records do, not most arrays, nor any access types.

What would those mean if they were defined by default for a private type? If you required every full type to have them, you would have something that was wildly incompatible (every private type completed with a record would need to add them). If you only made them available if the full type already had them, you would have a major privacy breakage.

For an individual type, they might make sense, but not in general.

A cursor is a reference (speaking generically, not some specific meaning of that), and in general order does not have a meaning for references. I'm dubious that there should be any ordering operators for cursors, even in the ordered containers. I certainly would not have expected any. In any event that discussion is long over.

Automatically creating some of the ordering operators from the others is not possible in general. It is only possible if the ordering defined is complete enough that the basic properties of (">" = not "<=") and (">=" = ">" or "=") are both true. Certainly the results would be complete nonsense if that wasn't true, and it is very easy to create an order function for which that isn't true (any sort of partial order is likely to not meet that). I think that was judged to be too dangerous to do automatically.

I'd guess that ">=" and "<=" functions were left out from the ordered maps as there isn't a requirement that the formal parameters that define ">" and "=" for keys are consistent. The requirement for "<" to define a strict weak ordering would need to be extended somehow to "=" in order for such operations to be defined. A fuzzy definition where both "<" and "=" are True for a some pair of key values is easily possible, and that would make the basic relationship properties nonsense. I believe the ordered container definition carefully avoids using "=" on keys anywhere other than the definition of "=" for the entire container.

Conclusion: All of these "omissions" are entirely intentional, and one could not define these without either letting them have nonsense relationships in many cases. Remember that we can say that the results are nonsense of the user-provided functions don't have a needed property, but that doesn't stop someone from doing it anyway and getting very confused.

                          Randy.
Richard-Wai commented 5 months ago

Huh??? The standard only defines "=" for private types because only "=" makes sense for all types. Most types do not have meaningful ordering operators. No records do, not most arrays, nor any access types. What would those mean if they were defined by default for a private type? If you required every full type to have them, you would have something that was wildly incompatible (every private type completed with a record would need to add them). If you only made them available if the full type already had them, you would have a major privacy breakage. For an individual type, they might make sense, but not in general.

I think you might have misunderstood my meaning. I was not suggesting anything besides "=" should be implict for a private type. I was talking about "<=" being implicit iff "<" is declared, for example. I go on to mostly agree with the rest of what you said which was that I suspect there was a good reason for that, and it seems there was.

A cursor is a reference (speaking generically, not some specific meaning of that), and in general order does not have a meaning for references. I'm dubious that there should be any ordering operators for cursors, even in the ordered containers. I certainly would not have expected any. In any event that discussion is long over.

Huh??? ">" and "<" are defined for Cursor in the ordered containers already! This whole thing is saying, if we have "<", and "=" for Cursors of an Ordered container, why then can we not have "<=", since that is extremely unambiguous.

Edit: To add wrt. Keys. Key equivalence is clearly different than Cursor equivalence, since Cursors already have equality defined for them. So I can't see why you can't have ">=" for Cursors just because you don't technically have "=" for Keys. "=" for a Cursor intuitively means "selects the same element", and that has no chance of ambiguity.

ARG-Editor commented 5 months ago

A cursor is a reference (speaking generically, not some specific meaning of that), and in general order does not have a meaning for references. I'm dubious that there should be any ordering operators for cursors, even in the ordered containers. I certainly would not have expected any. In any event that discussion is long over.

Huh??? ">" and "<" are defined for Cursor in the ordered packages already!

My point above is that I don't think they should be there. But that discussion is long over.

This whole thing is saying, if we have "<", and "=" for Cursors of an Ordered container, why then can we not have "<=", since that is extremely unambiguous.

I explained that, too; the result is very ambiguous. "=" and "<" are operations provided by the user of an ordered container generic. They don't necessarily have a well-defined relationship. Even if we say that they have to be related somehow, such a rule is impossible to enforce, so in practice they have no relationship. The definition of the containers was carefully defined to never use "=" outside of the definition of equality for that reason. (If "<" doesn't define a strict weak order the entire container is likely to fail over dead, but the relationship between "<" and "=" is likely to matter only in a few cases.)

Therefore, if you combine those two operations, you easily could get results that fail one of the usual properties. For instance, you could define "<=" as follows: function "<=" (A, B : Cursor) return Boolean is (A > B); That would be easy. But that result might not be the same as (A < B) or (A = B), and most would find that surprising. And defining it as the "or" would mean that the other usual property isn't necessarily true.

It would be hard to reason about the operators if they were defined, since at least one of the usual expected properties wouldn't be true in general. So they were left out, in order to avoid such expectations/confusions.

                   Randy.
Richard-Wai commented 5 months ago

I explained that, too; the result is very ambiguous. "=" and "<" are operations provided by the user of an ordered container generic. They don't necessarily have a well-defined relationship.

That doesn't seem right. "=" and "<" are provided by the user for Element_Type, not for Cursor. I don't think it makes sense at all to conflate them. Cursors have a clear ordinal relationship with each other, and a clear meaning of equality (unlike Element_Type). We have operations like "Next" and "Previous" for Cursors. ">" and "<" for a Cursor should reflect that relationship, not the Element they refer to. If a Cursor A is > than B, it means that A is reachable through forward iteration starting at Next(B).

joshua-c-fletcher commented 5 months ago

I agree with Richard.

The "<" operation provided to the ordered container generic apply to the elements and are needed for placing the elements in order or for searching for an element in an ordered container .

The "<" and ">" operations on the Cursor relate to where the cursor is pointing in the list, similar to an index. In a doubly linked list, the elements do not need to be sorted, but the cursor still points to a position in the list that could be greater than or less than another position in the list.

"=" for a cursor would be pointing to the same position in the same list.

If we want an implicit definition, though: Would we define "<=" as (not ">") or literally as ("<" or "=") Would we define ">=" as (not "<") or literally as (">" or "=")

I think that's the problem with implicitly defining them in general, since there are different ways to express the typical definition.

The different ways would be equivalent for any normal type, but an implementer could override some operators and not others and end up with an unexpected implicitly defined operator, depending on how it is defined.

For the Cursor, this doesn't have to be an issue since the definitions are not provided by the user. (the operations are not based on anything passed in to the generic, the cursor is defined by the generic package itself.)

I think it would be OK to provide "<=" and ">=" for Cursors, since the meaning is pretty clear (referring to the 'position' the cursor points to in the container)... but it's easy enough to say (not ">") instead of "<=" that there isn't a great loss not having them.

On Thu, Feb 1, 2024 at 1:57 AM Richard Wai @.***> wrote:

I explained that, too; the result is very ambiguous. "=" and "<" are operations provided by the user of an ordered container generic. They don't necessarily have a well-defined relationship.

That doesn't seem right. "=" and "<" are provided for Element, not for Cursor. I don't think it makes sense at all to conflate them. Cursors have a clear ordinal relationship with each other, and a clear meaning of equality (unlike Element_Type). We have operations like "Next" and "Previous" for Cursors. ">" and "<" for a Cursor should reflect that relationship, not the Element they refer to. If a Cursor A is ">" than B, it means that A is reachable through iteration starting at Next(B).

— Reply to this email directly, view it on GitHub https://github.com/Ada-Rapporteur-Group/User-Community-Input/issues/83#issuecomment-1920631004, or unsubscribe https://github.com/notifications/unsubscribe-auth/A4A3SCSLQBUDYPFBU662LIDYRM4HFAVCNFSM6AAAAABCOO67KSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMRQGYZTCMBQGQ . You are receiving this because you are subscribed to this thread.Message ID: @.*** .com>

sttaft commented 5 months ago

I would agree that in general, from an interface design point of view, when both "=" and "<" are defined, it makes sense to also have "<=" provided. On the other hand, I would be reluctant to make it automatic the way it works for "/=" as the implicit declaration of "/=" is the source of a fair amount of confusion. Particularly given the availability of expression functions, it seems easy enough for the creator of an interface to include "<=" and ">=" if they think they would be useful.

Now getting to the original question, I would agree that in general we should add "<=" (and ">=") to the language-defined interfaces that already provide "<" and "=" (or respectively, ">" and "="). Whether or not it was an oversight, I think at this point it makes sense to add them.

ARG-Editor commented 5 months ago

Richard Wai writes:

I explained that, too; the result is very ambiguous. "=" and "<" are operations provided by the user of an ordered container generic. They don't necessarily have a well-defined relationship.

That doesn't seem right. "=" and "<" are provided for Element, not for Cursor. I don't think it makes sense at all to conflate them. Cursors have a clear ordinal relationship with each other, and a clear meaning of equality (unlike Element_Type). We have operations like "Next" and "Previous" for Cursors. ">" and "<" for a Cursor should reflect that relationship, not the Element they refer to. If a Cursor A is ">" than B, it means that A is reachable through iteration starting at Next(B).

Guys, RTFM!!! The RM says nothing of the kind.

For an Ordered_Set, "<" is defined by A.18.9(100-101/2):

function "<" (Left, Right : Cursor) return Boolean with Pre => (Left /= No_Element and then Right /= No_Element) or else raise Constraint_Error, Global => in all;

Equivalent to Element (Left) < Element (Right).

The operation defined for Ordered_Set uses the Element "<". It has nothing to do with the Next or Previous operations. (The Ordered_Maps use the Key "<", this brings up the same issues.)

Similarly, I cannot find anything that says that "there is a clear meaning of equality" cursors. In particular, there is nothing in RM that says that two independently derived cursors necessarily are equal if the designate the same element. In particular, cursors might contain extra data used to detect dangling cursors, and that data does not necessarily match for cursors that are created independently. So while if you have two cursors A and B, if A = B then they necessarily designate the same element, but if A /= B, they still might designate the same element.

Because cursor "=" wasn't intended to be a useful operation, one cannot usefully define operations like ">=" or "<=" on cursors, either. You would not want a situation where A > B, A < B, and A = B are all False, but that is possible given the definition of "=".

I suppose if there is enough interest, we could add a requirement in A.18 that "=" on cursors means exactly that the two cursors designate the same element. But that is a much wider issue than just some random ">=" operation that would hardly ever be used. And it would require determining that the implementation impact is acceptable. (Remember that language-defined types have to properly compose, 4.5.2(32.1/1), which does not happen for untagged types like Cursors. So just providing a user-defined "=" is not necessarily enough to allow such types to meet requirements on "=".)

The operation Richard describes would make sense, but it is NOT the one defined by the Ada containers. As such, "<=" and ">=" don't make sense for these operations.

Tucker writes:

Now getting to the original question, I would agree that in general we should add "<=" (and ">=") to the language-defined interfaces that already provide "<" and "=" (or respectively, ">" and "="). Whether or not it was an oversight, I think at this point it makes sense to add them.

It wasn't an oversight, and they do not make sense for the containers as defined, as detailed above and all week. Unless we make a major change in the definition of the containers, I will oppose any such change. (I am speaking personally here, not as part of my role as ARG Editor.)

nholsti commented 5 months ago

Randy (ARG-Editor) said 'The operation defined for Ordered_Set uses the Element "<". It has nothing to do with the Next or Previous operations.'

That sounded weird to me, but when I check the text of A.18.9 (Ordered_Sets) I cannot find any connection between the "successor" relation, defined in para 81/3 using "<" for Element, and the function Next. I expected that the Next function applied to a Cursor would return a Cursor for the successor element, and vice versa for Previous and the "predecessor" relation; indeed there is such a definition for Previous in 93/3.

The Iterate operations do work according to the "successor" and "predecessor" relations.

A.18.6 (Ordered_Maps) has definitions connecting Previous to "predecessor", in 74/3 and 74.2/5, but again the corresponding definitions for Next and "successor" seem to be missing.

A.18.3 (Doubly_Linked_Lists) has such definitions both for Next/"successor" and Previous/"predecessor", but of course these do not involve any "<" relations.

It seems that the missing definitions for the Next operations should be added to A.18.6 and A.18.9.

sttaft commented 5 months ago

On Sun, Feb 4, 2024 at 3:07 AM ARG-Editor @.***> wrote:

...

Similarly, I cannot find anything that says that "there is a clear meaning of equality" cursors. In particular, there is nothing in RM that says that two independently derived cursors necessarily are equal if the designate the same element. In particular, cursors might contain extra data used to detect dangling cursors, and that data does not necessarily match for cursors that are created independently. So while if you have two cursors A and B, if A = B then they necessarily designate the same element, but if A /= B, they still might designate the same element.

18.4(18/5), Maps, says: The primitive "=" operator for type Cursor returns True if both cursors are No_Element, or designate the same element in the same container. 18.7(17/4), Sets, says: The primitive "=" operator for type Cursor returns True if both cursors are No_Element, or designate the same element in the same container.

So I do think "=" is well defined on Cursors.

...

The operation Richard describes would make sense, but it is NOT the one defined by the Ada containers. As such, "<=" and ">=" don't make sense for these operations.

I think they do make sense for ordered containers, given that you are not ever going to have two sets with equal elements that are not the same element, and you are not ever going to have two maps with two elements having the same key that are not the same element.

Tucker writes:

Now getting to the original question, I would agree that in general we should add "<=" (and ">=") to the language-defined interfaces that already provide "<" and "=" (or respectively, ">" and "="). Whether or not it was an oversight, I think at this point it makes sense to add them.

It wasn't an oversight, and they do not make sense for the containers as defined, as detailed above and all week. Unless we make a major change in the definition of the containers, I will oppose any such change. (I am speaking personally here, not as part of my role as ARG Editor.)

Do you still feel that way given 18.4(18/5) and 18.7(17/5) which do give a clear definition of "=" on Cursors?

-Tuck

Message ID: @.*** com>

ARG-Editor commented 5 months ago

Niklas Holsti:

That sounds weird, but indeed when I check the text of A.18.9 (Ordered_Sets) I cannot find any connection between the "successor" relation, defined in para 81/3 using "<" for Element, and the function Next.

Next, like most of the common operations, is defined in A.18.7. In particular A.18.7(76.2/5):

Returns a cursor designating the successor of the node designated by Position in Container.

What I missed is any direct connection between "successor" and "<". I generally think in terms of the general Map and Set designs (A.18.4 and A.18.7) and don't want to depend on the characteristics of a particular implementation. So I don't think about the details of the specific containers very much.

Note that Map is like Set this way.

                  Randy.
ARG-Editor commented 5 months ago

Tucker wrote:

A.18.7(17/4), Sets, says: The primitive "=" operator for type Cursor returns True if both cursors are

No_Element, or designate the same element in the same container.

So I do think "=" is well defined on Cursors.

Thanks, that's what I couldn't find. I am surprised it exists, since it is unimplementable in Ada 2005 when this was originally defined (it would be impossible to meet this definition AND the requirements of 4.5.2 on user-defined operations). Since untagged record equality now composes, that is no longer an issue.

I don't actually think this is well defined, as it doesn't say anything about other cases. As this is written, a function "=" that always returned True would meet this requirement. Usually, when we define Boolean functions, we say something like "... and returns False otherwise", which is missing here. For instance, look at the nearby wording for Has_Element. Someone could argue that the difference is actually significant in some way. This should say something like: The primitive "=" operator for type Cursor returns True if both cursors are No_Element, or designate the same element in the same container{, and returns False otherwise}.

[No, I don't believe anyone would actually buy the above argument, but even so, we should fix this wording. Especially as so many other functions include the "returns False otherwise" phrase.]

...

The operation Richard describes would make sense, but it is NOT the one defined by the Ada containers. As such, "<=" and ">=" don't make sense for these operations.

I think they do make sense for ordered containers, given that you are not ever going to have two sets with equal elements that are not the same element, and you are not ever going to have two maps with two elements having the same key that are not the same element.

This seems confused. You certainly can have "two sets with equal elements" (why not? Just make a copy of a set and that is true). I think you meant to say something about a single set with two different elements that are equal (equivalent). That seems to be impossible by construction (although I can't find a clear statement of that anywhere).

Now getting to the original question, I would agree that in general we should add "<=" (and ">=") to the language-defined interfaces that already provide "<" and "=" (or respectively, ">" and "="). Whether or not it was an oversight, I think at this point it makes sense to add them.

It wasn't an oversight, and they do not make sense for the containers as defined, as detailed above and all week. Unless we make a major change in

the definition of the containers, I will oppose any such change. (I am speaking personally here, not as part of my role as ARG Editor.)

Do you still feel that way given 18.4(18/5) and 18.7(17/5) which do give a clear definition of "=" on Cursors?

Well, yes, but now it is more of a philosophy argument. I don't believe that anyone should be using operations not defined for all of the Sets (re Maps).

The only reason to use an Ordered_Set (or Map) is because it is easier than using the hashed versions; the performance is almost always worse. It's easier because you probably have a "<" lying around somewhere, while you would have to write a hash function.

I can understand using the "easier" implementation as it would be premature optimization to insist on the "best" implementation even if doing so doesn't meaningfully change the performance of the program. But I can't understand depending on unique characteristics of that subpar implementation as that simply makes it difficult to change to the better implementation if/when that makes sense. Locking yourself into a dubious implementation makes no sense.

There are a few unique operations of the various container implementations that cannot be provided any other way, and so I understand including them in the packages even if they ought to be avoided in the normal case. But anything that can easily be written yourself and is unique to a particular kind of container is just adding clutter and the temptation to use it even though the best practice is to not do so.

These operators are clearly in that category -- they do not belong in this package (Ordered_Sets and Ordered_Maps as well) to begin with. Adding more of them is just adding more clutter and adding more of an attractive hazard. I see no reason to do that.

          Randy.

P.S. The above is again personal opinion and not in any official capacity.

sttaft commented 5 months ago

This should say something like:

The primitive "=" operator for type Cursor returns True if both cursors are No_Element, or designate the same element in the same container{, and returns False otherwise}.

Agreed, or use "if and only if".

I think they do make sense for ordered containers, given that you are not ever going to have two sets with equal elements that are not the same element, and you are not ever going to have two maps with two elements having the same key that are not the same element.

This seems confused. You certainly can have "two sets with equal elements" (why not? Just make a copy of a set and that is true). I think you meant to say something about a single set with two different elements that are equal (equivalent).

Yes, I meant to say "to have a set with two equal elements that are not the same element."

That seems to be impossible by construction (although I can't find a clear statement of that anywhere).