dotnet / dotnet-api-docs

.NET API reference documentation (.NET 5+, .NET Core, .NET Framework)
https://docs.microsoft.com/dotnet/api/
Other
712 stars 1.56k forks source link

IComparer and IComparer<T> documentation should note ordering properties required #3587

Open mairaw opened 7 years ago

mairaw commented 7 years ago

@dbjorge commented on Tue Dec 06 2016

The documentation for IComparer (@docs.microsoft.com, @msdn) and IComparer<T> (@docs.microsoft.com, @msdn) do not note whether any particular ordering properties are expected of a valid implementation, but at least List<T>.Sort assumes some (antisymmetry, per this Connect feedback circa .NET 4.5).

The interface documentation should note what sort of ordering properties the rest of the standard library will be assuming/demanding of a Compare implementation. I'd expect this to be the usual properties of a total order.


@CIPop commented on Tue Dec 06 2016

cc: @rpetrusha, @AArnott


@karelz commented on Tue Dec 06 2016

cc: @mairaw


@JonHanna commented on Tue Dec 06 2016

Is that a responsibility of IComparer<T> or of the caller of Sort (or OrderBy etc.). If one was going to implement an order that wasn't total (e.g. XmlSchema's rules on comparing datetimes without time zones, would doing so in an IComaprer<T> be disallowed?


@dbjorge commented on Tue Dec 06 2016

I would consider it reasonable to either:

I think either of those would be a big improvement over not documenting it either way. I could be convinced that the former would be too big a breaking change to be acceptable.

In either case, I think it would improve clarity to also document it on the individual IComparer-based framework methods.


@JonHanna commented on Tue Dec 06 2016

I think the warning is a good idea. I think the idea of documenting the relevant methods, even more so.


@mairaw commented on Tue Dec 06 2016

I own these docs so I'd be the one making the updates. Just let me know who I should be working with to figure out what's the guidance we want to provide there. /cc @karelz


@karelz commented on Wed Dec 07 2016

@dbjorge @JonHanna can you please suggest which text to change to what? (ideally mention the diff or old & new text) Thanks!


@JonHanna commented on Wed Dec 07 2016

Could an extra page be added describing total ordering? A concise statement linking to it would probably be clearer and with less text added to the individual pages affected.


@mairaw commented on Wed Dec 07 2016

@JonHanna yes, it's possible to add a separate article explaining the concepts and then link to it from the affected pages.


@JonHanna commented on Mon Jan 09 2017

Total Order

A total order is a binary relation that has the following properties:

  1. If a <= b && b <= a then a == b
  2. If a <= b && b <= c then a <= c
  3. a <= b || b <= a

In other words:

  1. For any two elements x and y, either we can decide that x should be sorted before y, that y can be sorted before x, or that they are equivalent in sort order.
  2. For any three elements, x, y and z. If we would sort x before y and y before z we would also always sort x before z. The same holds in that if we would consider x to sort equivalently to y and y to sort equivalently to z we would always consider x to sort equivalently to z.

Exceptions to Total Order

It can perhaps be easier to think of total order by considering cases that don’t have total order.

Floating point operators.

Because double.NaN <= double.NaN returns false, double.NaN < 0 returns false and 0 < double.NaN returns false, we cannot use the arithmetic operators to determine a total order on a set of floating point numbers that includes NaN.

XML Schema Datetimes without Time Zones

XML Schema defines some datatypes that are only partially ordered. One example is dateTime which represents a date and time without a timezone (comparable to DateTime in .NET). By the rules of XML Schema, 2001-03-29T12:00:57 is always sorted before 2001-04-23T09:17:39 and 2001-04-23T11:31:09 is always sorted before 2001-04-29T23:09:10 but 2001-04-23T09:17:39 can be considered neither before nor after 2001-04-23T11:31:09.

Not knowing the time zone, it could be e.g. 2001-04-23T09:17:39 in Copenhagen compared to 2001-04-23T11:31:09 in Dublin (earlier), or 2001-04-23T09:17:39 in Montreal compared to 2001-04-23T11:31:09 in Bangalore (later).

To offer a dependable ordering within the international context XML Schema time-related data types consider, one dateTime can only be considered as sorting before another if that holds for any of a possible range of timezones from -14:00 to +14:00 to have full confidence that the ordering would hold.

Importance or Total Order to Sort operations.

Many methods, especially sorting methods, that take an IComparer or IComparer<T> require that the Compare() methods of those objects give a total order. Failure to do so can result in the results not being sorted at all, exceptions, or even infinite loops. Examples include List<T>.Sort(), Array.Sort() and Enumerable.OrderBy().

Other methods may have similar issues, such as those that find minima and maxima, such as Linq's Min() and Max() extension methods.

Providing a Total Order on Partially Ordered values.

If you want to sort values that are not normally totally ordered, you need to define a total ordering for it. For example, as mentioned above double is not totally ordered because NaN cannot be placed into any meaningful position by the <, <=, =, > and >= operators. Comparer<double>.Default, Enumerable.Min() and Enumerable.Max() all solve this problem by adding their own rule that sorts NaN before any other value. This results in the three properties mentioned at the start of this article applying to the ordering.

An approach that works for many cases, is to add a tie-breaker rule onto a partial ordering. This is used here, with most comparisons being the same as with < but the "NaN equal to NaN and before everything else" providing that tie-breaking rule that makes it a total order.


@karelz commented on Wed Dec 14 2016

@ianhays @Priya91 can you please review the text as area experts? (or pull in appropriate folks please)


@ianhays commented on Thu Dec 15 2016

@JonHanna that's a great doc on total order. 👍very concise and informative. The DateTime section is a bit verbose but explains the reasoning behind the exception to total order well.


@JonHanna commented on Thu Dec 15 2016

I split up the sentences and hopefully clarified a bit (sometimes attempts at concision end up reading more verbose than otherwise) and added a sentence at the end about using tie-breaking rules in creating total orders out of parial.


@JonHanna commented on Thu Dec 15 2016

Also fixed the copy-pasting in the example dates, as I had Montreal and Bangalore in time zones over a week apart :grin:


@svick commented on Fri Dec 16 2016

@JonHanna

EqualityComparer<double>.Default, Enumerable.Min() and Enumerable.Max() all solve this problem by adding their own rule that sorts NaN before any other value.

Did you mean Comparer<double>.Default? EqualityComparer does not define ordering.


@JonHanna commented on Fri Dec 16 2016

@svick I did indeed. Fixed.


@ianhays commented on Mon Feb 13 2017

@mairaw did you get a chance to take a look at this?


@mairaw commented on Mon Feb 13 2017

Not yet @ianhays. I'm completely swamped at the moment with the .NET Core release.


@ianhays commented on Mon Feb 13 2017

Gotcha, thanks for the update :)

mairaw commented 7 years ago

Might be something for @GuardRex to work on next. We need to define where we're going to link that from with @ianhays.

guardrex commented 7 years ago

@ianhays @mairaw @JonHanna

[@JonHanna :rocket: Hope all is well with you!]

@ianhays @mairaw -- See if this looks good, or let me know what you want to do:

  1. Title? Total Order (as @JonHanna titled it)
  2. File name? total-order.md
  3. Place the topic at docs/standard?
  4. Link to the topic from the IComparer and IComparer<T> API reference topics?
  5. Do we want to link from methods relying on IComparer and IComparer<T>? If so, I'll draft a list to float here for discussion.
  6. Link text/statement? ...

    Comparison and sorting methods that rely upon IComparer or IComparer<T> must obey the properties of a total order, which define object equality relationships that make comparisons and sorting possible. Failure to do so can result in a failure to compare or sort the objects, an exception, or an infinite loop. For more information, see the [Total Order](LINK_TOTAL_ORDER_TOPIC) topic.

guardrex commented 7 years ago

@mairaw @BillWagner This one has run out beyond 7/31. Do u want to re-assign?

JonHanna commented 7 years ago

Do we want to link from methods relying on IComparer and IComparer?

I think so. There's sometimes a reasonable reason to not have total order in a comparer, after all, but a method that needs it, and doesn't get it, is where bad things can happen to good people. The Sort methods on lists and arrays, and Linq's OrderBy method and its cousins are all methods that need it. Linq's Min and Max overloads that take a comparer can also give confusing results without total ordering too (but won't go into an infinite loop as could potentially happen with some inputs to sorts).

carlossanlop commented 4 years ago

Tagging area owners @eiriktsarpalis @layomia to continue the discussion.