dafny-lang / libraries

Libraries useful for Dafny programs
Other
43 stars 25 forks source link

Request: lexicographical orderings #52

Open robin-aws opened 1 year ago

robin-aws commented 1 year ago

Spun off from #49. Once that is merged we'll have the definition of [strict] total orderings, and can sensibly define this ordering for seq<T> values based on an element ordering on T values.

This is particularly valuable because Dafny defines < on string values to be "proper prefix of" to make termination proofs easier, but many many users will also want the more common definition as well.

We also need to be careful about implying that using this ordering on Dafny string values will align with the actual intentions, since char values are currently UTF-16 code units and therefore a naive lexicographic comparison will likely not be correct for most uses. This will still be true even after https://github.com/dafny-lang/rfcs/pull/13 is implemented, since individual Unicode scalar values don't always correspond to human-perceived characters either. We will likely want separate stdlib utilities for Unicode-compliant collation. See https://unicode.org/reports/tr10/ for the gory details.

kjx commented 1 year ago

I also got caught out by this; thinking "<" on strings meant the same as it does in pretty much *every other programming language in the known universe". Would be good to have T(<) to mean Comparable long with T(==) Equality bounds on generic parameters, and potentially a notion (where clauses per attribute) that means a sequence of comparable elements is itself comparable (or whatever the haskell terminology is).

probably have to pick some other operators for "isPrefixOf" though... <: and <=: perhaps. etc

robin-aws commented 1 year ago

Also relevant: https://github.com/dafny-lang/libraries/issues/53. If we're going to introduce tri-valued comparison functions, we may want to do that first before adding more ordering-related utilities.

ajewellamz commented 1 year ago

I want to disagree with "since char values are currently UTF-16 code units and therefore a naive lexicographic comparison will likely not be correct for most uses". The most important thing is to be able to have any TotalOrdering on a string; much less important that it be human friendly.