cs3110 / textbook

The CS 3110 Textbook, "OCaml Programming: Correct + Efficient + Beautiful"
Other
738 stars 134 forks source link

Issue on page /chapters/modules/functors.html #186

Open curtmack opened 3 months ago

curtmack commented 3 months ago

From the Note under 5.9.3:

Alas, historically many languages have used comparison functions with similar specifications, such as the C standard library’s strcmp function. When comparing two integers, it does make the comparison easy: just perform a subtraction. It’s not necessarily so easy for other data types.

(Emphasis added.)

Although the chapter does not necessarily recommend this practice, I feel it should not be included at all. This practice is very common, but it's wrong. Integer subtraction can overflow, and if it does, the result of the comparison function will be incorrect. This can be a serious problem if the comparison function is used to organize a data structure.

Consider this example:

let mycompare = ( - );;

mycompare min_int 10;;
mycompare 10 (-10);;
mycompare (-10) min_int;;

Output:

val mycompare : int -> int -> int = <fun>
- : int = 4611686018427387894
- : int = 20
- : int = 4611686018427387894

Note that all three results are positive. That is, according to mycompare, all three of these statements are true: min_int > 10; 10 > -10; and -10 > min_int. This violates the transitive property, which states that if a > b and b > c, then a > c. Many data structure algorithms rely on this property to produce correct results, so a comparison function that violates it could produce a malformed data structure.

Indeed, we can see this if we use it to construct a Map module:

module BadOrderedType = struct
  type t = int
  let compare = ( - )
end;;

module BadM = Map.Make(BadOrderedType);;

There's some inconsistency in how my version of utop constructs BadM, but I can pretty consistently get BadM.of_list to infinite loop, e.g. with:

BadM.of_list [(min_int, "min"); (-10, "M10"); (10, "10")]

So, in conclusion: ( - ) should not be used as a comparison function for integers, as it can cause issues when it overflows. Stdlib.compare and Int.compare do not have this problem, and should always be preferred.