ChalmersGU-data-structure-courses / OpenDSA

Working source for the OpenDSA eTextbook project - Chalmers/GU version
https://chalmersgu-data-structure-courses.github.io/OpenDSA/
Other
0 stars 2 forks source link

Tree terminology #4

Open sattlerc opened 4 months ago

sattlerc commented 4 months ago

Here is a discussion about tree terminology. I'd love to hear your thoughts.

When I teach search trees, I start by discussing trees. In that context, that means rooted trees, sometimes also called "arborescences" when the edges are oriented away from the root. I typically use the following terminology:

Then we come to binary trees. Are these trees? Yes, with two different kinds of nodes:

However, very often, a conflicting terminology is used. If we ask Wikipedia what a binary tree is, it says: it is a tree data structure where every node has at most two children. So the nullary nodes from before are not counted as nodes, only the binary nodes, but they are treated as at-most-binary. Note that it would be wrong to say that a binary tree is a tree where every node has no children, exactly one child, or two (ordered) children; in case of exactly one child, one still has to make the difference between whether it is a left or right child. A more precise version would be: every node has an optional left child and an optional right child; additionally, the root node is optional.

Note that this terminology loses the connection to rooted trees. A rooted tree always has a root node, but in that terminology the root node is optional.

The reason for the second terminology may be that in binary search trees, data is only stored in binary nodes. The size of a binary search tree, as modelling a set or map, is the number of binary nodes. (But note that the same is not true for 2-3-trees; there, we have to count the binary nodes once and the ternary nodes twice.)

The biggest confusion comes with the term leaf. In the second convention, this does not refer to the empty nodes, but rather to binary nodes whose left and right child are empty (that is, nullary nodes). See here for the disconnect between binary trees and general trees in the course book.

Some thoughts:

Some suggestions:

heatherleaf commented 4 months ago

I think we should teach both, unfortunately.

Your suggestion is the logically cleanest. But then students will be confused when they read Wikipedia and everything else about BSTs in books and on the web. But at the same time, your suggestion maps very well with R/B trees, so I think there's a point in teaching that too.

I suggest to not call null nodes "leaves". That will just make students really really confused. Instead just call them "null nodes" (or "nullary"). Use "leaves" for binary nodes with only null nodes as children. The word "children" will be ambiguous - sometimes it will include null nodes, but most of the times it will mean non-null children.

PS. Why do you use "nullary nodes" instead of "null nodes"?

heatherleaf commented 4 months ago

Regarding Haskell: I would use the constructor Leaf only if it has a value, otherwise I would call it None or Nil or something like that. But apparently people disagree:

sattlerc commented 4 months ago

Your suggestion is the logically cleanest. But then students will be confused when they read Wikipedia and everything else about BSTs in books and on the web.

Both conflicting terminologies already appear in the literature (for example, Knuth uses leaf in the original sense). Wikipedia is inconsistent (see Figure 1 for binary search tree).

I suspect the use of leaf for certain binary nodes arose from an abuse of notation: not drawing the nullary nodes (true leaf nodes) in pictures of binary trees. Now the binary nodes with only nullary children look like leafs. (There are sources that always draw the nullary nodes, using a different symbol.)

Whatever choice we make perpetuates the chosen convention, so we have some responsibility. And clearly, a warning is warranted.

But at the same time, your suggestion maps very well with R/B trees

I don't understand what you mean here. Red-black trees are binary search trees, so the terminology should be the same.

PS. Why do you use "nullary nodes" instead of "null nodes"?

Because some types of binary trees store data at the nullary nodes, for example Huffman trees. (Also, null evoces the connotation null pointer in Java, which is a language artifact).

In general, the arity of a node is how many children it has (nullary, binary, ternary).

heatherleaf commented 4 months ago

But at the same time, your suggestion maps very well with R/B trees

I don't understand what you mean here. Red-black trees are binary search trees, so the terminology should be the same.

I meant that we explicitly have to color the nullary nodes black. But for other BSTs we don't need to think about null nodes (we can e.g. say that a node doesn't have a left child)

heatherleaf commented 4 months ago

Actually, that's something I would like to be able to say - that a BST node doesn't have a left/right child. Because that's what we use when deleting.