This PR adds the following functionality:

- Makes thread order deterministic (using `OrderedDict` instead of `dict`).
- Makes merging root threads by subject optional (step 5 of the JWZ algorithm).
- Adds the ability to sort threads by subject, message ID, or date.
- Adds a mechanism to validate threads using Mailman mailing lists as ground truth data. In particular, this PR includes the January 2010 data for the `fedora-devel` mailing list (292 emails). This implementation of the JWZ algorithm reproduces the expected threading, with the following exceptions:
  - The last 3 emails of the "Common Lisp apps in Fedora," thread are put in a separate thread. Mailman marks those messages as `<Possible follow-up>`, meaning that there might be some uncertainty. This needs further investigation.
  - When several emails refer to a parent that doesn't exist (e.g. the email was deleted, or was sent before January 1st in this dataset), JWZ creates an empty container for the root node. Mailman appears to move the first message to the root node and reconstruct the rest of the thread accordingly (cf. the "Re: ABRT considered painful" thread). It is unclear which approach is best.
  - The maximum thread depth in Mailman is 3 (i.e. all messages deeper than level 3 are collapsed to level 3), whereas it is unlimited in JWZ. `jwzthreading` needs a mechanism to collapse threads to a given maximum level.
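
As a rough illustration of the depth-collapsing mechanism mentioned in the last point, here is a minimal sketch. It is not part of this PR and does not use the `jwzthreading` API: the `Node` class and the `flatten` and `collapse` helpers are hypothetical names introduced for this example only. It assumes the root sits at depth 0, and that messages deeper than `max_depth` should become siblings at `max_depth` while keeping their depth-first reading order.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical stand-in for a thread container (not the jwzthreading API)."""
    name: str
    children: List["Node"] = field(default_factory=list)


def flatten(node: Node) -> Iterator[Node]:
    """Yield all descendants of `node` in depth-first (pre-order) order."""
    for child in node.children:
        yield child
        yield from flatten(child)


def collapse(node: Node, max_depth: int, depth: int = 0) -> None:
    """Rearrange the subtree in place so that no node is deeper than `max_depth`.

    `node` is assumed to sit at `depth` (the thread root is at depth 0).
    """
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    if depth + 1 == max_depth:
        # The children of this node sit at the deepest allowed level, so
        # every descendant is hoisted to become a direct child of this node.
        descendants = list(flatten(node))
        for descendant in descendants:
            descendant.children = []
        node.children = descendants
    else:
        for child in node.children:
            collapse(child, max_depth, depth + 1)
```

Calling `collapse(root, 3)` on each root thread would mimic the Mailman behaviour described above, under the stated assumptions. Whether hoisted messages should keep depth-first order or be re-sorted (e.g. by date) is a design choice left open here.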