New dictionary data broke the tests' expected results values

I think fixing this issue is basically the required next step for the project. Trivially, it's necessary to be able to run the tests and the benchmarks, which we (probably) need before making any other changes.

But more fundamentally, we need to make sure that the results it's producing are actually useful. Without comparing the results of this algorithm to a different one, we can't be sure that what we're making works, and thus whether any improvements are actually helping.

The question then becomes: what alternative algorithm should we use in this contemplated new tool. Simple is obviously desirable, but the simplest approach of letter-shuffling-then-grepping suffers from combinatorial explosion.

A variation might be more useful, and might actually provide a good point of comparison: take the dictionary file, load its words into a trie (no letter sorting, just a normal trie), and then let the naive letter shuffling be constrained by the trie.

In other words, if we're generating anagrams of fmsltez, for each shuffling of the letters, traverse the trie for each letter:

(root) -> f exists. Continue.
(root) -> f -> m doesn't exist. Therefore all potential shufflings starting with fm are invalid. Shuffle letters after f. (For argument, say we shuffle to festzlm.)
(root) -> f -> e exists. Continue.
(root) -> f -> e -> s exists. Continue
(root) -> f -> e -> s -> t exists and has the word 'fest' attached. Add to results. Continue.
(root) -> f -> e -> s -> t -> z doesn't exist. Shuffle letters after fest.

At any point where a word is attached to a node, record it as a result. Descend recursively.

If my thinking is correct, this method should allow the naive shuffling method to produce all valid results for a search string, given enough time, while vastly cutting down on its search space. It should also be much, much less efficient than the anagrambler algorithm.

My hope in sketching this out is that it's a simple enough algorithm that we can rely on its correctness (and possibly prove it correct in some meaningful way?) while not having to wait until the sun to bloom and shrivel for it to finish execution. (For small enough n; this still wouldn't be able to take on multi-hundred-character search strings, I think.)

Once this proposed tool exists, we can compare the results with anagrambler. If they agree, and we can be reasonably certain the simpler algorithm is correct, then we can just update the test data expected results.

A more correct fix would be to change the testing to compare the actual produced results to an expected list, and I think that's the way to go. Slower, but better.

We should also try to improve the tests by finding edge case data and other possible places where things could break down; I think the current data is too simplistic. But that's another issue.

RyanEdwardHall / anagrambler

New dictionary data broke the tests' expected results values #18