For each subset, the substitute function now also tests its complement. It tests the complement first, since the complement is the larger of the two in all cases except when gran=len(superset).
I also changed _substitute to use sets instead of lists, so that set subtraction is faster. As a consequence, the order of nodes in the superset is not necessarily preserved.
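To make the idea concrete, here is a rough sketch of one pass, not the actual patch. The names `substitute_pass` and `test` are hypothetical, and the sketch assumes that `gran` is the chunk size, that nodes are sortable, and that `test(candidate)` returns True when the candidate set is still acceptable.

```python
def substitute_pass(superset, gran, test):
    """One illustrative pass: try to drop each chunk or its complement.

    Hypothetical helper, not the real _substitute. `gran` is assumed to be
    the chunk size; `test` is an assumed callback that returns True when
    the candidate set of nodes is still acceptable.
    """
    current = set(superset)
    # Sorting is only used to cut deterministic chunks; the result is a set,
    # so the original node order is not preserved.
    ordered = sorted(current)
    chunks = [set(ordered[i:i + gran]) for i in range(0, len(ordered), gran)]

    for subset in chunks:
        subset &= current          # ignore nodes already removed
        if not subset:
            continue
        complement = current - subset
        # Try the complement first: it is usually the bigger removal.
        for removal in (complement, subset):
            if removal and test(current - removal):
                current -= removal
                break
    return current
```

For example, with a `test` that only requires node 3 to survive, `substitute_pass(range(8), 1, lambda c: 3 in c)` ends up with just `{3}`, because the complement of the chunk `{3}` is removed in a single step once that chunk is reached.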
In my testing so far, runtimes are similar and output files are always smaller. The number of tests performed is higher, but not twice as high, which suggests that some of the larger complements are passing tests and getting removed early on. This is probably not the ideal substitution process, but since it seems to be almost strictly better, I thought I'd submit it anyway.