bishoksan / DFTA

Determinisation and Completion of Finite Tree Automata
2 stars 1 forks source link

[DeterminiserTextBook] A HashSet-based deltad improves performance significantly. #2

Closed modulovalue closed 9 months ago

modulovalue commented 10 months ago

It is possible to significantly improve the performance of the textbook-based determiniser implementation.

Consider:

https://github.com/bishoksan/DFTA/blob/2d4879581472d3364eb5c02d32daba238ce2d0ea/src/dfta/DeterminiserTextBook.java#L17

and

https://github.com/bishoksan/DFTA/blob/2d4879581472d3364eb5c02d32daba238ce2d0ea/src/dfta/DeterminiserTextBook.java#L127-L139

Notice how addTransition needs to do a linear scan on each new transition. We can avoid the linear scan by representing deltad as a hashset and checking whether the transition has already been added before.

// ...
    final LinkedHashSet<DTransition> deltad = new LinkedHashSet<>();
// ...
// ...
    boolean addTransition(FuncSymb f, LinkedHashSet<String> q0, ArrayList<LinkedHashSet<String>> lhs) {
        return deltad.add(new DTransition(f, q0, lhs));
    }
// ...

With this change, the textbook-based determiniser is able to determinise a lot more TAs than before.

However, this does not invalidate the results of the paper. The optimized version can still be observed to be orders of magnitude faster. Sadly, this optimization does not appear to be applicable to the optimized determiniser implementation.

jpgallagher commented 9 months ago

Thanks, you're right, I'll make this change. As you say, it does not affect the essential performance of the textbook algorithm, which is dominated by the number of transitions, rather than how long it takes to add each transition. But it is definitely an improvement.

As you say, it doesn't seem to apply to the optimised algorithm, where the data structure for storing transitions is quite different.

modulovalue commented 9 months ago

Thank you @jpgallagher, closed via https://github.com/bishoksan/DFTA/commit/4841004c5e71ddc2ddc69850f5d34a0e9b5b0e5d