The lexer generator showed a massive slowdown around determinization, when C keywords were recognized with it. This is because - as it turns out - nearby states in the StateSets unified, meaning they were hashed with XOR, yielding values like 1, 2, 3 very frequently. The slowdown was thereby caused by the hash-set search degraded to a linear search.
This PR fixes this by assigning random positive integer values as the state ID, when generating the NFA instead of sequential numbers.
The lexer generator showed a massive slowdown around determinization, when C keywords were recognized with it. This is because - as it turns out - nearby states in the
StateSet
s unified, meaning they were hashed with XOR, yielding values like1
,2
,3
very frequently. The slowdown was thereby caused by the hash-set search degraded to a linear search.This PR fixes this by assigning random positive integer values as the state ID, when generating the NFA instead of sequential numbers.
Closes #114