ARG equals/hashCode fix

leventeBajczi commented 1 year ago

This PR fixes the problems with the equals and hashcode implementations of ARG, ArgTrace, ArgNode and ArgEdge.

See issue #219 for reference.

Do not merge until benchmarks succeed.

leventeBajczi commented 1 year ago

I've had a realization while implementing these changes: If we use an external utility class to implement the structural hashCode/equals methods, we won't be able to cache the hashcodes, so we'll always traverse the graph (albeit only towards the root, which is still a lot better than before). Does this bother us enough that we move the hash code calculation into the classes themselves (where we can dedicate a private field to their cached values)? Or should we try and cache it in a map somewhere? What about the equals() call -- is it inevitable to traverse all ancestors for the equality check?

szdan97 commented 1 year ago

I've had a realization while implementing these changes: If we use an external utility class to implement the structural hashCode/equals methods, we won't be able to cache the hashcodes, so we'll always traverse the graph (albeit only towards the root, which is still a lot better than before). Does this bother us enough that we move the hash code calculation into the classes themselves (where we can dedicate a private field to their cached values)? Or should we try and cache it in a map somewhere?

Although not as fast as caching inside the node itself, caching in a map in the structural equality class itself would be the most flexible solution. I'd start with that, and if the benchmarks show that it has too much overhead compared to direct caching, then we can move it back into the node.

What about the equals() call -- is it inevitable to traverse all ancestors for the equality check?

If we move the state equality check before the in-edge equality check in the node equals, and the action equality before the source equality in the edge equals, then we always start with the cheaper checks and end with the recursive checks. Thereby we could stop at the earliest possible point when two nodes are not equivalent.

As for the case when two nodes are equivalent, the only optimization I can currently think of is to start with checking if the nodes are reference-equivalent, return true if they are, and check structural equivalence only after that. This way comparing nodes in the same ARG will stop when they reach a common ancestor (of course two nodes that exist at the same time in the same ARG will not be equivalent anyway, but notice that comparing a later and an earlier CEX compares in the same ARG as well; I know that the current CEX check uses only hashcodes, but if we change it later this might help).

As for distinct ARGs, we unfortunately always have to check the whole path from the root with this semantics when the nodes are equivalent. When comparing two whole ARGs, not nodes, we could have some kind of optimization so that we do not check the common prefixes of paths multiple times, but that seems to be more complicated than what we want to do as long as it is only used in a test. We could cache which non-reference-equivalent node pairs have already been compared and found structurally equivalent, but that might have a high memory cost.

csanadtelbisz commented 1 year ago

As for the case when two nodes are equivalent, the only optimization I can currently think of is to start with checking if the nodes are reference-equivalent, return true if they are, and check structural equivalence only after that. This way comparing nodes in the same ARG will stop when they reach a common ancestor (of course two nodes that exist at the same time in the same ARG will not be equivalent anyway, but notice that comparing a later and an earlier CEX compares in the same ARG as well; I know that the current CEX check uses only hashcodes, but if we change it later this might help).

It is best practice anyway, to start equals methods with reference check (and null & class type check; the latter is irrelevant in this case when the type of the input parameters are exactly specified), so it could be added to all equals methods, not only the ArgNode equals. The current version can produce null pointer exception: see n1.inEdge and similar use-cases (this is java, not kotlin).

leventeBajczi commented 1 year ago

Changes since @szdan97's review:

Reverted deletion of caching
Moved state and action checks before ancestor checks
Added null checks and self-reference checks
Deleted old HASH_SEED and hashcode fields

I'll wait for @AdamZsofi to finish debugging the cex-monitor faults. In the meantime, if any further improvements come to mind, please do comment on them here.

sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
8 Code Smells

68.5% Coverage
0.0% Duplication

ftsrg / theta

ARG equals/hashCode fix #238