Agents store Triggers by hashCode and drop them on hash collision

SebAlbert commented 7 years ago

When a trigger is added to an agent (and saved for execution in the next cycle), this is done in a Map with the trigger's hashCode as key. If there already is a trigger with the same hashCode, the second (and third, ...) trigger will not be saved and executed, but rather silently dropped. It seems reasonable to suppress duplicate triggers, but hashCode is not generally safe from collisions; two actually different trigger objects might return the same hashCode.

https://github.com/LightJason/AgentSpeak/blob/93971f0356be3f42ef467437b51fec1e3e3a1265/src/main/java/org/lightjason/agentspeak/agent/IBaseAgent.java#L353

Although actual occurrences of this being a problem will probably be rare, wouldn't it make more sense to use a HashSet or similar instead of a Map, or was this design chosen on purpose?

flashpixx commented 7 years ago

No that's wrong the map uses an own comparator which is defined on the trigger data https://github.com/LightJason/AgentSpeak/blob/93971f0356be3f42ef467437b51fec1e3e3a1265/src/main/java/org/lightjason/agentspeak/agent/IBaseAgent.java#L112-L113

See unit-test https://github.com/LightJason/AgentSpeak/blob/master/src/test/resources/agent/execution.asl https://github.com/LightJason/AgentSpeak/blob/master/src/test/resources/agent/language/trigger.asl

Trigger with equal data are joined to one execution frame, trigger with different data will be run in parallel, this behaviour is desired based on the fork-and-join-model

SebAlbert commented 7 years ago

I am not sure whether we are speaking of the same thing, or I don't understand what the m_plans Multimap has to do with it. When triggers are added to the agent for the next cycle, they are simply "putIfAbsent" with the hashCode of the trigger as key, into a simple HashMap https://github.com/LightJason/AgentSpeak/blob/93971f0356be3f42ef467437b51fec1e3e3a1265/src/main/java/org/lightjason/agentspeak/agent/IBaseAgent.java#L104

So if by bad luck the hashCode() of the Trigger objects representing a "!planA(123, 456)" and a "!planB(789)" are the same hash value, and both of these triggers are executed on the agent within the same cycle, then only one of them will make it into m_trigger and thus in the executionlist in the next cycle.

flashpixx commented 7 years ago

Please build a minimal example which shows the problem

SebAlbert commented 7 years ago

diff --git a/src/test/java/org/lightjason/agentspeak/agent/TestCAgentExecution.java b/src/test/java/org/lightjason/agentspeak/agent/TestCAgentExecution.java
index 1d74672a0..366d87e59 100644
--- a/src/test/java/org/lightjason/agentspeak/agent/TestCAgentExecution.java
+++ b/src/test/java/org/lightjason/agentspeak/agent/TestCAgentExecution.java
@@ -120,6 +120,8 @@ public final class TestCAgentExecution extends IBaseTest
         m_result.put( 1L, "twovalues equal type" );
         m_result.put( 1L, "twovalues different type" );
         m_result.put( 1L, "twovalues with literal" );
+        m_result.put( 1L, "hash collision on trigger Ea with FB" );
+        m_result.put( 1L, "hash collision on trigger FB with Ea" );
         m_result.put( 2L, "single" );
     }

diff --git a/src/test/resources/agent/execution.asl b/src/test/resources/agent/execution.asl
index fe46b73d9..ef9d182a6 100644
--- a/src/test/resources/agent/execution.asl
+++ b/src/test/resources/agent/execution.asl
@@ -40,6 +40,9 @@
     !doublecall;
     !doublecall;

+    !hashcollision/Ea;
+    !hashcollision/FB;
+
     !multiple( "first" );
     !multiple( "second" );

@@ -56,6 +59,12 @@
 +!doublecall <- log("single run").

 /**
+ * hash collision on trigger, both should run
+ **/
++!hashcollision/Ea <- log("hash collision on trigger Ea with FB").
++!hashcollision/FB <- log("hash collision on trigger FB with Ea").
+
+/**
  * called mutiple times
  */
 +!multiple(X) <- log(X); !single.

flashpixx commented 7 years ago

The problem is not the trigger, it is the path object in detail the string hash calculation, this call:

System.out.println( ( "hashcollision".hashCode() ^ "Ea".hashCode() ) + "   " +  ( "hashcollision".hashCode() ^ "FB".hashCode() ) );

returns equal hash values

SebAlbert commented 7 years ago

Yes, that's how I constructed an example to show you what I mean. The problem, though, is the way triggers are stored in the Agent. As I mentioned in the first place, maybe it should better be a HashSet instead of a Map with hashCode keys. HashSet doesn't allow real duplicates, but deals with hash collisions.

flashpixx commented 7 years ago

The hashcode function of the path class was modified to solve the problem. additional unit-test are added for this check

SebAlbert commented 7 years ago

I don't think this is the right way to go about this. Choosing a "random hash function" prevents me from deliberately "attacking" with a contrived hash collision, since now the hash function depends on system time millis. And this may be a desirable feature to prevent "DoS" attacks as described here. But still there will be - with low probability, yes, but not completely aviodable - hash collisions. This is the nature of a hash. And now they are even unpredictable, and will depend on the system time whether they occur between certain paths in triggers or not. I just need a much larger test case... with 2^32+1 different triggers, I can guarantee you the problem will persist in every run. With any number below this but greater than 1, collisions will occur "sometimes" and sometimes not (of course, the fewer different triggers, the lower the probability).

(I think I could still go through the literal's arguments in the trigger instead of the path, though, but you would probably work around this the same way.)

flashpixx commented 7 years ago

Hashing function is changed to SipHash-2-4 in cbe618b

SebAlbert commented 7 years ago

If I skimmed the code correctly, that's not depending on time/randomness any more, but the core of the problem persists as long as the triggers in the agent are stored in a map indexed by hash values with only one trigger per key (hash value) allowed. Maybe now I can (in theory) find a new collision to write another always-failing test with only two triggers. It might take me longer in practice, though, and there's no point in doing it.

flashpixx commented 7 years ago

Hash functions are not collision-free at anytime :-) but it is a practical and useful tool to speed-up the system, a lookup in a map is faster than in a list.

SebAlbert commented 7 years ago

Sure. But whenever one wants to keep the colliding elements (unless they are really equal), there usually is a list for every hash bucket, which essentially is the java.util.HashSet. This is the combination of hash-lookup-speed (nearly) O(1) and still staying correct even if there are collisions.

LightJason / Java-AgentSpeak

Agents store Triggers by hashCode and drop them on hash collision #44