performance: "one-of turtles" is O(n), should be O(1)-ish

SethTisue commented 12 years ago

both TreeAgentSet and ArrayAgentSet are affected

I made a benchmark to measure it, models/test/other/One-Of Turtles Benchmark.nlogo

TreeAgentSet and ArrayAgentSet are different problems.

in the TreeAgentSet case, we know the sets contain no dead agents. we could go straight to a random agent in O(log n) time if we were using a tree that knew the size of it's subtrees. e.g. starting at root if we have 10 agents on the left and 7 on the right, we have a 1 in 18 chance to stop, a 10 in 18 to go left, and a 7 in 18 to go right. but java's TreeMap.Entry doesn't store this information.

in the ArrayAgentSet case, the array may contain dead agents. we could pick randomly and if we hit a dead agent pick again, but then the worst-case runtime is unbounded. can we be cleverer? we can't pick a random location and scan forward for the next live turtle, because that would favor live turtles that follow clusters of dead ones.

mrerrormessage commented 9 years ago

I think we can be cleverer on ArrayAgentSet, if it is acceptable to mutate the array. The general strategy would be something like what follows.

def selectRandom(a: Array[Agent], n: Int = 0): Option[Agent] = {
  if (a.isEmpty || n >= a.length)
    None
  else {
    val i = Random.nextInt(a.length - n) + n
    val agent = a(i)
    if (agent.isAlive)
      Some(Agent)
    else {
      a(i) = a(n)
      a(n) = agent
      selectRandom(a, n + 1)
    }
  }
}

As I understand the time-complexity of get/update, this would be O(1) in the best case and only O(n) in the worst-case, which is what we have now anyway.

elfprince13 commented 8 years ago

If it were possible for AgentSets to subscribe to "death events" for contained agents, this would be a lot easier to do efficiently in both cases.

qiemem commented 8 years ago

You can actually do something quite similar to @mrerrormessage's solution without modifying the array. Instead, you can use a map to track which elements have been eliminated and redirect to those still in the candidate pool. As a bonus, this can operate on any Seq-like data structure, not just arrays. Note that these methods also generalize to n-of to make that O(sample size)-ish. Here's a generalized solution I came up with:

def sample[T: Manifest](s: Seq[T], sampleSize: Int, pred: T => Boolean = (x: T) => true): Seq[T] = {
    val m = new mutable.OpenHashMap[Int, T](s.size)
    val res = new Array[T](sampleSize)
    var insertIndex = 0
    var startIndex = 0
    while (insertIndex < sampleSize) {
        val i = Random.nextInt(s.size - startIndex) + startIndex
        val x: T = m.put(i, s(startIndex)).getOrElse(s(i))
        if (pred(x)) {
            res(insertIndex) = x: T
            insertIndex += 1
        }
        startIndex+=1
    }
    res
}

However, note that the constant factor on the hashmap lookup does hurt the performance of this. I compared it to @mrerrormessage's solution, but cloning the array at the beginning. Under 500 elements, the array version still beats the map version, especially if a decent percent of the agents are dead. Above that, the map version wins, unless the sample size is largish or there are a lot of dead agents. I tried various types of maps (OpenHashMap performed by far the best) and with size hints (though they didn't help).

I also tried doing this with immutable Vectors, since their pretty perfect for this situation (constant time read and write, persistent), but the map version beat it by a lot.

I tested a couple other solutions too, but nothing worth mentioning. Main takeaway: hot damn, cloning arrays is fast.

A hybrid solution that clones the array when the agentset is small and uses a map when big could be nice, but is perhaps is unnecessarily complicated.

NetLogo / NetLogo

performance: "one-of turtles" is O(n), should be O(1)-ish #10