aditya-grover / node2vec

http://snap.stanford.edu/node2vec/
MIT License
2.6k stars 912 forks source link

[node2vec_spark]Could somebody explain what this code is doing? #101

Open Wang-Yu-Qing opened 3 years ago

Wang-Yu-Qing commented 3 years ago

In GraphOps.scala:

  def setupAlias(neighbours: Array[(Long, Double)]): (Array[Int], Array[Double]) = {
    val K = neighbours.length
    val J = Array.fill(K)(0)
    val q = Array.fill(K)(0.0)

    val smaller = new ArrayBuffer[Int]()
    val larger = new ArrayBuffer[Int]()

    val sum = neighbours.map(_._2).sum
    neighbours.zipWithIndex.foreach { case ((nodeId, weight), i) =>
      q(i) = K * weight / sum
      if (q(i) < 1.0) {
        smaller.append(i)
      } else {
        larger.append(i)
      }
    }

    while (smaller.nonEmpty && larger.nonEmpty) {
      val small = smaller.remove(smaller.length - 1)
      val large = larger.remove(larger.length - 1)

      J(small) = large
      q(large) = q(large) + q(small) - 1.0
      if (q(large) < 1.0) smaller.append(large)
      else larger.append(large)
    }

    (J, q)
  }

What does this function do? I know that the J, q is used for transition probability, but what is J and q exactly and how they are related with transition probability?

Wang-Yu-Qing commented 3 years ago

Let me answer my own question. This is a sample method called "Alias Method", which has a O(1) time complexity for sample and O(n) time complexity for construct the alias table (a data structure needed for this sample method). More details: https://www.keithschwarz.com/darts-dice-coins/