bcherny / learner

learn an alphabet given a training set
0 stars 1 forks source link

Consider building graph of Char->Char and doing a topological sort on it... #1

Open pathikrit opened 8 years ago

pathikrit commented 8 years ago

Here is my Scala impl.:

class Learner[A](root: A) {
  import scala.collection.mutable.{Map => Dict}
  private val next = Dict.empty[A, Set[A]] withDefaultValue Set.empty[A]

  def learn(words: Seq[A]*) = for {
    i <- words.indices
    _ = words(i).foreach(u => next(root) += u)
    j <- words.indices drop (i+1)
    (u, v) <- words(i) zip words(j) find {case (a, b) => a != b}
  } next(u) += v

  private def dfs(visited: Set[A], depth: Dict[A, Int])(u: A): Dict[A, Int] = {
    require(!visited(u), s"Cycle detected involving $u")
    next(u).filterNot(visited).foreach(dfs(visited + u, depth))
    depth(u) = depth(u) max visited.size
    depth
  }

  def inferOrdering(): Seq[(A, Int)] = dfs(Set.empty, Dict.empty[A, Int] withDefaultValue 0)(root).toSeq
}

object AlphabetLearner extends App {
  val learner = new Learner('*')
  learner.learn("alpha", "baby", "beta", "brownies", "cat", "dad", "dog", "elephant", "ralph", "orange")
  val ordering = learner.inferOrdering()
  ordering sortBy {case (_, depth) => depth} foreach println
}
bcherny commented 8 years ago

i'm not sure this produces the correct output.

also, how do you know when you don't have enough information to infer an ordering? eg. if your set was just ("alpha", "baby", "beta", "brownies", "cat", "dad", "dog"), you don't know where "e" and "o" should go in the list.

pathikrit commented 8 years ago

I add a magic '*' as the root of all characters (i.e. it has edge to every character but no character has edge to it) to make the directed-graph not disjoint. I then do DFS from root and assign each node the maximum depth at which I saw that node while DFS-ing. If there are multiple nodes at the same maxDepth, I have not inferred ordering b/w those 2 nodes. If I hit a visited node during DFSing, there is a paradox in the input i.e. a cycle. If each node has a unique maxDepth, ordering is fully inferred. For example, for input = ("alpha", "baby", "beta", "brownies", "cat", "dad", "dog"), this is my output:

(w,1)
(n,1)
(h,1)
(t,1)
(s,1)
(p,1)
(y,1)
(g,1)
(a,1)
(i,1)
(l,1)
(e,2)
(b,2)
(o,2)
(r,3)
(c,3)
(d,4)

I have not fully inferred ordering b/w all elements with same maxDepth e.g. I don't know here the order between e, b and o (although I do know that they are before all the ones with maxDepth<2 and before all the ones with maxDepth>2).

bcherny commented 8 years ago

:+1: