freezing / kaggle-product-search

Estimation of product relevance for the given search query.

Spell Checker #7

Closed freezing closed 8 years ago

freezing commented 8 years ago

Create a job that builds the spell checker dictionary. Each entry is:

Maybe remove only 1 letter for words shorter than 5 characters, and don't spell-correct words of length <= 2?
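
A minimal sketch of that length rule, to make the thresholds concrete. The names `DeletePolicy` and `maxDeletes` are hypothetical, and the default of 2 deletions for longer words is an assumption rather than something decided in this comment.

```scala
object DeletePolicy {
  // Hypothetical helper: how many characters may be deleted from a word
  // when generating its dictionary entries.
  def maxDeletes(word: String): Int =
    if (word.length <= 2) 0      // too short: do not spell-correct at all
    else if (word.length < 5) 1  // short word: remove only one letter
    else 2                       // assumed default for longer words
}
```

With this rule, `DeletePolicy.maxDeletes("lamp")` is 1 and `DeletePolicy.maxDeletes("at")` is 0, so two-letter words are never touched.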

freezing commented 8 years ago

The spell checker should use words from the product titles in both the train and test data, as well as from the attributes and descriptions; we are assuming that these are well written.
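
Collecting the trusted vocabulary could be sketched roughly as below. The object `Vocabulary`, its function names, and the tokenization rule (lowercase, letters only, so digits and units are dropped) are assumptions for illustration; how the titles, attributes and descriptions are actually loaded is left out.

```scala
object Vocabulary {
  // Split text into lowercase alphabetic tokens (assumed tokenization rule).
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z]+").filter(_.nonEmpty).toSeq

  // Count how often each word occurs across all trusted text fields
  // (titles, attributes, descriptions passed in as plain strings).
  def wordCounts(texts: Seq[String]): Map[String, Int] =
    texts.flatMap(tokenize).groupBy(identity).map { case (w, ws) => (w, ws.size) }
}
```

The counts are kept because a frequency is a natural tie-breaker when several dictionary words match a misspelled query term.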

freezing commented 8 years ago

http://blog.faroo.com/2012/06/07/improved-edit-distance-based-spelling-correction/

freezing commented 8 years ago

https://www.quora.com/What-are-some-algorithms-of-spelling-correction-that-were-used-by-search-engine

freezing commented 8 years ago

http://norvig.com/spell-correct.html

freezing commented 8 years ago
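  // Deletion variants of w at distance 1 and 2, without duplicates.
  def smallErrors(w: String): List[String] = {
    // Cap the distance at the word length so that one-letter words do not throw.
    (1 to math.min(2, w.length)).toList.flatMap(d => smallErrors(w, d)).distinct
  }
  // TODO: Add memoization
  // All distinct strings obtained by deleting exactly d characters from w.
  def smallErrors(w: String, d: Int): List[String] = {
    if (d < 0 || d > w.length)
      throw new IllegalArgumentException(s"Cannot delete $d characters from '$w'")
    d match {
      case 0 => List(w)
      case _ => smallErrors1(w).flatMap(s => smallErrors(s, d - 1)).distinct
    }
  }
  // All distinct strings obtained by deleting exactly one character from w.
  def smallErrors1(w: String): List[String] = {
    (0 until w.length).map(idx => w.substring(0, idx) + w.substring(idx + 1)).toList.distinct
  }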