Closed freezing closed 8 years ago
The spell checker should draw its vocabulary from the product titles in both the train and test data, as well as from the attributes and descriptions. We are assuming that these fields are well written.
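A minimal sketch of how that vocabulary could be collected, assuming the titles, attributes and descriptions are already available as plain strings. `VocabularySketch`, `tokenize` and `buildVocabulary` are hypothetical names, and the tokenization rule (lowercase, split on non-alphanumerics) is an assumption rather than something decided in this issue:

```scala
object VocabularySketch {
  // Lowercase the text and split on anything that is not a letter or digit.
  // Empty fragments (from leading separators or empty input) are dropped.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z0-9]+").filter(_.nonEmpty).toSeq

  // Count how often each token occurs across all supplied text fields.
  def buildVocabulary(texts: Seq[String]): Map[String, Int] =
    texts.flatMap(tokenize).groupBy(identity).map { case (w, ws) => (w, ws.size) }
}
```

Keeping the counts rather than a plain set would let a later step prefer the more frequent word when several candidates match.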
    // Candidates obtained by deleting one or two letters from w.
    def smallErrors(w: String): List[String] = {
      smallErrors(w, 1) union smallErrors(w, 2)
    }

    // TODO: Add memoization
    // Candidates obtained by deleting exactly d letters from w (may contain duplicates).
    def smallErrors(w: String, d: Int): List[String] = {
      if (d > w.length) throw new IllegalArgumentException("Nu")
      d match {
        case 0 => List(w)
        case 1 => smallErrors1(w)
        case k => smallErrors1(w) flatMap { s => smallErrors(s, d - 1) }
      }
    }

    // Candidates obtained by deleting a single letter from w, one per position.
    def smallErrors1(w: String): List[String] = {
      (0 until w.length map { idx => w.substring(0, idx) + w.substring(idx + 1) }).toList
    }
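One possible way to address the memoization TODO, as a sketch only. `DeletionVariants`, `deleteOne` and `deletions` are hypothetical names, not part of the code above. This version returns a `Set` instead of a `List`, so duplicate variants are collapsed, and it caches results per `(word, distance)` pair. The cache is a plain mutable map, so this is not thread-safe as written:

```scala
import scala.collection.mutable

object DeletionVariants {
  // Cache keyed by (word, distance) so repeated sub-problems are computed once.
  private val cache = mutable.Map.empty[(String, Int), Set[String]]

  // All strings obtained by deleting exactly one character from w.
  def deleteOne(w: String): Set[String] =
    w.indices.map(i => w.substring(0, i) + w.substring(i + 1)).toSet

  // All strings obtained by deleting exactly d characters from w.
  def deletions(w: String, d: Int): Set[String] = {
    require(d >= 0 && d <= w.length, s"cannot delete $d characters from '$w'")
    cache.get((w, d)) match {
      case Some(hit) => hit
      case None =>
        val result =
          if (d == 0) Set(w)
          else deleteOne(w).flatMap(s => deletions(s, d - 1))
        cache.update((w, d), result)
        result
    }
  }
}
```

For example, `DeletionVariants.deletions("cat", 1)` gives `Set("at", "ct", "ca")`.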
Create a job that builds the spell checker dictionary. Each entry is:
Maybe remove only 1 letter for words shorter than 5 characters, and don't spell-correct words with length <= 2?
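The length rule proposed above could look roughly like this. It is a sketch under the assumption that longer words keep the current behaviour of deleting one and two letters. `DeletionPolicy`, `maxDeletions` and `variants` are hypothetical names:

```scala
object DeletionPolicy {
  // Maximum number of deleted letters to consider for a word of this length.
  def maxDeletions(w: String): Int =
    if (w.length <= 2) 0      // too short: skip spell correction entirely
    else if (w.length < 5) 1  // short word: single deletion only
    else 2                    // otherwise deletions of one and two letters

  // All strings obtained by deleting exactly one character from w.
  private def deleteOne(w: String): Set[String] =
    w.indices.map(i => w.substring(0, i) + w.substring(i + 1)).toSet

  // Deletion variants of w allowed by the length rule above.
  def variants(w: String): Set[String] = {
    val limit = maxDeletions(w)
    val one = if (limit >= 1) deleteOne(w) else Set.empty[String]
    val two = if (limit >= 2) one.flatMap(deleteOne) else Set.empty[String]
    one ++ two
  }
}
```

With this rule `variants("of")` is empty, `variants("deck")` contains only the four single-letter deletions, and a five-letter word such as "screw" gets both one-letter and two-letter deletions.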