codelucas / cracking-the-da-vinci-code-with-google-interview-problems-and-nlp-in-python

A guide on how to crack combinatorics puzzles shown in The Da Vinci Code movie using CS fundamentals and NLP
http://codelucas.com/cracking-the-da-vinci-code/
MIT License
83 stars 16 forks source link

Possible speedup when using n-grams #1

Open VitamintK opened 7 years ago

VitamintK commented 7 years ago

Hey lucas! If you're going to use a list of frequent bigrams/trigrams anyways, you don't need to do the first part where you generate all sane permutations of the input string, right?

That is, if you have a list common_ngrams_dict, you could just do

for trigram in common_ngrams_dict:
    if sorted(trigram) == sorted(input_string):
        filtered_solution.append(trigram)

Which would avoid the O(n!) stuff in the first part. I still enjoyed the writeup, and I see how the first part still makes sense cause you'd still want to generate all sane permutations if you use the NLP approach and the anagram isn't a frequent n-gram. :+1:

codelucas commented 7 years ago

Good find @VitamintK! Your workaround is great and I should probably add it into the README.

However for these scenarios:

In practice, a good way to solve this puzzle would be (like you said) to run bi-gram/trigram/quadgram matching with the input string to first quickly remove those candidates. But! If that fails, we still need to fallback on the valid-word permutation generation with NLP analysis on the results.

💯 👍 🍷