aosingh / lexpy

Python package for lexicon; Trie and DAWG implementation.
GNU General Public License v3.0

Incorrect order of answers when using the wildcard '*' in DAWG #11

Open lifmore opened 2 years ago

lifmore commented 2 years ago

Hi,

I wonder whether there is a small issue in automata.py, in the function __words_with_wildcard (lines 128 to 147), where the case letter=='*' is handled.

If the dictionary contains, for example, "CHIAC" and "CHIC", and the query is "CHI*C", the results are returned in the wrong alphabetical order: "CHIC", then "CHIAC".

This is because the words_at_current_level case is processed before the children are checked.

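So, for "CHI*C", the match where '*' stands for no letters ("CHIC") is emitted before the child branch that produces "CHIAC" is visited.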

Any ideas? Or did I perhaps misunderstand the code?

Best, Lionel

lifmore commented 2 years ago

I was able to fix it by modifying the __words_with_wildcard function. The key idea is to process the words_at_current_level case partway through the "for child in node.children:" loop, at the right point, rather than before or after the loop.
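
For anyone following along, the ordering problem and one possible remedy can be sketched on a toy trie. This is neither lexpy's code nor the patch described above: `build_trie`, `naive_search`, `ordered_search`, and the `END` marker are hypothetical names, only literal letters and '*' are handled, and merging with `heapq.merge` is an assumption about one way to get the timing right in general, not necessarily what the actual fix does.

```python
import heapq
from itertools import chain

END = ""  # hypothetical end-of-word marker key (never equal to a pattern letter)


def build_trie(words):
    """Build a nested-dict trie; a stand-in for the library's node structure."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def child_letters(node):
    """Child letters of a node in alphabetical order, skipping the END marker."""
    return sorted(key for key in node if key != END)


def naive_search(node, pattern, i=0, prefix=""):
    """Emit the zero-letter '*' match before visiting children (the reported order)."""
    if i == len(pattern):
        if END in node:
            yield prefix
        return
    letter = pattern[i]
    if letter == "*":
        # '*' matches nothing: stay on this node, advance in the pattern.
        yield from naive_search(node, pattern, i + 1, prefix)
        # '*' consumes one letter: move to the child, stay on '*'.
        for ch in child_letters(node):
            yield from naive_search(node[ch], pattern, i, prefix + ch)
    elif letter in node:
        yield from naive_search(node[letter], pattern, i + 1, prefix + letter)


def ordered_search(node, pattern, i=0, prefix=""):
    """Same traversal, but the two sorted result streams are merged lazily."""
    if i == len(pattern):
        if END in node:
            yield prefix
        return
    letter = pattern[i]
    if letter == "*":
        zero = ordered_search(node, pattern, i + 1, prefix)
        # Children are visited alphabetically, so their concatenation is sorted.
        via_children = chain.from_iterable(
            ordered_search(node[ch], pattern, i, prefix + ch)
            for ch in child_letters(node)
        )
        last = None
        for word in heapq.merge(zero, via_children):
            if word != last:  # drop duplicates that several '*' can produce
                yield word
            last = word
    elif letter in node:
        yield from ordered_search(node[letter], pattern, i + 1, prefix + letter)


trie = build_trie(["CHIAC", "CHIC"])
print(list(naive_search(trie, "CHI*C")))    # ['CHIC', 'CHIAC']
print(list(ordered_search(trie, "CHI*C")))  # ['CHIAC', 'CHIC']
```

A merge is used here instead of a fixed position inside the loop because the zero-letter results can fall on either side of a child's results: with "XAAB", "XAB" and "XACAB" and the query "X*AB", the word "XAB" has to land between two words that both come from the 'A' child.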

aosingh commented 2 years ago

Thank you @lifmore

I just saw this. I will try to reproduce the issue and get back to you.