MycroftAI / padatious

A neural network intent parser
http://padatious.readthedocs.io
Apache License 2.0

Intent does not handle apostrophes properly #26

Closed stratus-ss closed 4 years ago

stratus-ss commented 4 years ago

Given the following intent:

add {Food} to (| (the | my)) {ShoppingList} (| list) (under {Category} |)

The shopping list is parsed incorrectly when using the phrase: add temperature sensors to steve's projects

The intent parser produces steve ' s projects instead of steve's projects

The utterance shows correct parsing:

~~~~50788 | __main__:handle_utterance:72 | Utterance: ["add temperature sensors to steve's projects"]

However, message.data shows that shoppinglist has been parsed incorrectly:

 {'food': 'temperature sensors', 'shoppinglist': "steve ' s projects", 'utterance': "add temperature sensors to steve's projects"}

This causes errors or unmatched entities.

The error comes from match_data.py. This statement:

def detokenize(self):
        self.sent = ' '.join(self.sent)

combined with the fact that self.sent is split like so:

 'sent': ['add', 'something', 'to', 'steve', "'", 's', 'projects'], 'matches': {}, 'conf': 0.0}

causes the error. One solution, which could be refined, is:

    @staticmethod
    def handle_apostrophes(old_sentence):
        # Join a list of tokens with spaces, but attach an apostrophe token
        # and the token that follows it directly to the preceding word.
        new_sentence = ''
        apostrophe_present = False

        for word in old_sentence:
            if word == "'":
                apostrophe_present = True
                new_sentence += word
            elif apostrophe_present:
                # Token right after an apostrophe: no separating space
                new_sentence += word
                apostrophe_present = False
            elif len(new_sentence) > 0:
                new_sentence += " " + word
            else:
                new_sentence = word
        return new_sentence

    # Converts parameters from lists of tokens to one combined string
    def detokenize(self):
        self.sent = self.handle_apostrophes(self.sent)

        new_matches = {}
        for token, sent in self.matches.items():
            new_token = token.replace('{', '').replace('}', '')
            new_matches[new_token] = self.handle_apostrophes(sent)
        self.matches = new_matches
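
For anyone who wants to check the behaviour outside of padatious, here is a minimal standalone sketch of the same idea. The function name join_tokens is hypothetical and not part of padatious; it only assumes the token list looks like the self.sent shown above.

```python
def join_tokens(tokens):
    """Join tokens with spaces, gluing a lone apostrophe to its neighbours."""
    result = ''
    after_apostrophe = False
    for token in tokens:
        if token == "'":
            # Attach the apostrophe to the previous word
            result += token
            after_apostrophe = True
        elif after_apostrophe or not result:
            # First token, or token right after an apostrophe: no space
            result += token
            after_apostrophe = False
        else:
            result += ' ' + token
    return result


# Token list taken from the self.sent dump in this issue
tokens = ['add', 'something', 'to', 'steve', "'", 's', 'projects']

print(' '.join(tokens))     # add something to steve ' s projects
print(join_tokens(tokens))  # add something to steve's projects
```

One thing that would probably still need refining: a trailing apostrophe, as in a plural possessive tokenized as ['students', "'", 'projects'], would be glued to the next word and come out as students'projects.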

forslund commented 4 years ago

Resolved by #27, closing