Catching first number with regex fails

whtecmp commented 4 years ago

I created a number.rx file with the contents: .* (?P<number>[0-9]+) .*, hoping this would let me use the number parameter in my Python code, but for some reason the regex never matches.
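Since the title asks for the first number, one thing worth checking outside of adapt is which number this pattern captures when an utterance contains more than one. The sketch below uses only Python's standard re module. The LAZY variant, the capture_number helper, and the sample utterance are hypothetical and only for illustration, and how adapt or mycroft-core apply the pattern internally is not shown here.

```python
import re

# The pattern from number.rx, unchanged: the leading ".*" is greedy.
GREEDY = re.compile(r".* (?P<number>[0-9]+) .*")

# Hypothetical variant for comparison: a lazy leading wildcard (".*?").
LAZY = re.compile(r".*? (?P<number>[0-9]+) .*")


def capture_number(pattern, utterance):
    """Return the text of the 'number' group, or None if there is no match."""
    match = pattern.match(utterance)
    return match.group("number") if match else None


# Illustrative utterance with two space-delimited numbers.
utterance = "set 3 timers for 10 minutes"

# The greedy wildcard consumes as much as it can, so the last number is captured.
greedy_result = capture_number(GREEDY, utterance)

# The lazy wildcard consumes as little as it can, so the first number is captured.
lazy_result = capture_number(LAZY, utterance)

print(greedy_result, lazy_result)
```

With a single number in the utterance, both patterns capture the same value, so the difference only matters for utterances with several numbers.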

clusterfudge commented 4 years ago

Hey @whtecmp thanks for filing an issue! Just for clarity, you're using adapt via mycroft-core, correct? I'm happy to dig in, but it would be helpful to have more context on the queries you're issuing (the utterances being sent to adapt), as well as the full vocab contents of the parser.

Best would be a PR with a failing unit test, and I can dig into what's happening. I'll check briefly on the regex above, but if I'm unable to reproduce I won't be able to get much farther without the above.

Thanks!

clusterfudge commented 4 years ago

I was able to confirm that the tagger is finding digits. Can you provide some more detail on the structure of your Intents, and the rest of the loaded vocabulary?

In [2]: from adapt.engine import IntentDeterminationEngine

In [3]: engine = IntentDeterminationEngine

In [4]: engine = IntentDeterminationEngine()

In [5]: engine
Out[5]: <adapt.engine.IntentDeterminationEngine at 0x7feadc56bac0>

In [6]: engine.register_regex_entity(".* (?P<number>[0-9]+) .*")

In [7]: engine.tagger.regex_entities
Out[7]: [re.compile(r'.* (?P<number>[0-9]+) .*', re.IGNORECASE|re.UNICODE)]

In [8]: engine.tagger.tag("what is the 1 way to kill a vampire?")
Out[8]: 
[{'match': '1',
  'key': '1',
  'start_token': 3,
  'entities': [{'key': '1',
    'match': '1',
    'data': [('1', 'number')],
    'confidence': 0.5}],
  'end_token': 3,
  'from_context': False},
 {'match': '1',
  'key': '1',
  'start_token': 3,
  'entities': [{'key': '1',
    'match': '1',
    'data': [('1', 'number')],
    'confidence': 0.5}],
  'end_token': 3,
  'from_context': False},
 {'match': '1',
  'key': '1',
  'start_token': 3,
  'entities': [{'key': '1',
    'match': '1',
    'data': [('1', 'number')],
    'confidence': 0.5}],
  'end_token': 3,
  'from_context': False},
 {'match': '1',
  'key': '1',
  'start_token': 3,
  'entities': [{'key': '1',
    'match': '1',
    'data': [('1', 'number')],
    'confidence': 0.5}],
  'end_token': 3,
  'from_context': False},
 {'match': '1',
  'key': '1',
  'start_token': 3,
  'entities': [{'key': '1',
    'match': '1',
    'data': [('1', 'number')],
    'confidence': 0.5}],
  'end_token': 3,
  'from_context': False},
 {'match': '1',
  'key': '1',
  'start_token': 3,
  'entities': [{'key': '1',
    'match': '1',
    'data': [('1', 'number')],
    'confidence': 0.5}],
  'end_token': 3,
  'from_context': False},
 {'match': '1',
  'key': '1',
  'start_token': 3,
  'entities': [{'key': '1',
    'match': '1',
    'data': [('1', 'number')],
    'confidence': 0.5}],
  'end_token': 3,
  'from_context': False},
 {'match': '1',
  'key': '1',
  'start_token': 3,
  'entities': [{'key': '1',
    'match': '1',
    'data': [('1', 'number')],
    'confidence': 0.5}],
  'end_token': 3,
  'from_context': False},
 {'match': '1',
  'key': '1',
  'start_token': 3,
  'entities': [{'key': '1',
    'match': '1',
    'data': [('1', 'number')],
    'confidence': 0.5}],
  'end_token': 3,
  'from_context': False},
 {'match': '1',
  'key': '1',
  'start_token': 3,
  'entities': [{'key': '1',
    'match': '1',
    'data': [('1', 'number')],
    'confidence': 0.5}],
  'end_token': 3,
  'from_context': False},
 {'match': '1',
  'key': '1',
  'start_token': 3,
  'entities': [{'key': '1',
    'match': '1',
    'data': [('1', 'number')],
    'confidence': 0.5}],
  'end_token': 3,
  'from_context': False},
 {'match': '1',
  'key': '1',
  'start_token': 3,
  'entities': [{'key': '1',
    'match': '1',
    'data': [('1', 'number')],
    'confidence': 0.5}],
  'end_token': 3,
  'from_context': False},
 {'match': '1',
  'key': '1',
  'start_token': 3,
  'entities': [{'key': '1',
    'match': '1',
    'data': [('1', 'number')],
    'confidence': 0.5}],
  'end_token': 3,
  'from_context': False},
 {'match': '1',
  'key': '1',
  'start_token': 3,
  'entities': [{'key': '1',
    'match': '1',
    'data': [('1', 'number')],
    'confidence': 0.5}],
  'end_token': 3,
  'from_context': False},
 {'match': '1',
  'key': '1',
  'start_token': 3,
  'entities': [{'key': '1',
    'match': '1',
    'data': [('1', 'number')],
    'confidence': 0.5}],
  'end_token': 3,
  'from_context': False},
 {'match': '1',
  'key': '1',
  'start_token': 3,
  'entities': [{'key': '1',
    'match': '1',
    'data': [('1', 'number')],
    'confidence': 0.5}],
  'end_token': 3,
  'from_context': False},
 {'match': '1',
  'key': '1',
  'start_token': 3,
  'entities': [{'key': '1',
    'match': '1',
    'data': [('1', 'number')],
    'confidence': 0.5}],
  'end_token': 3,
  'from_context': False},
 {'match': '1',
  'key': '1',
  'start_token': 3,
  'entities': [{'key': '1',
    'match': '1',
    'data': [('1', 'number')],
    'confidence': 0.5}],
  'end_token': 3,
  'from_context': False}]

whtecmp commented 4 years ago

This issue came up when I was playing around with some examples, so I don't have an "actual use case", but what I tried is this:

number.rx: .* (?P<number>[0-9]+) .*

test.voc: test

__init__.py:

    @intent_handler(IntentBuilder('Intent').require('test').require('number'))
    def handle_test(self, message):
        self.speak('Got here!')

And the utterance I'm trying is: "test 4 test". But I never get to the "Got here!" message. I'm using the CLI, and I tried both "4" and "four" as the number.
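As a sanity check that does not involve adapt or mycroft-core, the pattern from number.rx can be run against these utterances with Python's standard re module. This is only a sketch: the extract_number helper is hypothetical, the flags are copied from the compiled pattern shown in the tagger session earlier in this thread, and it says nothing about how the skill loader or the intent parser use the pattern.

```python
import re

# Same pattern as number.rx, compiled with the flags shown in the
# engine.tagger.regex_entities output earlier in this thread.
PATTERN = re.compile(r".* (?P<number>[0-9]+) .*", re.IGNORECASE | re.UNICODE)


def extract_number(utterance):
    """Return the captured 'number' group, or None if the pattern does not match."""
    match = PATTERN.match(utterance)
    return match.group("number") if match else None


# "test 4 test": digits with a space on each side, so the pattern matches.
# "test four test": [0-9]+ only matches digit characters, not number words.
# "test 4": the pattern requires a space after the digits, so nothing matches.
results = {u: extract_number(u) for u in ("test 4 test", "test four test", "test 4")}

for utterance, number in results.items():
    print(repr(utterance), "->", number)
```

If the plain re check matches "test 4 test" but the skill still never fires, the regex itself is probably not the problem, and the vocabulary loading or intent matching would be the next place to look. The spoken form "four" would not be expected to match this pattern in any case.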

clusterfudge commented 4 years ago

Hrmm..... interesting. I think it might be worthwhile to hop into the mycroft chat and have someone help debug the skill with you. It's a little bit of magic to dig the adapt internals out of a mycroft skill, but they should be able to help you. I apologize, I'm not as familiar with the debugging internals there anymore.

clusterfudge commented 3 years ago

@whtecmp It's possible this is another issue stemming from #136. I'm hoping that merges shortly and mycroft-core will pull it in shortly after that.

clusterfudge commented 3 years ago

@whtecmp if you have a chance to try out #136, please let me know if this addresses your issue. I'm also going to be closing out stale issues on 9/17, this one included. Thanks!

MycroftAI / adapt

Catching first number with regex fails #108