facebook / duckling

Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

credit-card: only recognize format XXXX-XXXX-XXXX-XXXX #621

Closed clobotorre closed 3 years ago

clobotorre commented 3 years ago

We want to use the credit-card recognizer in a speech-to-text environment. Our users usually say a credit card number digit by digit, which produces text like "4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1" instead of "4111-1111-1111-1111". Moreover, our speech-to-text engine sometimes returns the spoken form of some digits, and sometimes groups digits into chunks of varying length: "4 one 1 1 1 1 eleven 1 1 1 1 1 1 1 1".

This behaviour depends on how the user speaks.

Could duckling take these different ways of saying a credit card number into account, especially when the credit-card-number dim is specified in the request?

chessai commented 3 years ago

I suppose this is doable, but the non-fixed length makes it really hard to represent in duckling. Without that requirement it becomes easier. I suppose you could (on your end, not duckling's) perform a preprocessing step that splits each numeral token into its digits (e.g. 'eleven' -> '1 1', 1342 -> '1 3 4 2'). Once that's done, on the duckling side we could have rules for all fixed-length credit card numbers (there's a small finite set of them, so it's not hard).
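The client-side preprocessing step described above could be sketched roughly as follows. This is a minimal illustration, not part of duckling: the function name `split_into_digits` and the `WORD_TO_DIGITS` table are hypothetical, the table covers only a small subset of English number words, and compound forms such as "twenty one" are not handled.

```python
import re

# Hypothetical, non-exhaustive table of spoken forms a speech-to-text
# engine might emit, mapped to the digits they stand for.
WORD_TO_DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9", "ten": "10", "eleven": "11", "twelve": "12",
}


def split_into_digits(text: str) -> str:
    """Split every numeral token into single digits separated by spaces.

    'eleven' becomes '1 1' and '1342' becomes '1 3 4 2'.
    Raises ValueError on a word that is not in WORD_TO_DIGITS.
    """
    digits = []
    for token in re.findall(r"[A-Za-z]+|[0-9]+", text):
        if token.isdigit():
            # A numeric token such as '1342': keep each digit separately.
            digits.extend(token)
        else:
            word = token.lower()
            if word not in WORD_TO_DIGITS:
                raise ValueError(f"unrecognized token: {token!r}")
            # A spoken form such as 'eleven': expand to '1', '1'.
            digits.extend(WORD_TO_DIGITS[word])
    return " ".join(digits)
```

Raising on unknown words is a deliberate choice in this sketch: silently dropping a token would change the card number without anyone noticing.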

That being said, I'm 50/50 on adding this because it seems difficult to support and is not a common use case. Perhaps this is a good use case for CustomDimension?

clobotorre commented 3 years ago

Since we always use the dims parameter in the request, when a 'credit-card-number' dim is requested (on our side) I suppose we can parse the speech-to-text output and try to format it as a sequence of digits before calling duckling. We will work on it. Thanks
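A rough sketch of that client-side flow, under several assumptions: the digits have already been separated by spaces, the card number has 16 digits and is regrouped into the "4111-1111-1111-1111" form mentioned at the top of this issue, and duckling's example HTTP server is running locally and accepts form-encoded `locale`, `text` and `dims` fields on `/parse`. The URL, the default locale and all function names here are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local duckling example server; adjust to your deployment.
DUCKLING_URL = "http://localhost:8000/parse"


def format_card_number(spaced_digits: str, group_size: int = 4) -> str:
    """Turn '4 1 1 1 ...' into '4111-1111-1111-1111' style groups."""
    compact = "".join(spaced_digits.split())
    if not compact.isdigit():
        raise ValueError("expected only digits and whitespace")
    groups = [compact[i:i + group_size]
              for i in range(0, len(compact), group_size)]
    return "-".join(groups)


def build_parse_request(text: str, dims: list, locale: str = "en_US") -> bytes:
    """Build the form-encoded body; dims is sent as a JSON array string."""
    fields = {"locale": locale, "text": text, "dims": json.dumps(dims)}
    return urllib.parse.urlencode(fields).encode("utf-8")


def parse_credit_card(spaced_digits: str):
    """Format the digits, then ask duckling for credit-card-number only."""
    text = format_card_number(spaced_digits)
    body = build_parse_request(text, ["credit-card-number"])
    request = urllib.request.Request(DUCKLING_URL, data=body)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Grouping in fours only suits 16-digit numbers; cards of other lengths would need their own grouping, or the digits could be sent unseparated if the recognizer accepts that form.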