SerenitySoftware / cahoots

A Text Comprehension Engine in Python
MIT License

Rework phone number parsing to detect British numbers accurately #128

Closed hickeroar closed 9 years ago

hickeroar commented 9 years ago

"Your +44 example on the demo isn't actually a real UK phone number (it's missing a digit) and isn't formatted as a UK phone number, as we don't use separators other than spaces (e.g. 01334 840206 or +44 (0)1334 840206). That said, parsers for UK numbers are actually very complex, as we have different length area codes (0xx to 0xxxx). Dropping the leading 0, most numbers make up to 10 digits, but depending on the length of the area code, the local number will then often be space-separated in different ways (e.g. 020 xxxx xxxx or 0113 xxx xxxx or 01334 840 206, using the example above). If that wasn't complicated enough, after certain area codes, some local numbers are only 9 digits after the leading 0 instead of 10. The vast majority of parsers out there will fall over on that edge case, much to the annoyance of the people with those phone numbers, I'm sure!"
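The rules described in that report can be sketched roughly as follows. This is only an illustration of the points raised above, not how cahoots actually parses phone numbers: the function name `normalize_uk_number` is hypothetical, and the length check is an assumption drawn from the description (9 or 10 digits once the leading 0 is dropped). It does not know which area codes really use 9-digit numbers, so it is looser than a real validator would be.

```python
import re


def normalize_uk_number(raw):
    """Return the digits after the leading 0 / +44 prefix, or None.

    Hypothetical helper for illustration only.
    """
    text = raw.strip()
    # Drop the optional "(0)" trunk prefix, as in "+44 (0)1334 840206".
    text = text.replace("(0)", "")
    # Per the report, UK numbers use spaces as the only separator,
    # so anything containing '-', '.', etc. is rejected here.
    if not re.fullmatch(r"\+?[\d ]+", text):
        return None
    digits = text.replace(" ", "")
    if digits.startswith("+44"):
        national = digits[3:]
    elif digits.startswith("0"):
        national = digits[1:]
    else:
        return None
    # Assumption: 10 digits is the common case, 9 digits the edge case.
    if national.startswith("0") or len(national) not in (9, 10):
        return None
    return national


print(normalize_uk_number("01334 840206"))
print(normalize_uk_number("+44 (0)1334 840206"))
print(normalize_uk_number("+44-1334-84020"))
```

Both spellings of the example number from the report normalize to the same digit string, while a number that uses hyphens and is a digit short is rejected. Grouping the local part for display (020 xxxx xxxx versus 0113 xxx xxxx) would additionally need a table of area code lengths, which this sketch leaves out.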

hickeroar commented 9 years ago

It looks like the number was "valid enough" to be detected as a phone number even though it might have had some "impurities." I've replaced the example number with a technically valid one (Google's Hamburg, Germany branch), so I'm just going to close this.