OmkarPathak / ResumeParser

A simple resume parser used for extracting information from resumes
MIT License
288 stars 164 forks

Fix Regex to Extract phone number #3

Closed. ptalmeida closed this 5 years ago

ptalmeida commented 5 years ago

I tested your regex for extracting phone numbers and found it very restrictive and somewhat inconsistent. For example:

+911234567890 matches only +9112345678

+929 929929929 matches only +929 9299299

I came up with this expression (?:(?:\()?\+([0-9]*)(?:\))?((?:[\s-]*[0-9]+)+))

which matches numbers in two groups, for example:

(+91) 1234567890 matches (+91) 1234567890, with group 1 as 91 and group 2 as 1234567890

If there is no '(' or ')', it will match the whole number into one group.

What do you think of this, should I make a pull request on this?
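To make the proposal easier to check, here is a minimal sketch that runs the expression above through Python's `re` module. The helper name `extract_phone` is hypothetical and not part of the project; stripping whitespace from group 2 is also an assumption made for readability.

```python
import re

# Pattern copied verbatim from the comment above.
# Group 1: digits right after '+', group 2: the remaining digit runs.
PHONE_RE = re.compile(r"(?:(?:\()?\+([0-9]*)(?:\))?((?:[\s-]*[0-9]+)+))")

def extract_phone(text):
    """Return (full_match, group1, group2) for the first match, or None.

    Hypothetical helper for illustration only.
    """
    m = PHONE_RE.search(text)
    if m is None:
        return None
    # Group 2 keeps any leading separator, so strip it for display.
    return m.group(0), m.group(1), m.group(2).strip()

print(extract_phone("(+91) 1234567890"))  # ('(+91) 1234567890', '91', '1234567890')
print(extract_phone("+929 929929929"))    # ('+929 929929929', '929', '929929929')
print(extract_phone("+911234567890"))     # ('+911234567890', '91123456789', '0')
```

Note the last case: with no parenthesis or separator, the greedy `[0-9]*` in group 1 takes every digit except the one that group 2 needs, so the full match is correct but the split between the two groups is not meaningful.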

ptalmeida commented 5 years ago

Also, it seems like the project needs a general cleanup of old comments and hardcoded src's. Let me know if you're up for some help!

OmkarPathak commented 5 years ago

The above regex gives an invalid result for the case + 1 (415) 582-7457: it outputs only +1.
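The failing case can be reproduced with a short sketch using Python's `re` module. The `digits` helper is hypothetical and only serves to compare how many digits the match captured against how many the input contains.

```python
import re

# The expression proposed earlier in this thread, copied verbatim.
PROPOSED_RE = re.compile(r"(?:(?:\()?\+([0-9]*)(?:\))?((?:[\s-]*[0-9]+)+))")

def digits(s):
    """Hypothetical helper: keep only the digit characters of s."""
    return re.sub(r"\D", "", s)

sample = "+ 1 (415) 582-7457"
m = PROPOSED_RE.search(sample)

# The match stops at the '(' that follows the country code,
# because '(' is only allowed before the '+' in this pattern.
print(repr(m.group(0)))   # '+ 1'
print(digits(m.group(0)), "of", digits(sample))  # 1 of 14155827457
```

Comparing digit counts like this is one simple way to flag a truncated match; it is not something the project is known to do.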

OmkarPathak commented 5 years ago

I am in the process of restructuring the code. I am creating a CLI and will be releasing it soon :smile:

ptalmeida commented 5 years ago

Is + 1 (415) 582-7457 a valid phone number syntax?

OmkarPathak commented 5 years ago

Yes

OmkarPathak commented 5 years ago

There are various syntaxes for phone numbers, so it is very difficult to write a generic regex. I am from India, and I wrote my regex to match Indian phone numbers. I am making the regex configuration dynamic so that users can easily add a regex for their own country.
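One way such a per-country configuration could look is sketched below. Everything here is an assumption for illustration: the `PHONE_PATTERNS` mapping, the `find_phones` function, and both patterns are hypothetical and are not the project's actual configuration or regexes. The patterns are deliberately simple and will not cover every valid format.

```python
import re

# Hypothetical per-country patterns (illustrative only).
# Only non-capturing groups are used so re.findall returns whole matches.
PHONE_PATTERNS = {
    # Assumes a 10-digit mobile number starting with 6-9, optional +91 or 0 prefix.
    "IN": r"(?:\+91[\s-]?|0)?[6-9][0-9]{9}",
    # Assumes 3-3-4 digit grouping with an optional +1 prefix and parentheses.
    "US": r"(?:\+\s?1[\s.-]?)?\(?[0-9]{3}\)?[\s.-]?[0-9]{3}[\s.-]?[0-9]{4}",
}

def find_phones(text, country, patterns=PHONE_PATTERNS):
    """Return all phone-like substrings of text for the given country key."""
    return re.findall(patterns[country], text)

print(find_phones("Mobile: +91 9876543210", "IN"))
print(find_phones("Phone: + 1 (415) 582-7457", "US"))

# Users could register a pattern for another country without touching the code above.
custom = dict(PHONE_PATTERNS, XX=r"\+[0-9]{8,15}")
print(find_phones("Tel +351912345678", "XX", custom))
```

Keeping the patterns in a plain mapping means a user only has to supply one more key and regex, which matches the "dynamic configuration" idea described above.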

ptalmeida commented 5 years ago

A more robust one: (?:(?:(?:(\+)((?:[\s.,-]*[0-9]*)*)(?:\()?\s?((?:[\s.,-]*[0-9]*)+)(?:\))?)|(?:(?:\()?(\+)\s?((?:[\s.,-]*[0-9]*)+)(?:\))?))((?:[\s.,-]*[0-9]+)+))

Unlike the previous one, this one doesn't deal with group separation.

Yes, I can see how that can easily become a hard task.
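For anyone who wants to try the longer expression, here is a minimal sketch that checks it against the two formats discussed in this thread. The pattern is copied verbatim and only split across lines for readability; the `full_match` helper is hypothetical. Since this version does not separate groups meaningfully, only the full match is used.

```python
import re

# The "more robust" expression from this thread, split across three
# string literals; concatenated, it is identical to the original.
ROBUST_RE = re.compile(
    r"(?:(?:(?:(\+)((?:[\s.,-]*[0-9]*)*)(?:\()?\s?((?:[\s.,-]*[0-9]*)+)(?:\))?)"
    r"|(?:(?:\()?(\+)\s?((?:[\s.,-]*[0-9]*)+)(?:\))?))"
    r"((?:[\s.,-]*[0-9]+)+))"
)

def full_match(text):
    """Hypothetical helper: return the first matched substring, or None."""
    m = ROBUST_RE.search(text)
    return m.group(0) if m else None

for sample in ("+ 1 (415) 582-7457", "(+91) 1234567890"):
    # Each of these short samples is matched in full.
    print(repr(sample), "->", repr(full_match(sample)))
```

One caveat: the pattern nests quantifiers around sub-patterns that can match the empty string, such as `(?:[\s.,-]*[0-9]*)*`. Patterns of that shape may backtrack heavily on long inputs that almost match, so it would be worth timing it on real resume text before relying on it.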