daddyz / phonelib

Ruby gem for phone validation and formatting using google libphonenumber library data
MIT License

feat: improve performance when detecting country codes #274

Closed ElMassimo closed 1 year ago

ElMassimo commented 1 year ago

Description 📖

This pull request makes parsing roughly 10x faster when the country for a phone number is unknown.

Ran tests locally, and they all pass.

Background 📜

Currently, when parsing international phone numbers, this library allocates a large number of regexes (one for each of 256 countries) and unnecessarily matches against all 256, although in practice at most 1 of them can match.

Country codes have 1, 2, or 3 digits, and have the interesting property that shorter codes are not prefixes of longer codes.

The global_phone library takes advantage of this to optimize country code detection.

By taking the one-, two-, and three-digit prefixes of the number, it's possible to do a hash-based lookup instead of cycling through all countries.
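A minimal sketch of the prefix lookup idea, for illustration only. The `COUNTRIES_BY_CALLING_CODE` table and the `detect_calling_code` helper are hypothetical names, and the table holds only a handful of entries; this is not the code in phonelib or global_phone.

```ruby
# Hypothetical lookup table: calling code => ISO country codes sharing it.
# Only a few entries are listed, for illustration.
COUNTRIES_BY_CALLING_CODE = {
  "1"   => %w[US CA],
  "44"  => %w[GB],
  "49"  => %w[DE],
  "380" => %w[UA],
  "598" => %w[UY]
}.freeze

# Returns [calling_code, countries] for the 1-, 2-, or 3-digit prefix
# found in the table, or nil when no prefix is a known calling code.
def detect_calling_code(phone)
  digits = phone.gsub(/\D/, "") # keep digits only, dropping "+", spaces, etc.
  (1..3).each do |length|
    prefix = digits[0, length]
    countries = COUNTRIES_BY_CALLING_CODE[prefix]
    return [prefix, countries] if countries
  end
  nil
end
```

Because no calling code is a prefix of another, at most one of the three lookups can succeed, so the order in which the prefixes are tried does not change the result.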

The Fix 🔨

This pull request applies the technique described above to optimize detect_and_parse.

As a result, instead of creating 256 regexes and matching all of them every time a phone with an unknown country code was parsed, it will now perform only 3 hash lookups.
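To make the difference concrete, here is a rough, self-contained comparison of the two strategies on synthetic data. Everything in it is an assumption for illustration: the 256 made-up three-digit codes and the helper names (`detect_by_regex_scan`, `detect_by_hash_lookup`, `seconds_for`) are hypothetical and do not mirror phonelib's internals.

```ruby
# Synthetic stand-in for per-country data: 256 made-up three-digit codes.
CODES = (100...356).map(&:to_s).freeze

# Regex-scan approach: build one regex per code and test every one of them.
def detect_by_regex_scan(digits)
  CODES.select { |code| Regexp.new("\\A#{code}").match?(digits) }
end

# Hash approach: index the codes once, then do at most three lookups.
CODE_INDEX = CODES.to_h { |code| [code, true] }.freeze

def detect_by_hash_lookup(digits)
  (1..3).map { |length| digits[0, length] }
        .select { |prefix| CODE_INDEX.key?(prefix) }
end

# Wall-clock time, in seconds, taken to run the block `iterations` times.
def seconds_for(iterations)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

number = "2551234567"
scan_time = seconds_for(1_000) { detect_by_regex_scan(number) }
hash_time = seconds_for(1_000) { detect_by_hash_lookup(number) }
puts format("regex scan: %.4fs, hash lookup: %.4fs", scan_time, hash_time)
```

Both helpers return the same match for a given input; only the amount of work per call differs. Absolute timings will vary by machine and Ruby version.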

Benchmarks 📊

This optimization yields a 10x speed up when the country code is unknown! 🚀

Added a new benchmark in spec/phonelib_ips_bench.rb, which can be run with rspec.

Before

Calculating -------------------------------------
       known country     27.029  (± 0.0%) i/s -    136.000  in   5.032140s
     unknown country      2.253  (± 0.0%) i/s -     12.000  in   5.325396s

Comparison:
       known country:       27.0 i/s
     unknown country:        2.3 i/s - 11.99x  slower

After

Calculating -------------------------------------
       known country     26.913  (± 0.0%) i/s -    136.000  in   5.053798s
     unknown country     23.172  (± 0.0%) i/s -    116.000  in   5.006049s

Comparison:
       known country:       26.9 i/s
     unknown country:       23.2 i/s - 1.16x  slower

Now the library performs about as well when the country code needs to be detected as when it is provided.

If we combine this with the work in:

it should make both cases even faster, and make both cases comparable in performance (only 1.03x slower).

Memory Usage 📊

After this pull request, this use case allocates about 5.5x less memory (84.087M vs. 15.234M memsize), so GC pressure will be reduced as well.

Before

Calculating -------------------------------------
     unknown country    84.087M memsize (   160.465k retained)
                         1.859M objects (   513.000  retained)
                        50.000  strings (    50.000  retained)

After

Calculating -------------------------------------
     unknown country    15.234M memsize (   164.490k retained)
                       232.193k objects (   534.000  retained)
                        50.000  strings (    50.000  retained)
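
The allocation difference can be approximated without extra gems by counting allocated objects with `GC.stat`. This is only a sketch on synthetic data: the `allocated_objects` helper and the 256 made-up codes are hypothetical, and the counts it prints are not comparable to the figures above.

```ruby
# Number of objects allocated while the block runs (MRI-specific counter).
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

digits = "2551234567"
codes  = (100...356).map(&:to_s)          # 256 made-up codes
index  = codes.to_h { |code| [code, true] }

# One regex (plus its source string) is allocated per code on every call.
scan_allocations = allocated_objects do
  codes.each { |code| Regexp.new("\\A#{code}").match?(digits) }
end

# Only the three prefix substrings are allocated.
lookup_allocations = allocated_objects do
  (1..3).each { |length| index.key?(digits[0, length]) }
end

puts "regex scan: #{scan_allocations} objects, hash lookup: #{lookup_allocations} objects"
```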

daddyz commented 1 year ago

@ElMassimo nice, thanks for PR