alexreisner / geocoder

Complete Ruby geocoding solution.
http://www.rubygeocoder.com
MIT License
6.34k stars 1.19k forks source link

[MaxMind Local] Speedup database IP range lookup in Postgres #972

Open vojtad opened 8 years ago

vojtad commented 8 years ago

Using one int8range column instead of 2 bigint columns for IP range in maxmind_geolite_city_blocks speeds up lookup from ~20 ms to ~0.1 ms. This is really big improvement.

Using 2 bigint columns:

=> EXPLAIN ANALYZE SELECT l.country, l.region, l.city, l.latitude, l.longitude FROM maxmind_geolite_city_location l WHERE l.loc_id IN (SELECT b.loc_id FROM maxmind_geolite_city_blocks b WHERE b.start_ip_num <= 2467698198 AND 2467698198 <= b.end_ip_num);
                                                                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=20608.41..21256.37 rows=22546 width=31) (actual time=18.715..18.726 rows=1 loops=1)
   ->  HashAggregate  (cost=20608.32..20622.06 rows=4580 width=8) (actual time=18.698..18.707 rows=1 loops=1)
         ->  Index Scan using index_maxmind_geolite_city_blocks_on_end_ip_num_range on maxmind_geolite_city_blocks b  (cost=0.09..20393.97 rows=428712 width=8) (actual time=0.009..18.689 rows=1 loops=1)
               Index Cond: ((2467698198::bigint <= end_ip_num) AND (start_ip_num <= 2467698198::bigint))
   ->  Index Scan using index_maxmind_geolite_city_location_on_loc_id on maxmind_geolite_city_location l  (cost=0.08..0.14 rows=1 width=39) (actual time=0.009..0.009 rows=1 loops=1)
         Index Cond: (loc_id = b.loc_id)
 Total runtime: 18.778 ms
(7 rows)

Time: 133.061 ms

Using int8range:

=> EXPLAIN ANALYZE SELECT l.country, l.region, l.city, l.latitude, l.longitude FROM maxmind_geolite_city_location l WHERE l.loc_id IN (SELECT b.loc_id FROM maxming_geolite_city_with_range_blocks b WHERE b.ip_range @> 2467698198);
                                                                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=4.17..8.18 rows=1 width=31) (actual time=0.066..0.067 rows=1 loops=1)
   ->  Unique  (cost=4.09..4.09 rows=1 width=8) (actual time=0.057..0.057 rows=1 loops=1)
         ->  Sort  (cost=4.09..4.09 rows=1 width=8) (actual time=0.056..0.056 rows=1 loops=1)
               Sort Key: b.loc_id
               Sort Method: quicksort  Memory: 25kB
               ->  Index Scan using index_maxmind_geolite_city_with_range_blocks_on_ip_range on maxming_geolite_city_with_range_blocks b  (cost=0.08..4.09 rows=1 width=8) (actual time=0.050..0.051 rows=1 loops=1)
                     Index Cond: (ip_range @> 2467698198::bigint)
   ->  Index Scan using index_maxmind_geolite_city_location_on_loc_id on maxmind_geolite_city_location l  (cost=0.08..4.09 rows=1 width=39) (actual time=0.006..0.006 rows=1 loops=1)
         Index Cond: (loc_id = b.loc_id)
 Total runtime: 0.104 ms
(10 rows)

Time: 114.100 ms

However, other databases do not support range types.

I would be happy to a create pull request to implement int8range for Postgres but I am not sure how to make other databases compatible. Do you have any ideas or tips? I would like to discuss them in this issue.

alexreisner commented 8 years ago

Thanks for this! Can we check for Postgres and only use int8range if it's present?

vojtad commented 8 years ago

I am currently investigating this. I will probably add new column ip_range to maxmind_geolite_city_blocks and use it if we are on Postgres. I will update this issue when pull request is ready.

Also, I am not sure whether Rails 3 supports range data types for Postgres or not. I have to check that, too.