alexreisner / geocoder

Complete Ruby geocoding solution.
http://www.rubygeocoder.com
MIT License
6.35k stars 1.19k forks source link

Ruby Geocoder nearbys is slow #531

Closed vicovictor closed 11 years ago

vicovictor commented 11 years ago

Using Rails 3.2, Ruby 1.9, geocoder gem. The following query I have in my controller:

#1500+ms to load
@nearbys = @shop.nearbys(100, :order => order).paginate(:include => :photos, 
:page => params[:page], :per_page => 25)

# SQL
Shop Load (1654.0ms)  SELECT shops.*, 3958.755864232 * 2 * ASIN(SQRT(POWER(SIN
((48.8582411618 - shops.lat) * PI() / 180 / 2), 2) + COS(48.8582411618 *
PI() / 180) * COS(shops.lat * PI() / 180) * POWER(SIN((2.2945044899 - shops.lng)
 * PI() / 180 / 2), 2))) AS distance, CAST(DEGREES(ATAN2( RADIANS(shops.lng - 
2.2945044899), RADIANS(shops.lat - 48.8582411618))) + 360 AS decimal) % 360 AS
 bearing FROM `shops` WHERE `shops`.`shop_type` = 'place' AND (shops.lat BETWEEN
47.410923330691524 AND 50.30555899290848 AND shops.lng BETWEEN 0.09468020504178343
 AND 4.494328774758216 AND 3958.755864232 * 2 * ASIN(SQRT(POWER(SIN((48.8582411618
 - shops.lat) * PI() / 180 / 2), 2) + COS(48.8582411618 * PI() / 180) * 
COS(shops.lat * PI() / 180) * POWER(SIN((2.2945044899 - shops.lng) * PI() / 180 / 2), 
2))) <= 100 AND shops.id != 517) ORDER BY distance ASC LIMIT 25 OFFSET 0

The problem lies in the nearbys which does calculation on the longitude and latitude. I have already added indexes to the longitude and latitude columns, but it doesn't improve anything.

How can I improve this?

P/S: I removed unrelated conditions which doesn't contribute to the speed of the query.

alexreisner commented 11 years ago

How many rows are in your table? If it's very large you probably need to use a native spatial data type (eg PostGIS).

vicovictor commented 11 years ago

Currently just 100,000 rows.

vicovictor commented 10 years ago

I found that it was order that hurts the speed:

nearby_shops = current_shop.nearbys(10, :order => 'overall_rating DESC')
.where(:shop_type => shop_type).includes(:photos).limit(5)

If I removed the :order, it would be fast. How can I improve this? The overall_rating is already indexed.

alexreisner commented 10 years ago

I'm not sure what's going on there. Indexing shouldn't affect the speed of sorting.

vicovictor commented 10 years ago

This one is with overall_rating DESC:

# statement
nearby_shops = current_shop.nearbys(10, :order => 'overall_rating DESC')
.where(:shop_type => shop_type).includes(:photos).first(5)

# SQL query (4518.8ms)
Shop Load (4518.8ms)  SELECT shops.*, 3958.755864232 * 2 * ASIN(SQRT(POWER(SIN((48.8582285 - shops.lat) 
* PI() / 180 / 2), 2) + COS(48.8582285 * PI() / 180) * COS(shops.lat * PI() / 180) * POWER(SIN((2.2943877 - 
shops.lng) * PI() / 180 / 2), 2))) AS distance, CAST(DEGREES(ATAN2( RADIANS(shops.lng - 2.2943877), 
RADIANS(shops.lat - 48.8582285))) + 360 AS decimal) % 360 AS bearing FROM `shops` WHERE 
`shops`.`shop_type` = 'place' AND (shops.lat BETWEEN 48.71349671688915 AND 49.00296028311085 AND 
shops.lng BETWEEN 2.074405327159442 AND 2.5143700728405585 AND 3958.755864232 * 2 * 
ASIN(SQRT(POWER(SIN((48.8582285 - shops.lat) * PI() / 180 / 2), 2) + COS(48.8582285 * PI() / 180) * 
COS(shops.lat * PI() / 180) * POWER(SIN((2.2943877 - shops.lng) * PI() / 180 / 2), 2))) <= 10 AND shops.id != 517) 
ORDER BY overall_rating DESC LIMIT 5

This one is without overall_rating DESC, but the default distance ASC instead:

# statement
nearby_shops = current_shop.nearbys(10)
.where(:shop_type => shop_type).includes(:photos).first(5)

# SQL query (415.6ms)
Shop Load (415.6ms)  SELECT shops.*, 3958.755864232 * 2 * ASIN(SQRT(POWER(SIN((48.8582285 - shops.lat) * 
PI() / 180 / 2), 2) + COS(48.8582285 * PI() / 180) * COS(shops.lat * PI() / 180) * POWER(SIN((2.2943877 - 
shops.lng) * PI() / 180 / 2), 2))) AS distance, CAST(DEGREES(ATAN2( RADIANS(shops.lng - 2.2943877), 
RADIANS(shops.lat - 48.8582285))) + 360 AS decimal) % 360 AS bearing FROM `shops` WHERE 
`shops`.`shop_type` = 'place' AND (shops.lat BETWEEN 48.71349671688915 AND 49.00296028311085 AND 
shops.lng BETWEEN 2.074405327159442 AND 2.5143700728405585 AND 3958.755864232 * 2 * 
ASIN(SQRT(POWER(SIN((48.8582285 - shops.lat) * PI() / 180 / 2), 2) + COS(48.8582285 * PI() / 180) * 
COS(shops.lat * PI() / 180) * POWER(SIN((2.2943877 - shops.lng) * PI() / 180 / 2), 2))) <= 10 AND shops.id != 517) 
ORDER BY distance ASC LIMIT 5
lifeBCE commented 10 years ago

Indexing can actually improve sorting significantly. I am sure if you run your query through explain you will see if doing some full table scans.

My general rules for indexing are:

  1. All primary keys
  2. All foreign keys
  3. Anything referenced in a where clause, group by or order by

The third rule above has a stipulation dealing with uniqueness/cardinality. Low cardinality indexes will be ignored and full table scans will happen anyway and then you are just suffering the overhead of rebuilding the indexes on writes. Sorting is usually the largest performance offender when dealing with missing indexes.