geoip-lite / node-geoip

Native NodeJS implementation of MaxMind's GeoIP API -- works in node 0.6.3 and above, ask me about other versions
https://npmjs.org/package/geoip-lite
Other
2.32k stars 358 forks source link

Feature Request: Reverse Geocode #41

Open kshailove opened 10 years ago

kshailove commented 10 years ago

Thanks for the great module.

I have one more requirement though. I need to find out the city details from the latitude and longitude values. Since the maxmind database already contains tables which contain these values, should that be doable? I found some node modules which are calling google apis for this, however, I am trying to see if I can achieve this without calling a web service and can do it offline.

bluesmoon commented 10 years ago

Unfortunately, you might need boundary coordinates to do that since cities are not rectangular. The database does not contain bounding boxes.

kshailove commented 10 years ago

Sorry, I got confused with the answer. Let me elaborate the question. I realize that the table GeoLiteCityocation.csv already has this information. What I intend to do is below:-

  1. Input lat, long
  2. Iterate through each row in this table.
  3. Calculate the distance of the point in current row (lat1, long1) from the point (lat, long) using Harversine formula.
  4. Report the row with least distance.
  5. If required, sort based on latitude/longitude and do a binary search for faster performance.

This can be done in csv, the downside is performance though. Since I don't have any idea around how the binary dat format organizes the data, I would need some help to implement this. Would really appreciate a help on this.

elcct commented 10 years ago

That is interesting problem solution of I decided to implement.

First it creates trie of geo hashes computed from locations.

Introduces new method info that takes latitude and longitude as a parameters.

How it works:

  1. Creates geo hash from provided parameters, then creates 4 adjacent geo hashes
  2. Searches the trie for the hashes above
  3. Picks a result with a shortest distance computed via Harvesine formula to the provided parameters
  4. Returns info dictionary with country, city, region, ll and distance

Example:

var geoip = require('geoip-lite');

var geo = geoip.info(51.5072, 0.1275);

console.log(geo);

Result:

{ country: 'GB',
  city: 'Rainham',
  region: 'E4',
  ll: [ 51.5, 0.1833 ],
  distance: 3.944303662415857 }

If you think that is useful I could create a pull request

kshailove commented 10 years ago

Thanks, this will definitely be helpful, have you checked in into node module as well? I would request you to do that, please

Meanwhile, I did some intermediate research around this and found that kd-tree is also useful for such scenarios. I created an intermediate solution for myself with node module named 'kdt' and below are the stats (tested with GeoLiteCity-Location.csv having around 470,000 records):-

  1. CSV load time: around 20 seconds on my ubuntu (1.5 GB RAM, running on VMBox)
  2. Tree creation time: around 25 seconds.
  3. Average lookup time: 100 microseconds.

As you can see, the penalty is mostly around CSV read and tree creation. Unfortunately I am not good at coding for binary data, so the solution you have implemented will definitely worth exploring and I will also do a performance comparison and will let you know the results.

Also, can you please add areacode, metrocode and postal code wherever available.

kshailove commented 10 years ago

Sorry for repeated messages. I did a quick testing by replacing geoip.js and utils.js and putting geohash.js in the lib folder and found that there is a performance regression while loading the module, it now takes more than 40 seconds to load (previously it was quick, less than a second). Please see below

Loading module started: Tue Dec 24 2013 17:46:43 GMT+0530 (IST) Loading module finished: Tue Dec 24 2013 17:47:27 GMT+0530 (IST) IP Lookup started: Tue Dec 24 2013 17:47:27 GMT+0530 (IST) IP Lookup finished: Tue Dec 24 2013 17:47:27 GMT+0530 (IST) Geo lookup started: Tue Dec 24 2013 17:47:27 GMT+0530 (IST) Geo Lookup finished: Tue Dec 24 2013 17:47:27 GMT+0530 (IST)

elcct commented 10 years ago

Indeed the time it loads increases significantly. I have been thinking on how to optimize this. Sorry for limited response, but I am quite busy today.

kshailove commented 10 years ago

Hi elcct, Did you get time to look at it, would really appreciate a help. BTW, one of the ideas that came to my mind is save the trie structure to a file and then load it, no need to create the trie everytime, since the data will not frequently change.

elcct commented 10 years ago

Sorry I was on a break. I tried some other implementations of binary tree, but it doesn't make too much of a difference in terms of loading speed. I was able to speed up loading time by limiting the depth of the tree to say 4 first characters of the hash, but that would sacrifice look-up time.

Probably the best solution would be to use a database like MongoDB (Mongo has geo index built-in) - not sure if that is considered for this project though.

tommychang1983 commented 9 years ago

Hi, is there any update or update about the performance for this feature? :) I'm also interested in such feature.

mattvb91 commented 3 years ago

Currently have the same requirement to do a reverse lookup based on coordinates.

bluesmoon commented 3 years ago

This isn't a relevant feature for geo-ip. There is no IP involved. I'm sure there's a different module that is more appropriate.

Secondly, the implementation isn't correct. Haversine distance will tell you which city center is closest, but it does not mean you are actually in that city since cities aren't circular in shape. You'll may not even be in the same country or continent.