SocialHarvest / geobed

A simple, lightweight, embedded geocoder for Golang with city level accuracy
BSD 3-Clause "New" or "Revised" License

Make option to use different or limited parts of data sets #4

Open tmaiaroto opened 9 years ago

tmaiaroto commented 9 years ago

The data files were put into a slice so they could easily be configured and processed. Unfortunately, the files aren't consistent in format, so keeping them in a generic slice only partly makes sense. It still seems worth doing, though, especially if more sets are added in the future.

This will need to be re-addressed, and the more pressing issue is that loading both sets as-is uses quite a bit of memory. It would be nice to choose which sets are used, so that accuracy can be sacrificed for speed and a smaller footprint.

The Geonames set is far smaller and great for larger cities. The MaxMind set contains a LOT of data, but it may not necessarily be required for certain apps. It would be nice to allow the application to decide.

It might also be nice to allow only certain cities from the MaxMind set to be included; for example, any with a population value, or cities from particular countries. So an option to limit the amount of data stored in memory would be great (see the sketch below).
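A rough sketch of what such an option could look like; the `GeobedOptions` struct, its field names, and the `NewGeobedWithOptions` constructor are hypothetical and not part of the current geobed API:

```go
package main

import "log"

// GeobedOptions describes which data sets to load and how to trim them.
// This is an illustrative proposal, not an existing geobed type.
type GeobedOptions struct {
	UseGeonames   bool     // smaller set, good for larger cities
	UseMaxMind    bool     // much larger set, optional for many apps
	MinPopulation int      // keep MaxMind cities at or above this population
	Countries     []string // ISO country codes to keep; empty means all
}

func main() {
	opts := GeobedOptions{
		UseGeonames:   true,
		UseMaxMind:    true,
		MinPopulation: 1000,
		Countries:     []string{"US", "CA"},
	}
	log.Printf("would load geobed with options: %+v", opts)
	// g := NewGeobedWithOptions(opts) // hypothetical constructor
}
```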

tmaiaroto commented 9 years ago

About 655MB of memory is allocated for the initial load, and then each lookup allocates about 1.4MB. So I'd like to reduce the memory needs so that this package works on smaller virtual servers.
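For reference, a generic way to reproduce this kind of measurement, assuming `geobed.NewGeobed()` is the loading call (check the package for the actual constructor); the measurement technique is standard Go, not part of geobed:

```go
package main

import (
	"fmt"
	"runtime"
)

// allocatedMB returns the heap currently allocated, in megabytes.
func allocatedMB() float64 {
	runtime.GC() // settle the heap so readings are comparable
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.Alloc) / 1024 / 1024
}

func main() {
	before := allocatedMB()
	// g := geobed.NewGeobed() // load the data sets here
	after := allocatedMB()
	fmt.Printf("load allocated roughly %.1f MB\n", after-before)

	// Per-lookup cost can be measured the same way, or more precisely with
	// a benchmark run as: go test -bench . -benchmem
}
```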

tmaiaroto commented 9 years ago

Apparently some of the records from MaxMind don't have lat/lng values set (they come out to 0 when parsed). So removing those has reduced the data set from 2,771,454 to 1,968,549 records.
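A minimal sketch of that kind of filter, using a simplified stand-in for the city record (the real geobed struct has more fields and may use different names):

```go
package main

import "fmt"

// GeobedCity is a simplified stand-in for geobed's city record.
type GeobedCity struct {
	City      string
	Latitude  float64
	Longitude float64
}

// keepCity reports whether a record has usable coordinates. Missing values
// parse to 0, so a (0, 0) pair is treated as "not set" and dropped.
func keepCity(c GeobedCity) bool {
	return c.Latitude != 0 || c.Longitude != 0
}

func main() {
	cities := []GeobedCity{
		{"Portland", 45.52, -122.68},
		{"Nowhere", 0, 0},
	}
	kept := cities[:0] // filter in place to avoid a second allocation
	for _, c := range cities {
		if keepCity(c) {
			kept = append(kept, c)
		}
	}
	fmt.Printf("kept %d of %d records\n", len(kept), len(cities))
}
```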

This reduced the memory allocation to about 496MB to load the set and 0.58MB per lookup (lookups are also now faster). Getting closer to running on a 512MB RAM VPS!

Removing the index on the first two characters saved a little more: 451MB to load the set now, and 0.56MB per lookup (not expected to change in this case). That index wasn't being used, but it could be brought back later to further increase lookup speed.
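For context, a two-character prefix index of the kind described could look roughly like this; the map layout is illustrative, not geobed's actual structure:

```go
package main

import (
	"fmt"
	"strings"
)

// buildPrefixIndex maps the lowercased first two characters of each city
// name to the positions of matching records, so a lookup only scans a small
// bucket instead of the whole slice. The index trades memory for speed,
// which is why dropping it shrank the load footprint above.
func buildPrefixIndex(names []string) map[string][]int {
	idx := make(map[string][]int)
	for i, name := range names {
		key := strings.ToLower(name)
		if len(key) > 2 {
			key = key[:2]
		}
		idx[key] = append(idx[key], i)
	}
	return idx
}

func main() {
	names := []string{"Portland", "Porto", "Paris", "Berlin"}
	idx := buildPrefixIndex(names)
	fmt.Println(idx["po"]) // [0 1]
}
```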

tmaiaroto commented 9 years ago

2,008,788 records now; the previous filter was too aggressive. Still, that's 478MB to load into memory and 0.56MB per lookup (0.005s per lookup).

tooolbox commented 8 years ago

Is this memory growth bounded? I noticed that the first time I loaded geobed it was using 2.5GB of memory (!!!!!), but on successive loads it was ~500MB and went up to ~650MB as I ran lookups. Will it eventually hit 2.5GB?

tmaiaroto commented 8 years ago

It does require a good bit of memory, unfortunately. I thought it was a bit less than 2.5GB though... Hmm. I wanted to look into memory-mapped files to reduce this. I was thinking about BoltDB at some point too.
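A minimal sketch of the BoltDB idea: keep records on disk in a memory-mapped file so only the pages a lookup touches are pulled into RAM. The bucket name and value encoding below are made up for illustration, and geobed does not currently do any of this:

```go
package main

import (
	"fmt"
	"log"

	"github.com/boltdb/bolt"
)

func main() {
	db, err := bolt.Open("geobed.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Write one record keyed by city name; in practice the whole data set
	// would be loaded into the bucket once, then reused across runs.
	err = db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("cities"))
		if err != nil {
			return err
		}
		return b.Put([]byte("portland"), []byte(`{"lat":45.52,"lng":-122.68}`))
	})
	if err != nil {
		log.Fatal(err)
	}

	// Read it back; Bolt memory-maps the file, so lookups don't require
	// holding every record on the Go heap.
	err = db.View(func(tx *bolt.Tx) error {
		v := tx.Bucket([]byte("cities")).Get([]byte("portland"))
		fmt.Printf("portland -> %s\n", v)
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```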