Ironholds / rgeolocate

Generalised IP geolocation through R
https://cran.r-project.org/web/packages/rgeolocate/index.html

IP2LOCATION LITE support #23

Closed stenevang closed 7 years ago

stenevang commented 8 years ago

I have found that http://lite.IP2LOCATION.com offers free versions of its databases, which are much more complete and accurate than the GeoLite2 City database from MaxMind. While the MaxMind database has about 3.2 million rows, the IP2LOCATION DB11 database has about 3.9 million rows. Also, the latitude and longitude values have six decimal places in IP2LOCATION, while MaxMind has only four.

The IP2LOCATION DB11 database can be found here: http://lite.ip2location.com/database-ip-country-region-city-latitude-longitude-zipcode-timezone

It would be a valuable addition to rgeolocate if it also supported the binary database file version of DB11 from IP2LOCATION.

Ironholds commented 8 years ago

Looks like they have a C API we could look into integrating. @wrathematics and @hrbrmstr, thoughts? You know how I am with C ;p

wrathematics commented 8 years ago

I can take a look after finals in a few weeks.

One thing to point out is that the C library is LGPL v3 licensed. I'm not a lawyer, but I think that creates an issue. If you make it possible to link with the library (say, using --configure-args), I think it's fine (it wouldn't be if it were GPL v2, btw), but if you ship their code, I think the entire project must be LGPL licensed (and not merely LGPL "compatible"). You would presumably have to clear this (again) with the MaxMind folks.

Not 100% on any of that, but dealing with multiple licenses always makes me nervous.

Ironholds commented 8 years ago

Dealing with any GPL licenses makes me nervous *spits*.

So it looks like our options for integration then would be to ship this distinctly - which would literally be its own package, because they don't actually ship binaries - or entirely implement our own version, which, yuck.

wrathematics commented 8 years ago

Well, the package is already linking with GPL software (Rcpp and R itself), so according to FSF lawyers, it's already part of a "larger" GPL licensed software. Not every lawyer agrees about this by the way (contrary to what GPL zealots would have you believe). No one really knows what's legal or not because this has never been tested in court. But I think it's polite to respect authors' wishes on software licensing; you should assume that someone who licenses GPL intends that static linking is infectious.

The issue is really about authorship, I think, which is almost more of a CRAN issue. Anyway, tldr, shit's complicated.

Ironholds commented 8 years ago

Note: I actually got permission to cross-license ip2location as Apache 2! https://github.com/chrislim2888/IP2Location-C-Library/issues/3#event-825424246

In config terms it shouldn't(?) be too much of a PITA to adapt? I'll find out.

hrbrmstr commented 8 years ago

it doesn't look too bad. I need to see what autoconf/configure generate for mingw & ubuntu/debian

Ironholds commented 8 years ago

I'm having some luck with it, actually - seems like the only compiler flags are a win32 flag libmaxminddb already relies on and a fairly standard Apple thing. I think the Apple thing could be a problem, but I'm gonna get it working first on my local and then worry about config.

On Monday, 17 October 2016, boB Rudis notifications@github.com wrote:

it doesn't look too bad. I need to see what autoconf/configure generate for mingw & ubuntu/debian


Ironholds commented 8 years ago

Now internally integrated and takes literally a minute to process 1m IPs. I'm going to refactor and see if that speeds things up (passing the results around may be more expensive than additional lookups? we'll see)

Ironholds commented 8 years ago

So this is...integrated? In two forms.

The weird thing is it's slow. Like, really really slow. Like, the fastest I can make it go (using a teeny-fielded database and only looking for a couple of fields) is 10 times slower than MaxMind. Am I goofing the implementation, or is it a slow binary format, or..? But at this rate it'd take 30 seconds to handle 1m IPs, assuming you're using the smallest database type and smallest number of requested fields, and that doesn't feel greeeat.

Ironholds commented 8 years ago

Answer: I forgot memory caching was a thing. When used, we can process 1m IPs in 3 seconds. Not too shabby.
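For intuition about why caching helps, here is a minimal Python sketch. It is not the IP2Location library's code, and it assumes the database can be treated as a sorted table of non-overlapping IP ranges. The table contents, the labels, and the `lookup` helper are all hypothetical. Once the table is held in memory, each query is a binary search over a list and needs no per-lookup disk reads.

```python
import bisect
import ipaddress


def _ip_to_int(ip):
    """Convert a dotted-quad IPv4 string to its integer value."""
    return int(ipaddress.ip_address(ip))


# Hypothetical in-memory table of sorted, non-overlapping (start, end, label)
# ranges. The addresses come from documentation-reserved blocks and the
# labels are made up for illustration.
RANGES = [
    (_ip_to_int("192.0.2.0"), _ip_to_int("192.0.2.255"), "block-a"),
    (_ip_to_int("198.51.100.0"), _ip_to_int("198.51.100.255"), "block-b"),
    (_ip_to_int("203.0.113.0"), _ip_to_int("203.0.113.255"), "block-c"),
]

# Range starts kept separately so bisect can search them directly.
STARTS = [start for start, _, _ in RANGES]


def lookup(ip):
    """Return the label of the range containing ip, or None if none matches."""
    value = _ip_to_int(ip)
    # Index of the last range whose start is <= value.
    idx = bisect.bisect_right(STARTS, value) - 1
    if idx < 0:
        return None
    _, end, label = RANGES[idx]
    return label if value <= end else None
```

Calling `lookup("198.51.100.7")` searches only the in-memory list, so the cost per IP is a handful of comparisons. A file-backed version of the same search would pay for a seek and a read at every step.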

Ironholds commented 7 years ago

This should now be integrated and work on Windows!