Open robd003 opened 2 years ago
Doing a network lookup to the GeoIP2 (or GeoLite2) on every single written row could dramatically decrease performance. Also possible errors like timeout's, disconnects etc. must be handle safely.
A more performant and error-safe way would be to import the desired GeoIP database as described here and using a JOIN
to resolve the ip_addr
field.
In contrast to the schema described in this guide, the network
column should be of type text
in order to use our <<
operator, see also our documentation of the IP
data type https://crate.io/docs/crate/reference/en/5.0/general/ddl/data-types.html#ip-addresses.
Example query:
SELECT * FROM uri_table, geoip2_network WHERE uri_table.ip << geoip2_network.network;
Can you please add a reasoning why a dynamic IP resolving during writes would be a better option for your use-case?
Doing a network lookup to the GeoIP2 (or GeoLite2) on every single written row could dramatically decrease performance. Also possible errors like timeout's, disconnects etc. must be handle safely.
The idea is to not use the webservice. MaxMind has a downloadable database that you can perform lookups with. Multiple geolocation companies use this database format.
The idea is to not use the webservice. MaxMind has a downloadable database that you can perform lookups with. Multiple geolocation companies use this database format.
Thanks for this clarification. Even that some of their databases are free to use by end-users, their usage is licensed under a proprietary license which, as far as I know, does not allow redistribution. For redistribution one would have to obtain a commercial redistribution license.
I could imagine to provide a geoip_lookup
scalar by implementing a dedicated, possible 3rd party plugin for CrateDB, which doesn't include the related databases but adds support for using a custom downloaded one. CrateDB includes a ExtraFunctions extension/plugin one can use as a blueprint to implement a scalar plugin.
I'm still interested in the benefit over importing the related database into CrateDB, would be great if you could describe it.
Use case: It would be really useful to be able to have a generated object with geolocation info automatically created when IP addresses are inserted into an IP field.
Feature description: It would be really cool if Crate could integrate the MaxMind GeoIP client with a way to update client databases without having to restart the entire Crate cluster. This is a pretty standard GeoIP database format that multiple vendors use. There are free databases available for users to download.
Example use case:
Maybe it's worth having options for which fields you want included in the response. (Example: include or exclude lat / lon co-ordinates, ISP name, Organization name, etc)