arp242 / goatcounter

Easy web analytics. No tracking of personal data.
https://www.goatcounter.com

[Feature Request] Integrate IPinfo's IP-to-Country ASN database. #765

Open abdullahdevrel opened 2 months ago

abdullahdevrel commented 2 months ago

Hi,

I work for IPinfo, but I have been using Goatcounter for my personal projects for several years and have been exploring self-hosting it recently.

I would like to request integration of the IPinfo IP to Country or IP to Country ASN/ISP database into Goatcounter. I believe that, from a development-philosophy standpoint, IPinfo’s free IP database is perfect for Goatcounter. There are technical benefits as well.

Goatcounter specific benefits

Binary distribution issues and "MaxMind®️'s EULA"

I have not made much progress self-hosting it yet, but I believe the binary includes MaxMind’s country database, which creates a tricky situation. As far as I know, they do not allow redistribution of their databases, even the free ones. Their EULA requires users to download their own copy of the database using their own access tokens.

The value proposition of IPinfo's database is that it is simply licensed under CC-BY-SA 4.0. You can do whatever you want with it as long as you give attribution (and share adaptations under the same license). Commercial usage is allowed as well. Librespeed is using our data by packaging it directly in the repo: https://github.com/librespeed/speedtest/issues/641#issuecomment-2254375165

ASN/ISP data

You have mentioned that city-level data is too granular, so maybe you can add the ASN/ISP data from the IP to Country ASN database as an additional data source. The ASN/ISP detection is based on network routing data.

Our country-level data, even though free, is a zero-compromise, fully accurate database. We support daily updates and offer range clustering. It is a pure subset of our IP geolocation database: it omits the more granular location information and provides only country-level data.

General Technical benefits

The database has the following features:

Database schema

| Field Name | Example | Data Type | Description |
|---|---|---|---|
| start_ip | 1.0.16.0 | TEXT | Starting IP address of an IP address range |
| end_ip | 1.0.31.255 | TEXT | Ending IP address of an IP address range |
| country | JP | TEXT | ISO 3166 country code of the location |
| country_name | Japan | TEXT | Name of the country |
| continent | AS | TEXT | Continent code of the country |
| continent_name | Asia | TEXT | Name of the continent |
| asn | AS2519 | TEXT | Autonomous System Number |
| as_name | ARTERIA Networks Corporation | TEXT | Name of the AS (Autonomous System) organization |
| as_domain | arteria-net.com | TEXT | Official domain or website of the AS organization |

Documentation: https://ipinfo.io/developers/ip-to-country-asn-database

Samples are available here: https://github.com/ipinfo/sample-database/tree/main/IP%20to%20Country%20ASN

The database can be downloaded simply by accessing the storage URI with an access token.

curl -L "https://ipinfo.io/data/free/country_asn.mmdb?token=<YOUR_TOKEN>" -o country_asn.mmdb

My apologies for the wall of text. Let me know what you think. Thank you!

arp242 commented 2 months ago

I have never been entirely happy about the Maxmind EULA situation, but a number of Linux distros ship the database as packages so I figured it would be fine. Basically a "better to ask forgiveness than permission"-type situation.

Your databases seem way larger; "IP to Country Database" is ~38M. That's far too large to include in the GoatCounter binary. The "Geolite countries" is ~3.7M. I don't know why it's so much larger? People can already use any mmdb database they want with the -geodb flag, but I also want a basic "good enough" database built in.

abdullahdevrel commented 2 months ago

Thank you for reviewing the request.

> I have never been entirely happy about the Maxmind EULA situation, but a number of Linux distros ship the database as packages so I figured it would be fine. Basically a "better to ask forgiveness than permission"-type situation.

The challenge is that they explicitly have a commercial distribution license for these free databases, so I am not sure what the consequences of this are, to be honest. I am not sure if those Linux distros have their own licensing terms with them that permit the distribution like that.

> Your databases seem way larger; "IP to Country Database" is ~38M.

That is because our database provides full accuracy. The accuracy extends down to the individual IP level, even for a country database. With a downloadable IP database, compromises typically happen in two ways: infrequent updates and range clustering. Because we are providing full accuracy, the resulting database is larger.

Another idea is that, since the database can be downloaded directly via a URI, users can download it during installation. This would eliminate the need to package a database within the binary in the first place. The same download mechanism can support database updates as well.
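
A minimal sketch of what such an install-time download could look like in Go, using only the standard library. The URL is the one from the curl command earlier in this thread. The dbURL and download functions and the IPINFO_TOKEN environment variable are assumptions made up for this example, not part of GoatCounter or of any IPinfo tooling.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"path/filepath"
)

// dbURL builds the download URL for the free country_asn.mmdb file,
// query-escaping the token.
func dbURL(token string) string {
	q := url.Values{"token": {token}}
	return "https://ipinfo.io/data/free/country_asn.mmdb?" + q.Encode()
}

// download fetches src and writes it to dest through a temporary file in
// the same directory, so an interrupted download never leaves a truncated
// database at dest.
func download(src, dest string) error {
	resp, err := http.Get(src)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("download failed: %s", resp.Status)
	}

	tmp, err := os.CreateTemp(filepath.Dir(dest), "geodb-*")
	if err != nil {
		return err
	}
	if _, err := io.Copy(tmp, resp.Body); err != nil {
		tmp.Close()
		os.Remove(tmp.Name())
		return err
	}
	if err := tmp.Close(); err != nil {
		os.Remove(tmp.Name())
		return err
	}
	return os.Rename(tmp.Name(), dest)
}

func main() {
	// IPINFO_TOKEN is a hypothetical variable name chosen for this sketch.
	token := os.Getenv("IPINFO_TOKEN")
	if token == "" {
		fmt.Println("IPINFO_TOKEN not set; skipping download")
		return
	}
	if err := download(dbURL(token), "country_asn.mmdb"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Running the same code on a schedule would cover updates. The resulting file could then be passed to the existing -geodb flag, provided the lookup code understands its layout.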

> People can already use any mmdb database they want with the -geodb flag, but I also want a basic "good enough" database built in.

At a cursory glance, it seems like the lookup mechanism is not database agnostic, but I could be wrong. There are structural differences between our database and MaxMind's (https://ipinfo.io/blog/migrating-from-maxmind-to-ipinfo/). Mainly:

Let me know what you think.

arp242 commented 2 months ago

I want GoatCounter to be a "Just Works" binary without external dependencies, so people can easily self-host with a minimum of fuss. Dealing with GeoIP database downloads rather goes against that.

I don't mind providing compatibility with it, but I don't think it will be the default if it's so much larger.


However, if I try to use it, it errors out with:

maxminddb: cannot unmarshal EU into type struct { Names map[string]string "maxminddb:\"names\""; Code string "maxminddb:\"code\""; GeoNameID uint "maxminddb:\"geoname_id\"" }

So I guess the database structure is different.

I don't want to "migrate to" anything, I want to be compatible with both. I don't understand why you don't just provide a "Maxmind-compatible database" as an option.

Going from country = maxmind_data['country']['iso_code'] to country = ipinfo_data['country'] is a silly change and it doesn't really matter all that much which one is used. Maybe one is marginally better, but not at least providing a compatible database is rather lacking in pragmatism.
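
Being compatible with both layouts can be sketched in a few lines of Go. This assumes the record has already been decoded into a generic map; how that happens depends on the mmdb reader in use and is not shown here. The countryCode function is a hypothetical helper, not GoatCounter's actual lookup code, and the two shapes it handles are the ones quoted above.

```go
package main

import "fmt"

// countryCode extracts an ISO country code from a decoded record,
// accepting either of the two layouts discussed in this thread.
func countryCode(rec map[string]any) string {
	switch c := rec["country"].(type) {
	case string:
		// Flat layout: country = "JP"
		return c
	case map[string]any:
		// Nested layout: country = {"iso_code": "JP", ...}
		code, _ := c["iso_code"].(string)
		return code
	}
	// Field missing or of an unexpected type.
	return ""
}

func main() {
	flat := map[string]any{"country": "JP", "continent": "AS"}
	nested := map[string]any{
		"country": map[string]any{"iso_code": "JP"},
	}
	fmt.Println(countryCode(flat), countryCode(nested))
}
```

The type switch decides per record, so the same binary could read either kind of database without a configuration option.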

abdullahdevrel commented 2 months ago

Thank you for reviewing. I understand that MaxMind's database is deeply integrated into the project and that adopting ours would require some engineering investment. We tried our best to provide the simplest and best data to use out there. The ease of use and the quality of the data usually justify that investment.

Due to the unpredictable nature of MaxMind's database structure, you have to wrap every call to get a value in switch/case statements. In our case, if we do not have the data, we simply return an empty string. Making a drop-in, MaxMind-compatible database would essentially be a compromise, in my personal opinion, because it requires a nested version of the database, which would increase its size.