[Feature Request] Integrate IPinfo's IP-to-Country ASN database.

abdullahdevrel commented 2 months ago

Hi,

I work for IPinfo, but I have been using GoatCounter for my personal projects for several years and have recently been exploring self-hosting it.

I would like to request integration of the IPinfo IP to Country or IP to Country ASN/ISP database into GoatCounter. I believe that, from a development-philosophy standpoint, IPinfo's free IP database is a good fit for GoatCounter. There are technical benefits as well.

GoatCounter-specific benefits

Binary distribution issues and "MaxMind®️'s EULA"

Although I have not yet made progress with self-hosting it, I believe the binary includes MaxMind's country database, which creates a tricky situation. As far as I know, they do not allow redistribution of their databases, not even the free ones. Their EULA requires users to download their own copy of the database using their own access tokens.

The value proposition of IPinfo's database is that it is simply licensed under CC-BY-SA 4.0. You can do whatever you want with it as long as you give attribution and share any adapted versions under the same license. Commercial usage is allowed as well. Librespeed is using our data by packaging it directly in the repo: https://github.com/librespeed/speedtest/issues/641#issuecomment-2254375165

ASN/ISP data

You have mentioned that city-level data is too granular, so maybe you can add the ASN/ISP data from the IP to Country ASN database as an additional data source. The ASN/ISP detection is based on network routing data.

Our country-level data, even though free, is a zero-compromise, fully accurate database. We support daily updates and do not apply range clustering. It is a pure subset of our IP geolocation database: it omits the more granular location information and provides only country-level data.

General Technical benefits

The database has the following features:

It includes country and ASN information in the same database.
It is updated daily, with zero compromise to accuracy. There is no range clustering, and the database provides full accuracy.
The data granularity reaches individual IP level.
The database comes in MMDB database format.
It is licensed under CC-BY-SA 4.0, permitting commercial usage.
Available file formats include: CSV, MMDB, JSON
The data is tabular and unnested, making it very easy to use. The dataset includes both IPv4 and IPv6 in a single file.

Database schema

Field Name	Example	Data Type	Description
`start_ip`	1.0.16.0	TEXT	Starting IP address of an IP address range
`end_ip`	1.0.31.255	TEXT	Ending IP address of an IP address range
`country`	JP	TEXT	ISO 3166 country code of the location
`country_name`	Japan	TEXT	Name of the country
`continent`	AS	TEXT	Continent code of the country
`continent_name`	Asia	TEXT	Name of the continent
`asn`	AS2519	TEXT	Autonomous System Number
`as_name`	ARTERIA Networks Corporation	TEXT	Name of the AS (Autonomous System) organization
`as_domain`	arteria-net.com	TEXT	Official domain or website of the AS organization

Documentation: https://ipinfo.io/developers/ip-to-country-asn-database

Samples are available here: https://github.com/ipinfo/sample-database/tree/main/IP%20to%20Country%20ASN
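Because the schema above is flat, a range lookup over the CSV form of the data can be sketched with the Python standard library alone. This is only an illustrative sketch, not how GoatCounter or the IPinfo libraries do lookups: the helper names (`build_index`, `lookup`) are hypothetical, the first row is the documented example from the schema table, and the second row is made up to show IPv4 and IPv6 living in the same dataset.

```python
import bisect
import ipaddress

# First row: the example from the schema table (only some columns kept).
# Second row: made up for illustration (documentation prefix, placeholder code).
ROWS = [
    {"start_ip": "1.0.16.0", "end_ip": "1.0.31.255", "country": "JP", "asn": "AS2519"},
    {"start_ip": "2001:db8::", "end_ip": "2001:db8::ffff", "country": "ZZ", "asn": ""},
]


def _key(ip):
    """Sortable key that keeps IPv4 and IPv6 addresses apart."""
    addr = ipaddress.ip_address(ip)
    return (addr.version, int(addr))


def build_index(rows):
    """Sort the ranges by start address so they can be binary-searched."""
    index = sorted(
        ((_key(r["start_ip"]), _key(r["end_ip"]), r) for r in rows),
        key=lambda entry: entry[0],
    )
    starts = [entry[0] for entry in index]
    return starts, index


def lookup(ip, starts, index):
    """Return the row whose range contains ip, or None if no range matches."""
    key = _key(ip)
    pos = bisect.bisect_right(starts, key) - 1
    if pos < 0:
        return None
    start, end, row = index[pos]
    return row if start <= key <= end else None


starts, index = build_index(ROWS)
print(lookup("1.0.20.1", starts, index))
```

An address outside every range, such as `1.0.32.0` here, returns `None`, which a caller could map to an empty country code.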

The database can be downloaded simply by accessing the storage URI with an access token.

curl -L "https://ipinfo.io/data/free/country_asn.mmdb?token=<YOUR_TOKEN>" -o country_asn.mmdb

My apologies for the wall of text. Let me know what you think. Thank you!

arp242 commented 2 months ago

I have never been entirely happy about the Maxmind EULA situation, but a number of Linux distros ship the database as packages so I figured it would be fine. Basically a "better to ask forgiveness than permission"-type situation.

Your databases seem way larger; "IP to Country Database" is ~38M. That's far too large to include in the GoatCounter binary. The "Geolite countries" is ~3.7M. I don't know why yours is so much larger. People can already use any mmdb database they want with the -geodb flag, but I also want a basic "good enough" database built in.

abdullahdevrel commented 2 months ago

Thank you for reviewing the request.

> I have never been entirely happy about the Maxmind EULA situation, but a number of Linux distros ship the database as packages so I figured it would be fine. Basically a "better to ask forgiveness than permission"-type situation.

The challenge is that they explicitly offer a separate commercial redistribution license for these free databases, so, to be honest, I am not sure what the consequences are. I also do not know whether those Linux distros have their own licensing terms with MaxMind that permit this kind of distribution.

> Your databases seem way larger; "IP to Country Database" is ~38M.

That is because our database provides full accuracy. The accuracy extends down to the individual IP level, even for a country database. When you download an IP database, compromise happens in two ways: with infrequent updates and range clustering. However, because we are providing full accuracy, the resulting database is larger.

Another idea: since the database can be downloaded directly via a URI, users could fetch it during installation. That would remove the need to package a database inside the binary in the first place. The same download mechanism could also handle database updates.
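As a rough sketch of what such an install-time or update-time fetch might look like, the following uses only the Python standard library. It is an assumption-laden illustration, not something GoatCounter does today: the function names (`build_url`, `fetch_database`) are hypothetical, and the URL is the one from the curl example above.

```python
import os
import shutil
import tempfile
import urllib.parse
import urllib.request

# URL taken from the curl example earlier in the thread.
BASE_URL = "https://ipinfo.io/data/free/country_asn.mmdb"


def build_url(token):
    """Append the access token as a properly encoded query parameter."""
    return BASE_URL + "?" + urllib.parse.urlencode({"token": token})


def fetch_database(token, dest):
    """Download to a temporary file, then atomically replace dest.

    Writing to a temp file first means a failed or partial download
    never overwrites an existing, working database.
    """
    directory = os.path.dirname(os.path.abspath(dest))
    tmp = tempfile.NamedTemporaryFile(dir=directory, delete=False)
    try:
        # urlopen follows HTTP redirects by default, like `curl -L`.
        with tmp, urllib.request.urlopen(build_url(token), timeout=60) as resp:
            shutil.copyfileobj(resp, tmp)
        os.replace(tmp.name, dest)
    except BaseException:
        os.unlink(tmp.name)  # clean up the partial download
        raise
    return dest
```

Running `fetch_database(token, "country_asn.mmdb")` on a schedule would cover the update case as well, since the replace step swaps the new file in only after a complete download.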

> People can already use any mmdb database they want with the -geodb flag, but I also want a basic "good enough" database built in.

At a cursory glance, the lookup mechanism does not seem to be database-agnostic, but I could be wrong. There are structural differences between our database and MaxMind's (https://ipinfo.io/blog/migrating-from-maxmind-to-ipinfo/). Mainly:

We have the location built in, while they provide the geoname_id and a complementary geoname database
Our database structure is flat/tabular, while they opt for a nested database structure.

Let me know what you think.

arp242 commented 2 months ago

I want GoatCounter to be a "Just Works" binary without external dependencies, so people can easily self-host with a minimum of fuss. Dealing with GeoIP database downloads rather goes against that.

I don't mind providing compatibility with it, but I don't think it will be the default if it's so much larger.

However, if I try to use it, it errors out with:

maxminddb: cannot unmarshal EU into type struct { Names map[string]string "maxminddb:\"names\""; Code string "maxminddb:\"code\""; GeoNameID uint "maxminddb:\"geoname_id\"" }

So I guess the database structure is different.

I don't want to "migrate to" anything, I want to be compatible with both. I don't understand why you don't just provide a "Maxmind-compatible database" as an option.

Going from country = maxmind_data['country']['iso_code'] to country = ipinfo_data['country'] is a silly change and it doesn't really matter all that much which one is used. Maybe one is marginally better, but not at least providing a compatible database is rather lacking in pragmatism.
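To make the "compatible with both" point concrete, here is a minimal Python sketch of a reader that accepts either record shape. It is an illustration under assumptions, not GoatCounter's code: the helper names are hypothetical, and the two sample records are hand-written approximations of a nested MaxMind-style record and a flat IPinfo-style record, based only on the field names that appear in this thread.

```python
def _field(record, key, nested_key):
    """Read record[key] whether it is a flat string or a nested mapping."""
    if not record:
        return ""
    value = record.get(key)
    if isinstance(value, dict):  # nested, MaxMind-style
        return value.get(nested_key, "") or ""
    if isinstance(value, str):   # flat, IPinfo-style
        return value
    return ""


def country_code(record):
    """ISO country code from either layout, or "" when it is missing."""
    return _field(record, "country", "iso_code")


def continent_code(record):
    """Continent code from either layout, or "" when it is missing."""
    return _field(record, "continent", "code")


# Hand-written sample records (assumed shapes, for illustration only).
maxmind_data = {
    "country": {"iso_code": "JP", "names": {"en": "Japan"}},
    "continent": {"code": "AS", "names": {"en": "Asia"}},
}
ipinfo_data = {"country": "JP", "country_name": "Japan", "continent": "AS"}

print(country_code(maxmind_data), country_code(ipinfo_data))
```

The continent helper covers the case behind the unmarshal error above, where a plain string such as `EU` arrives in a position where a nested structure was expected.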

abdullahdevrel commented 2 months ago

Thank you for reviewing. I understand that MaxMind's database is deeply integrated into the project and that adopting ours would require some engineering investment. We have tried our best to provide the simplest and best data available. In our experience, the ease of use and the quality of the data usually justify that investment.

Due to the unpredictable nature of MaxMind's database structure, you have to wrap every call that reads a value in switch/case statements. In our case, if we do not have the data, we simply return an empty string. In my personal opinion, making a drop-in, MaxMind-compatible database would be a compromise, because it requires a nested version of the database, which would increase its size.
