g-andrade / locus

MMDB reader for geolocation and ASN lookup of IP addresses
https://hexdocs.pm/locus/
MIT License

[Feature Request] Supporting IPinfo MMDB databases #43

Closed: abdullahdevrel closed this issue 2 months ago

abdullahdevrel commented 5 months ago

IPinfo.io also delivers data in the MMDB file format. The difference from the MaxMind and DBIP MMDB databases is that IPinfo uses a "tabular" data format.

Example from the IPinfo IP to Geolocation database:

| FIELD NAME | EXAMPLE | DATA TYPE | DESCRIPTION |
|---|---|---|---|
| start_ip | 1.253.242.0 | TEXT | Starting IP address of an IP address range |
| end_ip | 1.253.242.255 | TEXT | Ending IP address of an IP address range |
| join_key | 1.253.0.0 | TEXT | Special variable to facilitate join operation |
| city | Yangsan | TEXT | City of the location |
| region | Gyeongsangnam-do | TEXT | Region of the location |
| country | KR | TEXT | ISO 3166 country code |
| latitude | 35.34199 | FLOAT | Latitude value of the location |
| longitude | 129.03358 | FLOAT | Longitude value of the location |
| postal_code | 50593 | TEXT | Postal code of the location |
| timezone | Asia/Seoul | TEXT | Local time zone |

You can try our free databases as well:

The free database provides full accuracy, is updated daily, and combines IPv4 and IPv6 data in a single dataset. The free IP database is licensed under CC-BY-SA 4.0 and permits commercial usage.

Update mechanism

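The simple update mechanism uses a storage bucket URI with the access token passed as a query parameter. The MMDB dataset is not zipped and can be downloaded directly (the URL is quoted so that the shell does not interpret the ? or the angle brackets around the placeholder):

curl -L "https://ipinfo.io/data/standard_privacy.mmdb?token=<YOUR_TOKEN>" -o privacy.mmdb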

IPinfo also has a checksums API endpoint.

Samples and documentation

Please let me know what you think. Thanks!

g-andrade commented 5 months ago

Hi @abdullahdevrel, after a very quick look at your proposal, I think it’s important to clarify that MMDB as supported by locus refers to a particular binary format: https://maxmind.github.io/MaxMind-DB/

These tabular files (from a first look, either JSON or CSV) are, effectively, in an entirely different format that locus doesn’t know about. Calling them MMDB would be akin to saying an SVG image is just like a PNG one, only in vector format!

Let me know if I got it wrong - as I wrote earlier, I took a very quick look and got alarmed at seeing what may not be MMDB at all. An entirely different format would very substantially raise the technical cost of any kind of support.

g-andrade commented 5 months ago

My bad, I got the wrong idea when I read "tabular". As long as it’s MMDB, there should be no issue. I’ll give it a try when I get a chance.

abdullahdevrel commented 5 months ago

@g-andrade My apologies. I should have been clearer. I meant that the record structure is tabular. You know how in MaxMind, if you want to get the city name of an IP address, you have to go through city → names → en. With us, it is just city.
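A minimal sketch of that difference, written in Python purely for illustration. Both records are hypothetical stand-ins for what an MMDB reader might return after decoding: the nested one follows the layout of MaxMind's GeoLite2 City records (where the English city name sits under city → names → en), the flat one reuses the example values from the field table above. The helper names are made up for this example.

```python
# Hypothetical decoded records; neither is real lookup output.
maxmind_style_record = {
    "city": {"names": {"en": "Yangsan"}},
    "country": {"iso_code": "KR"},
}
ipinfo_style_record = {
    "city": "Yangsan",
    "country": "KR",
}


def city_name_nested(record):
    """Walk the nested maps used by MaxMind-style records."""
    return record["city"]["names"]["en"]


def city_name_flat(record):
    """Read the value directly from a flat, 'tabular' record."""
    return record["city"]
```

Either way the underlying file is still the MMDB binary format; only the shape of the decoded record differs, so a reader only needs to hand the map back to the caller unchanged.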

Here is a GIF of querying the MMDB database with the mmdbctl tool.

[GIF: WindowsTerminal_LIlhqd7RG8]

No rush. This is just a request. I really appreciate your taking a look. Thank you very much.

g-andrade commented 5 months ago

Overall compatibility

The free IPinfo ASN, Country and Country+ASN databases load fine from the local filesystem:

[Screenshot from 2024-05-04 17-17-45]

Possible IPv4 bug

IPv4 lookups may not be working (tried 93.184.215.14, the address for example.com); IPv6 lookups do work (tried 2606:2800:21f:cb07:6820:80da:af6b:8b2c).

That reminded me of PR #39, closed a few months ago. Based on what it appeared to suggest needed fixing (https://github.com/g-andrade/locus/blob/8d0019d3010fdd7a1bf6db68488d7be36eda7664/src/locus_mmdb_tree.erl#L48-L49), I tried switching the IPv4-in-IPv6 tree prefix from ::ffff:0:0/96 to ::/96, and it... worked? I can now get IPv4 entries from the IPinfo database.

There may be a 6+ year old bug in the code, owing to my misreading of the spec, which I had never run into before purely by chance.
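To make the two prefixes concrete, here is a small Python sketch (not locus code, which is Erlang) that embeds the same IPv4 address under each /96 prefix and shows the first 96 bits a bit-by-bit tree walk would follow before reaching the IPv4 part. The function and constant names are invented for this illustration.

```python
import ipaddress

# Base values of the two candidate /96 prefixes, as 128-bit integers.
V4_MAPPED_BASE = int(ipaddress.IPv6Address("::ffff:0:0"))  # ::ffff:0:0/96
ZERO_BASE = 0                                              # ::/96


def embed_ipv4(ipv4_text, base):
    """Place a 32-bit IPv4 address in the low 32 bits of a /96 prefix."""
    v4 = int(ipaddress.IPv4Address(ipv4_text))
    return ipaddress.IPv6Address(base | v4)


def leading_bits(address, count=96):
    """Return the first `count` bits of a 128-bit address as a 0/1 string."""
    return format(int(address), "0128b")[:count]


mapped = embed_ipv4("93.184.215.14", V4_MAPPED_BASE)
plain = embed_ipv4("93.184.215.14", ZERO_BASE)

# Under ::/96 the walk follows 96 zero bits; under ::ffff:0:0/96 it follows
# 80 zero bits and then 16 one bits, which leads to a different subtree.
assert leading_bits(plain) == "0" * 96
assert leading_bits(mapped) == "0" * 80 + "1" * 16
```

If a database only places its IPv4 records under ::/96, a reader that starts IPv4 lookups from ::ffff:0:0/96 ends up in a part of the tree with no IPv4 data, which would match the symptom described above.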

Downloading (and updating) using HTTP

The databases can be loaded using HTTP (censor_query will censor the token in logged messages):

[Screenshot from 2024-05-04 17-44-39]
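As an illustration of what censoring a token in logged URLs can look like, here is a minimal Python sketch. It is not how locus implements censor_query; the function name, the default key list and the REDACTED placeholder are all assumptions made for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def censor_query_params(url, secret_keys=("token",), placeholder="REDACTED"):
    """Return `url` with the values of the given query keys replaced.

    Hypothetical helper: only meant for producing log-safe strings,
    never for building the URL that is actually requested.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    censored = [
        (key, placeholder if key in secret_keys else value)
        for key, value in pairs
    ]
    return urlunsplit(parts._replace(query=urlencode(censored)))
```

The real request still carries the full token; only the string handed to the logger goes through the helper.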

Although I very quickly hit the default limit of 10 daily downloads per database, that is not too bad: locus caches the database locally and subsequent requests are conditional (it sends if-modified-since to avoid unnecessary data transfer). It will log errors, though.
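The conditional request mentioned above can be sketched as follows, using only Python's standard library. This shows the general HTTP mechanism, not locus's actual code; the function names and the idea of passing the cached file's modification time as a Unix timestamp are assumptions for the example.

```python
import urllib.request
from email.utils import formatdate


def build_conditional_request(url, cached_mtime=None):
    """Build a GET request that asks for the body only if it has changed.

    `cached_mtime` is the Unix timestamp of the locally cached copy,
    or None when there is no cache yet (hypothetical convention).
    """
    request = urllib.request.Request(url)
    if cached_mtime is not None:
        # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
        request.add_header(
            "If-Modified-Since", formatdate(cached_mtime, usegmt=True)
        )
    return request


def cache_is_still_fresh(status_code):
    """304 Not Modified means the server sent no body and the cache is current."""
    return status_code == 304
```

Whether a reply like 304 counts against a provider's daily download quota depends on the provider; the thread does not say how IPinfo treats it.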

abdullahdevrel commented 5 months ago

@g-andrade Thank you very much for looking into the request. Really appreciate it!

> Although I very quickly hit the default limit of 10 daily downloads per database

Can you please let me know the email account or the access token you used to sign up? You can email it to me if you'd like. My email is abdullah@ipinfo.io. I will increase the rate limit.

> I tried switching the IPv4-in-IPv6 tree prefix from ::ffff:0:0/96 to ::/96, and it... worked? I can now get IPv4 entries from the IPinfo database.

I will look into it. I have faced some similar errors when working with DuckDB's inet data type.

g-andrade commented 5 months ago

No worries about the IPv4 issue, it was indeed a bug that had been present in locus since forever, only it didn't show up with other databases: https://github.com/g-andrade/locus/issues/44

It's fixed in the latest version, which I pushed today.