fujiapple852 / trippy

A network diagnostic tool
https://trippy.cli.rs
Apache License 2.0

[Feature Request] Requesting the support of IPinfo Free IP to Country ASN dataset #862

Closed abdullahdevrel closed 9 months ago

abdullahdevrel commented 10 months ago

I am the DevRel of IPinfo. I would like to request supporting the IPinfo free IP to Country dataset in Trippy. Features of the database:

The database comes in MMDB format, so I believe it can be easily ingested in the project. Also, the data structure is flat and predictable. You can package our free IP to Country ASN database with the project. For that, we will provide an access token that you can use. By using the IPinfo dataset, you can get both country-level geolocation information and ASN information from a single source.

Please let me know what you think. If you need any assistance, please let me know. Thanks.

Schema: https://ipinfo.io/developers/ip-to-country-asn-database

| Field name | Example | Data type | Description |
|---|---|---|---|
| start_ip | 1.0.16.0 | TEXT | Starting IP address of an IP address range |
| end_ip | 1.0.31.255 | TEXT | Ending IP address of an IP address range |
| country | JP | TEXT | ISO 3166 country code of the location |
| country_name | Japan | TEXT | Name of the country |
| continent | AS | TEXT | Continent code of the country |
| continent_name | Asia | TEXT | Name of the continent |
| asn | AS2519 | TEXT | Autonomous System Number |
| as_name | ARTERIA Networks Corporation | TEXT | Name of the AS (Autonomous System) organization |
| as_domain | arteria-net.com | TEXT | Official domain or website of the AS organization |
fujiapple852 commented 10 months ago

Hi there @abdullahdevrel thanks for following up!

Trippy currently reads City data from mmdb files and so I think users would need to have the (premium?) IP Geolocation Extended to get that?

I tried downloading the sample in mmdb format but I was not able to get City data from it using the maxminddb crate, perhaps it does not support the ipinfo flavour of mmdb file?

Test code:

use std::net::{IpAddr, Ipv4Addr};
use maxminddb::geoip2::City;

fn main() {
    let reader = maxminddb::Reader::open_readfile("ip_geolocation_extended_ipv4_sample.mmdb").unwrap();
    let addr = IpAddr::V4(Ipv4Addr::from([50, 220, 147, 113]));
    let city_data = reader.lookup::<City<'_>>(addr);
    println!("{city_data:?}");
}

Fails with Err(DecodingError("invalid type: string \"Royal Oak\", expected struct City"))

(I tried decoding as Country as well)

Perhaps this is what you mean by the data being "flat"? Perhaps I have to deserialise to a custom struct with the "flat" structure? Is there a recommended mmdb reader crate for Rust that supports the ipinfo flavour of mmdb files?

You can package our free IP to Country ASN database with the project

I'd prefer to allow users to bring their own files rather than bundle one, to keep the size down and also to prevent stale data being used.

For that, we will provide an access token that you can use

I'm not quite sure what this is for, presumably the token is used for looking up the ipinfo API? I see you have an API for that. Would the token be something that could be bundled in Trippy for all users or just for development use? I prefer user-provided mmdb files over API access as Trippy will often be used in data centre environment with no external internet access.

By using the IPinfo dataset, you can get both country-level geolocation information and ASN information from a single source.

Just to note that Trippy currently gets ASN data from the IP to ASN Mapping Service provided by Team Cymru via DNS TXT records, so it's mostly the GeoIp data (country, city, lat/long) that is needed.

fujiapple852 commented 10 months ago

With some trial and error I was able to figure out it is a HashMap<String, String> (I'm sure this is mentioned in your docs somewhere?):

reader.lookup::<HashMap<String, String>>(addr)

Doing that works:

Ok({"latitude": "42.48948", "longitude": "-83.14465", "postal_code": "48067", "radius": "500", "country": "US", "region": "Michigan", "network": "50.220.147.113-50.220.147.113", "timezone": "America/Detroit", "city": "Royal Oak", "geoname_id": "5007804"})
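Since every value in that map is a string, numeric fields such as latitude and longitude still need parsing before use. A minimal sketch of that step, using a hand-built map in place of a real lookup result; the helper name `parse_coordinates` is hypothetical and not part of Trippy or the maxminddb crate:

```rust
use std::collections::HashMap;

/// Parse the latitude/longitude pair from a flat, all-string record.
/// Returns None if either key is missing, empty, or not a valid number.
/// (Hypothetical helper, for illustration only.)
fn parse_coordinates(record: &HashMap<String, String>) -> Option<(f64, f64)> {
    let lat = record.get("latitude")?.parse::<f64>().ok()?;
    let lon = record.get("longitude")?.parse::<f64>().ok()?;
    Some((lat, lon))
}

fn main() {
    // Hand-built stand-in for the map returned by the lookup.
    let mut record = HashMap::new();
    record.insert("latitude".to_string(), "42.48948".to_string());
    record.insert("longitude".to_string(), "-83.14465".to_string());
    record.insert("city".to_string(), "Royal Oak".to_string());

    match parse_coordinates(&record) {
        Some((lat, lon)) => println!("{lat}, {lon}"),
        None => println!("no usable coordinates"),
    }
}
```

An empty string fails to parse as `f64`, so a record with blank coordinates falls through to `None` rather than producing a bogus position.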

So that looks great.

The question now is, how does Trippy know if a given mmdb file is MaxMind or IpInfo flavoured? Is there some trick to figuring that out? I guess it could try both and see if either works?

fujiapple852 commented 10 months ago

I see that the mmdb files have a metadata attribute which could help tell them apart. Comparing the MaxMind and IpInfo mmdb files I can see this for the database_type attribute:

MaxMind (GeoLite2-City.mmdb):

Metadata { database_type: "GeoLite2-City" }

IpInfo (ip_geolocation_extended_ipv4_sample.mmdb):

Metadata { database_type: "ipinfo ip_geolocation_extended_ipv4_sample.mmdb" }

So unlike the MaxMind file, the IpInfo file has a database_type with the format ipinfo <file>, is that guaranteed to be the case?
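A minimal sketch of how a reader could branch on this attribute, assuming the `ipinfo ` prefix holds for every IPinfo file and that anything else can be treated as MaxMind. The enum and function names are hypothetical, and the function takes the `database_type` string directly so that it does not depend on any particular reader crate:

```rust
/// Which family of mmdb schema a file appears to use.
#[derive(Debug, PartialEq)]
enum MmdbFlavour {
    IpInfo,
    MaxMind,
}

/// Classify a file from its `database_type` metadata string.
/// Assumption: anything without the `ipinfo ` prefix is treated as MaxMind.
fn detect_flavour(database_type: &str) -> MmdbFlavour {
    if database_type.starts_with("ipinfo ") {
        MmdbFlavour::IpInfo
    } else {
        MmdbFlavour::MaxMind
    }
}

fn main() {
    // The two database_type values observed in the files compared here.
    for database_type in ["GeoLite2-City", "ipinfo ip_geolocation_extended_ipv4_sample.mmdb"] {
        println!("{database_type} -> {:?}", detect_flavour(database_type));
    }
}
```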

fujiapple852 commented 10 months ago

WIP impl: https://github.com/fujiapple852/trippy/pull/871

fujiapple852 commented 10 months ago

@abdullahdevrel I would like Trippy to be able to consume either the free "IP to Country + ASN Database" mmdb file or the premium "IP to Geolocation Extended Database" mmdb file.

One quirk I notice is that the free "IP to Country + ASN Database" mmdb file has both country (code) and country_name fields whereas the premium "IP to Geolocation Extended Database" mmdb file has only the country.

From https://ipinfo.io/developers/ip-to-country-asn-database:

| Field name | Example | Data type | Description |
|---|---|---|---|
| country | JP | TEXT | ISO 3166 country code of the location |
| country_name | Japan | TEXT | Name of the country |

From https://ipinfo.io/developers/ip-to-geolocation-extended:

| Field name | Example | Data type | Description |
|---|---|---|---|
| country | US | TEXT | ISO 3166 country code of the location |

Same story for continent.

abdullahdevrel commented 9 months ago

Hey @fujiapple852

My apologies for the late response. I really appreciate you considering our data for Trippy.

Just an FYI, my Rust skills are not very good.

How does Trippy know if a given mmdb file is MaxMind or IpInfo flavoured? Is there some trick to figuring that out? I guess it could try both and see if either works?

That is a very good question. MaxMind uses a nested data structure for their MMDB databases, while IPinfo uses a flat data structure.

MaxMind data structure for MMDB:

[image]

IPinfo data structure for MMDB:

[image]

As you have seen in MaxMind's MMDB reader library, they have declared the structs themselves, so they have native support for their different databases. In the case of IPinfo, you have to declare the struct based on the database schema, which you have already done in #871.

IPinfo has a flat and predictable data structure. A key returns an empty string even if the value does not exist. Boolean values are also strings: "true" for true and "" (an empty string) for false.
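A minimal sketch of how those two conventions could be handled on the consuming side. Both helper names are hypothetical, and treating every value other than "true" as false is an assumption:

```rust
/// Map the "empty string means no value" convention onto Option.
fn non_empty(value: &str) -> Option<&str> {
    if value.is_empty() {
        None
    } else {
        Some(value)
    }
}

/// Interpret a string-encoded boolean: "true" is true.
/// Assumption: anything else, including "", is treated as false.
fn parse_flag(value: &str) -> bool {
    value == "true"
}

fn main() {
    println!("{:?}", non_empty("Royal Oak"));
    println!("{:?}", non_empty(""));
    println!("{}", parse_flag("true"));
    println!("{}", parse_flag(""));
}
```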

For Rust, this is usually what I send to users: https://gist.github.com/abdullahdevrel/ace2c80bd53a7323a18bbf8c8ae6a4d2

So unlike the MaxMind file, the IpInfo file has a database_type with the format ipinfo <file>, is that guaranteed to be the case?

Yes. The database_type information will be prefaced with ipinfo .

$ mmdbctl metadata ipinfo_country_asn.mmdb
- Binary Format 2.0
- Database Type ipinfo country_asn.mmdb
- IP Version    6
- Record Size   32
- Node Count    5458524
- Description
    en ipinfo country_asn.mmdb
- Languages     en
- Build Epoch   1702629871

I think this database_type value is added when the data is compiled from the CSV file to the MMDB database.

I'm not quite sure what this is for, presumably the token is used for looking up the ipinfo API? I see you have an API for that. Would the token be something that could be bundled in Trippy for all users or just for development use?

The access token is for downloading the IPinfo database. To download the database, users need to run a command like this:

curl -L https://ipinfo.io/data/free/country_asn.mmdb?token=<ACCESS_TOKEN>

Our API, however, supports 1,000 tokenless requests/day and 50,000 requests/month with a token. Unlike our free IP database, the free API does provide city and zip code level information.

I would like Trippy to be able to consume either the free "IP to Country + ASN Database" mmdb file or the premium "IP to Geolocation Extended Database" mmdb file.

We would love it if you could use the "IP to Country + ASN Database". It is free and easily accessible for the project and the users, but it does not compromise accuracy at all. Support for this database would be incredible.

Here is the mmdb version of that database: https://www.transfernow.net/dl/20231218MUeQ39J8 (available for 7 days)

One quirk I notice is that the free "IP to Country + ASN Database" mmdb file has both country (code) and country_name fields whereas the premium "IP to Geolocation Extended Database" mmdb file has only the country.

We wanted to make the free IP to Country ASN database as accessible as possible. In our geolocation database, we do not provide the full country name or continent name, and we usually recommend that users use a reference object/dictionary for the full country name, currency, continent, isEu, etc.
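A minimal sketch of such a reference dictionary on the consumer side. The function name is hypothetical and only two entries are shown for illustration; a real table would cover every ISO 3166 code:

```rust
/// Look up a full country name from an ISO 3166 two-letter code.
/// Illustrative only: a real table would list every code.
fn country_name(code: &str) -> Option<&'static str> {
    match code {
        "JP" => Some("Japan"),
        "US" => Some("United States"),
        _ => None,
    }
}

fn main() {
    // "ZZ" stands in for a code that is missing from the table.
    for code in ["JP", "US", "ZZ"] {
        match country_name(code) {
            Some(name) => println!("{code}: {name}"),
            None => println!("{code}: (no entry)"),
        }
    }
}
```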


For posterity

You have addressed this issue, but admittedly, I have not prepared the best Rust documentation. I am addressing it here in case someone stumbles upon this.

let city_data = reader.lookup::<City<'_>>(addr); Fails with Err(DecodingError("invalid type: string \"Royal Oak\", expected struct City"))

This is because the crate has no native struct declarations for IPinfo databases. Users have to declare their own structs, and they should not declare a "generic argument" to the lookup function like <City<'_>>(addr).

Perhaps this is what you mean by the data being "flat"? Perhaps I have to deserialise to a custom struct with the "flat" structure? Is there a recommended mmdb reader crate for Rust that supports the ipinfo flavour of mmdb files?

Yes, this is spot on. The mmdb reader crate does its job of reading the mmdb files perfectly. However, it was built around MaxMind's databases, so it has native support for them through the structs declared within the package.

IPinfo and MaxMind's databases are structured differently. So, when using IPinfo's database with mmdb reader crate, users need to declare the structs based on the database schema of the IPinfo database they are using.

Example:
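The code that originally followed is not preserved here. As a stand-in, here is a minimal sketch of a flat struct for the free IP to Country ASN schema, filled from a string map such as the `HashMap<String, String>` returned by a lookup. The struct name `CountryAsnRecord` and the `from_map` constructor are hypothetical, and leaving out `start_ip`/`end_ip` assumes the range is not stored in the record itself. With the maxminddb crate, such a struct would more likely derive serde's `Deserialize` and be passed to `lookup` directly; plain std is used here to keep the sketch self-contained.

```rust
use std::collections::HashMap;

/// Flat record mirroring the IP to Country ASN schema (hypothetical name).
#[derive(Debug, Default, PartialEq)]
struct CountryAsnRecord {
    country: String,
    country_name: String,
    continent: String,
    continent_name: String,
    asn: String,
    as_name: String,
    as_domain: String,
}

impl CountryAsnRecord {
    /// Build a record from a flat string map; missing keys become "".
    fn from_map(map: &HashMap<String, String>) -> Self {
        let field = |key: &str| map.get(key).cloned().unwrap_or_default();
        Self {
            country: field("country"),
            country_name: field("country_name"),
            continent: field("continent"),
            continent_name: field("continent_name"),
            asn: field("asn"),
            as_name: field("as_name"),
            as_domain: field("as_domain"),
        }
    }
}

fn main() {
    // Values taken from the schema example rows.
    let mut map = HashMap::new();
    map.insert("country".to_string(), "JP".to_string());
    map.insert("country_name".to_string(), "Japan".to_string());
    map.insert("asn".to_string(), "AS2519".to_string());
    println!("{:?}", CountryAsnRecord::from_map(&map));
}
```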

fujiapple852 commented 9 months ago

Hi again @abdullahdevrel and thank you for the comprehensive reply!

The key will return an empty string even if the value does not exist

That is good to know, I'll adjust my impl accordingly to treat an empty string as None (I don't think there are any boolean values to worry about here).

For Rust, this is usually what I send to users: https://gist.github.com/abdullahdevrel/ace2c80bd53a7323a18bbf8c8ae6a4d2

Ah yes, that works well.

and they should not declare a "generic argument" to the lookup function like <City<'_>>(addr).

Nit: note that these are equivalent (the latter infers the type parameter T from the return type of lookup, which must be IpinfoCountryASN to be assigned to record):

let record = reader.lookup::<IpinfoCountryASN>(ip_address).unwrap();
let record: IpinfoCountryASN = reader.lookup(ip_address).unwrap();
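The same equivalence can be seen with a standard-library generic, which may be easier to try out than a real mmdb lookup. A minimal sketch using `str::parse`; the wrapper function `parse_both` is hypothetical and exists only to show the two spellings side by side:

```rust
/// Parse the same text twice: once with an explicit type parameter
/// (turbofish) and once with the type inferred from the binding.
fn parse_both(text: &str) -> (u32, u32) {
    // Type parameter given explicitly at the call site.
    let explicit = text.parse::<u32>().unwrap();
    // Type parameter inferred from the annotation on the binding.
    let inferred: u32 = text.parse().unwrap();
    (explicit, inferred)
}

fn main() {
    let (explicit, inferred) = parse_both("2519");
    assert_eq!(explicit, inferred);
    println!("{explicit} {inferred}");
}
```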

Yes. The database_type information will be prefaced with ipinfo .

Perfect, that was the key thing I needed to know.

We would love it if you could use the "IP to Country + ASN Database". It is free and easily accessible for the project and the users, but it does not compromise accuracy at all. Support for this database would be incredible.

Trippy can certainly support that (it would use the country and continent names from that file, the AS data is not needed as it comes from elsewhere already).

Trippy can also support the extra attributes (city, postcode, lat/long/radius etc.) provided by the premium files, in a way where Trippy will look for and use these fields if they are available in the file provided. To put it another way, a user can provide either the "IP to Country + ASN Database" or the "IP to Geolocation Extended Database" mmdb file and Trippy will pick out the data it needs from either. Does that work?
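A minimal sketch of that "use whatever is present" idea, not Trippy's actual implementation. It assumes the record is available as a flat string map; the struct and function names are hypothetical, and fields that are absent or empty simply come back as None:

```rust
use std::collections::HashMap;

/// The subset of fields a consumer might want, all optional (hypothetical).
#[derive(Debug, PartialEq)]
struct GeoSummary {
    country: Option<String>,
    country_name: Option<String>,
    city: Option<String>,
}

/// Pick out whichever fields the record provides; absent or empty is None.
fn summarise(record: &HashMap<String, String>) -> GeoSummary {
    let get = |key: &str| record.get(key).filter(|v| !v.is_empty()).cloned();
    GeoSummary {
        country: get("country"),
        country_name: get("country_name"),
        city: get("city"),
    }
}

fn main() {
    // Shaped like the free file: country code and name, no city.
    let mut free = HashMap::new();
    free.insert("country".to_string(), "JP".to_string());
    free.insert("country_name".to_string(), "Japan".to_string());

    // Shaped like the extended file: country code and city, no name.
    let mut extended = HashMap::new();
    extended.insert("country".to_string(), "US".to_string());
    extended.insert("city".to_string(), "Royal Oak".to_string());

    println!("{:?}", summarise(&free));
    println!("{:?}", summarise(&extended));
}
```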

Here is the mmdb version of that database: https://www.transfernow.net/dl/20231218MUeQ39J8 (available for 7 days)

Thank you, I have downloaded the file. Is this the same as the file I can download from https://ipinfo.io/account/data-downloads? (I registered account FujiApple on ipinfo.io a while ago).

fujiapple852 commented 9 months ago

@abdullahdevrel if you could help check the tests I added in #871 then we should be able to merge this.

fujiapple852 commented 9 months ago

Merged. This will be included in the 0.10.0 release of Trippy and will be mentioned in the release notes.

abdullahdevrel commented 9 months ago

Thank you very much @fujiapple852!! Really appreciate it!!