InternetHealthReport / internet-yellow-pages

A knowledge graph for Internet resources
GNU General Public License v3.0

Integrate WHOIS data from rir-data.org #59

Closed: m-appel closed this issue 7 months ago

m-appel commented 1 year ago

Explain the dataset you want to add and how it would contribute to the Internet Yellow Pages.

rir-data.org provides consolidated WHOIS data that integrates data from all RIRs and is sanitized. Assuming that it will be maintained, it is a valuable resource for IYP.

Provide the name of the organization providing the data and the url to the dataset

If possible describe how you would like to model the dataset in the Yellow Pages

One object looks like this:

{
   "af" : 4,
   "country" : "US",
   "created" : 1555027200,
   "descr" : "Akamai Technologies",
   "end_address" : "23.219.183.255",
   "last-modified" : 1555027200,
   "mnt-by" : "MNT-AKAMAI",
   "netname" : null,
   "origin" : 20940,
   "prefixes" : [
      "23.219.183.0/24"
   ],
   "serial" : 748705,
   "source" : "ARIN",
   "start_address" : "23.219.183.0",
   "status" : "ALLOCATED",
   "use_route" : true
}
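To sanity-check how the fields fit together, here is a small Python sketch that loads a trimmed copy of the record above. It assumes that created and last-modified are Unix epoch seconds and that start_address/end_address describe the same block as prefixes; neither is stated in the issue. The helpers to_utc and range_to_prefixes are made up for illustration.

```python
import ipaddress
import json
from datetime import datetime, timezone

# The example record from the issue, trimmed to the fields this sketch uses.
RECORD_JSON = """{
  "af": 4, "country": "US", "created": 1555027200,
  "start_address": "23.219.183.0", "end_address": "23.219.183.255",
  "last-modified": 1555027200, "netname": null,
  "prefixes": ["23.219.183.0/24"], "use_route": true
}"""


def to_utc(ts):
    # Assumption: "created" and "last-modified" are Unix epoch seconds.
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def range_to_prefixes(start, end):
    # Collapse a start/end address range into the minimal list of CIDR prefixes.
    first, last = ipaddress.ip_address(start), ipaddress.ip_address(end)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


record = json.loads(RECORD_JSON)
created = to_utc(record["created"])
computed = range_to_prefixes(record["start_address"], record["end_address"])

print(created.date())                  # 2019-04-12
print(computed == record["prefixes"])  # True
```

Under that assumption, 1555027200 decodes to 2019-04-12, which matches the changed date (20190412) on the 23.219.183.0/24 route object in the rr.arin.net output quoted further down.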

I think we should integrate at least the following relationships:

(:AS)-[:ORIGINATE]->(:Prefix)
(:Prefix)-[:COUNTRY]->(:Country)
(:Prefix)-[:NAME]->(:Name)

From their paper it is not entirely clear whether the name is contained in descr or netname; I guess we have to fetch some data and find out.

I'm not sure how to use mnt-by; it's not really a name. Technically it refers to the maintainer object of an organization, but we do not have that organization's info (that's a different part of WHOIS). However, we could still create Organization nodes and do this:

(:Prefix)-[:MANAGED_BY]->(:Organization)
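A minimal sketch of how one record could map onto the four relationships proposed above. record_to_triples is a hypothetical helper; the node property keys (asn, prefix, country_code, name) are assumptions, not taken from the IYP code, and the netname-then-descr fallback is only a placeholder until we have checked real data.

```python
def record_to_triples(record):
    """Map one rir-data.org record to (start_node, rel_type, end_node) triples.

    Hypothetical helper: node labels and relationship types follow the proposal
    in this issue; the property keys (asn, prefix, country_code, name) are
    assumptions.
    """
    triples = []
    for prefix in record.get("prefixes", []):
        pfx = ("Prefix", {"prefix": prefix})
        if record.get("origin") is not None:
            triples.append((("AS", {"asn": record["origin"]}), "ORIGINATE", pfx))
        if record.get("country"):
            triples.append((pfx, "COUNTRY", ("Country", {"country_code": record["country"]})))
        # Open question: is the name in netname or descr? This sketch prefers
        # netname and falls back to descr when netname is null.
        name = record.get("netname") or record.get("descr")
        if name:
            triples.append((pfx, "NAME", ("Name", {"name": name})))
        # mnt-by is a maintainer handle, stored here as an Organization name.
        if record.get("mnt-by"):
            triples.append((pfx, "MANAGED_BY", ("Organization", {"name": record["mnt-by"]})))
    return triples


example = {
    "country": "US", "descr": "Akamai Technologies", "mnt-by": "MNT-AKAMAI",
    "netname": None, "origin": 20940, "prefixes": ["23.219.183.0/24"],
}
for start, rel, end in record_to_triples(example):
    print(start[0], rel, end[0], end[1])
```

This follows the initial proposal only; it does not settle whether ORIGINATE is the right relationship type for this data.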

romain-fontugne commented 1 year ago

This one may require some work to really understand where the data comes from.

Origin AS

I still have to read the paper, but I believe the origin AS comes from the IRR, and the given example seems to support that:

» whois 23.219.183.0/24

Returns nothing, but

» whois -h rr.arin.net 23.219.183.0/24
route:          23.192.0.0/11
descr:          Akamai Technologies
origin:         AS20940
mnt-by:         MNT-AKAMAI
source:         ARIN
changed:        ip-admin@akamai.com 20201029

route:          23.219.183.0/24
descr:          Akamai Technologies
origin:         AS20940
mnt-by:         MNT-AKAMAI
source:         ARIN
changed:        ip-admin@akamai.com 20190412
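The observation that the origin comes from IRR route objects can be reproduced offline by parsing the rr.arin.net output above. This is a rough RPSL reader written for this comment (parse_rpsl and route_origins are hypothetical names, not part of any library); it handles blank-line object separators, comment lines and simple continuation lines, and ignores the rest of RPSL.

```python
# Route objects copied from the rr.arin.net output above.
WHOIS_OUTPUT = """\
route:          23.192.0.0/11
descr:          Akamai Technologies
origin:         AS20940
mnt-by:         MNT-AKAMAI
source:         ARIN
changed:        ip-admin@akamai.com 20201029

route:          23.219.183.0/24
descr:          Akamai Technologies
origin:         AS20940
mnt-by:         MNT-AKAMAI
source:         ARIN
changed:        ip-admin@akamai.com 20190412
"""


def parse_rpsl(text):
    """Split whois/RPSL text into objects, each a dict of attribute -> [values]."""
    objects, current, last_key = [], {}, None
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current object
            if current:
                objects.append(current)
            current, last_key = {}, None
            continue
        if line.startswith(("%", "#")):  # server comments
            continue
        if line[0] in " \t+" and last_key is not None:
            # Continuation line: append to the previous attribute's value.
            current[last_key][-1] += " " + line.lstrip(" \t+").strip()
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        last_key = key.strip().lower()
        current.setdefault(last_key, []).append(value.strip())
    if current:
        objects.append(current)
    return objects


def route_origins(objects):
    """Return (prefix, origin ASN) pairs from route/route6 objects."""
    pairs = []
    for obj in objects:
        for prefix in obj.get("route", []) + obj.get("route6", []):
            for origin in obj.get("origin", []):
                asn = origin.upper()
                pairs.append((prefix, int(asn[2:] if asn.startswith("AS") else asn)))
    return pairs


for prefix, asn in route_origins(parse_rpsl(WHOIS_OUTPUT)):
    print(prefix, asn)
```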

I'd like to keep ORIGINATE for BGP data, meaning an AS originates a prefix in BGP. The meaning of IRR (and RPKI) data is different: it says these ASes are authorized to originate the prefix, not that they actually do. Following what we have for RPKI, we could have:

(:AS)-[:ROUTING_REGISTRY]-(:Prefix)

Or simply reuse the ROUTE_ORIGIN_AUTHORIZATION link type we already have. I'm not sure which would be best.
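A sketch of how the split between BGP and registry data could look when writing to Neo4j, assuming node keys asn/prefix and an AS-to-Prefix direction (the pattern above leaves the direction open). The evidence names and merge_origin_query are hypothetical; whether irr maps to ROUTING_REGISTRY or ROUTE_ORIGIN_AUTHORIZATION is exactly the open question, so it is a single dictionary entry to change.

```python
# Hypothetical mapping from the kind of evidence to a relationship type.
# ORIGINATE stays reserved for prefixes actually seen announced in BGP.
REL_TYPE_BY_EVIDENCE = {
    "bgp": "ORIGINATE",
    "irr": "ROUTING_REGISTRY",  # alternative: reuse "ROUTE_ORIGIN_AUTHORIZATION"
}


def merge_origin_query(evidence):
    """Build a parameterized Cypher MERGE for an (:AS)->(:Prefix) link.

    Relationship types generally cannot be passed as ordinary Cypher query
    parameters, so the type is interpolated from the fixed whitelist above;
    node keys stay parameters. Property names asn/prefix are assumptions.
    """
    rel_type = REL_TYPE_BY_EVIDENCE[evidence]  # KeyError for unknown evidence
    return (
        "MERGE (a:AS {asn: $asn}) "
        "MERGE (p:Prefix {prefix: $prefix}) "
        f"MERGE (a)-[:{rel_type}]->(p)"
    )


query = merge_origin_query("irr")
params = {"asn": 20940, "prefix": "23.219.183.0/24"}
print(query)
```

Interpolating only from a whitelist keeps the query safe from injection while still letting BGP and IRR data share the same node-merging logic.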

Country

I could be wrong, but I don't think WHOIS/IRR has country information per prefix. I guess they get this from the delegated files, which are already in IYP. So, to make sure we don't duplicate data, we should double-check that.
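One way to do that double-check, sketched in Python: parse the delegated files (layout registry|cc|type|start|value|date|status, per the RIR statistics exchange format) and report rir-data.org prefixes whose covering delegation has a different country code. parse_delegated_line and country_mismatches are hypothetical helpers, and the sketch does not handle every variant of the files.

```python
import ipaddress


def parse_delegated_line(line):
    """Parse one resource line of an RIR delegated(-extended) stats file.

    Layout: registry|cc|type|start|value|date|status[|...]. For ipv4, value is
    an address count; for ipv6, it is a prefix length. Header, summary, asn
    and comment lines return None.
    """
    if line.startswith("#"):
        return None
    fields = line.strip().split("|")
    if len(fields) < 7 or fields[3] == "*":
        return None
    registry, cc, rtype, start, value = fields[:5]
    if rtype == "ipv4":
        first = ipaddress.IPv4Address(start)
        nets = list(ipaddress.summarize_address_range(first, first + int(value) - 1))
    elif rtype == "ipv6":
        nets = [ipaddress.IPv6Network(f"{start}/{value}")]
    else:
        return None  # asn lines and the version header
    return {"registry": registry, "cc": cc.upper(), "networks": nets}


def country_mismatches(record, delegated_entries):
    """List (prefix, delegated_cc) where a covering delegation disagrees with record["country"]."""
    mismatches = []
    for prefix in record["prefixes"]:
        net = ipaddress.ip_network(prefix)
        for entry in delegated_entries:
            covered = any(
                net.version == block.version and net.subnet_of(block)
                for block in entry["networks"]
            )
            if covered and entry["cc"] != record["country"]:
                mismatches.append((prefix, entry["cc"]))
    return mismatches
```

If this comes back empty for nearly all records, the country field is probably derived from the delegated files and would duplicate what IYP already has.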

Maintainer

The mnt-by field is interesting. I wonder whether we should model it as an Organization or as an external ID for organizations.

m-appel commented 1 year ago

Okay, it seems the data is not as clear as we thought, so let's double-check before starting any implementation.

MAVRICK-1 commented 7 months ago

@m-appel, is this issue still unresolved? If so, I would like to work on it.

m-appel commented 7 months ago

It is not resolved, but we also have no plans to implement it. I'll check with @romain-fontugne if we can close this issue.

MAVRICK-1 commented 7 months ago

Hi @m-appel, I have a question. Can you please tell me where in this repo the new database is computed?

m-appel commented 7 months ago

I'm not sure I understand correctly. You mean how to create the complete database from scratch? That's described on the main page.

MAVRICK-1 commented 7 months ago

@m-appel, I was referring to this line.

m-appel commented 7 months ago

Oh, this is related to GSoC so this issue is really not the right place for this, but we create a weekly database dump on one of our servers using the method I linked above and publish it here (which is also mentioned on the main README).

MAVRICK-1 commented 7 months ago

Thank you very much, sir. Now I understand.

m-appel commented 7 months ago

Discussed with Romain and we have no plans to integrate this data.