InternetHealthReport / internet-yellow-pages

A knowledge graph for the Internet
https://iyp.iijlab.net
GNU General Public License v3.0

CAIDA's AS relationship #64

Closed romain-fontugne closed 1 week ago

romain-fontugne commented 1 year ago

Import CAIDA AS relationship data. The crawler should be very similar to the BGPKIT as2rel crawler.

The data is available here:
https://publicdata.caida.org/datasets/as-relationships/serial-2/
https://publicdata.caida.org/datasets/as-relationships/serial-1/

roopeshsn commented 1 year ago

The link you provided states:

"The as-rel files contain p2p and p2c relationships.
The format is:
<provider-as>|<customer-as>|-1
<peer-as>|<peer-as>|0|<source>"

However, in the latest .txt file (20230801.as-rel2.txt.bz2) I could only find lines in the latter format (four fields, ending in a source), for example 1|5467|0|bgp

It would be great if you could clarify this, @romain-fontugne.

romain-fontugne commented 1 year ago

Thanks @roopeshsn for looking at that. I just checked the latest file (20230801.as-rel2.txt.bz2) and the first few lines (after the long comments) seem OK to me:

1|5467|0|bgp
1|8641|0|bgp
1|50377|-1|bgp
1|51705|0|bgp
1|51728|0|bgp
1|59572|0|bgp
2|3999|-1|bgp

I think the README is wrong. The format is:

<provider-as>|<customer-as>|-1|<source>
<peer-as>|<peer-as>|0|<source>

I will report that to CAIDA, thanks!
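
For illustration, the corrected format above could be parsed roughly as follows. This is only a sketch, not the actual crawler: the `ASRel` type and the `parse_as_rel` function are hypothetical names, and it assumes that the comment lines at the top of the file start with `#`.

```python
from typing import Iterable, Iterator, NamedTuple


class ASRel(NamedTuple):
    """One line of an as-rel file (hypothetical helper type)."""
    as1: int     # provider AS if rel == -1, otherwise one of the two peers
    as2: int     # customer AS if rel == -1, otherwise the other peer
    rel: int     # -1 = provider-to-customer, 0 = peer-to-peer
    source: str  # e.g. "bgp"; empty string if the line has no fourth field


def parse_as_rel(lines: Iterable[str]) -> Iterator[ASRel]:
    """Yield one ASRel per data line, skipping blank and comment lines."""
    for line in lines:
        line = line.strip()
        # Assumption: header/comment lines start with '#'.
        if not line or line.startswith('#'):
            continue
        fields = line.split('|')
        as1, as2, rel = int(fields[0]), int(fields[1]), int(fields[2])
        source = fields[3] if len(fields) > 3 else ''
        yield ASRel(as1, as2, rel, source)


# Usage with lines quoted in this thread:
for row in parse_as_rel(['# comment', '1|5467|0|bgp', '1|50377|-1|bgp']):
    print(row)
```

Keeping the fourth field optional means the same sketch would also accept three-field lines, should a file really contain them.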

romain-fontugne commented 1 year ago

I heard back from CAIDA: we should use the data in https://publicdata.caida.org/datasets/as-relationships/serial-1/ (not serial-2).

roopeshsn commented 1 year ago

These are the blockers right now:

m-appel commented 1 year ago

Ideally the crawler should fetch the latest version of the *.as-rel.txt.bz2 file, yes.
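
One possible way to pick the latest file is sketched below. It assumes that the directory listing is available as text and that file names follow the `YYYYMMDD.as-rel.txt.bz2` pattern; the function name `latest_as_rel_file` is hypothetical and fetching the listing itself is left out.

```python
import re

# Assumption: serial-1 files are named YYYYMMDD.as-rel.txt.bz2.
AS_REL_FILE = re.compile(r'\b(\d{8})\.as-rel\.txt\.bz2\b')


def latest_as_rel_file(index_text: str) -> str:
    """Return the name of the most recent *.as-rel.txt.bz2 file in a listing."""
    dates = set(AS_REL_FILE.findall(index_text))
    if not dates:
        raise ValueError('no *.as-rel.txt.bz2 file found in listing')
    # YYYYMMDD strings sort chronologically, so max() gives the newest date.
    return f'{max(dates)}.as-rel.txt.bz2'
```

The returned name can then be appended to the serial-1 base URL, and the downloaded file decompressed with Python's standard `bz2` module.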

Your relationships are correct. You will have to assign a direction when creating them, but it can be arbitrary because we always fetch them without direction.

I wonder if we should normalize the rel format with BGPKIT, since they use rel: 1 for customer-provider relationships instead of rel: -1. Any thoughts @romain-fontugne?

m-appel commented 1 year ago

Actually, now I think we should not change the source data, because it gets confusing if you then compare with the corresponding README. I propose leaving the rel property as-is for now and maybe creating a new "parallel", but directed, relationship for the customer-provider case at some point (not now).

romain-fontugne commented 1 year ago

Yes, I think we can keep the data as it is. But note that this data contains directed links: for the provider-customer relationships the direction is important.

m-appel commented 1 year ago

Is it though if we add it as a PEERS_WITH relationship? As far as I am aware we always match these without direction (and it is also not intuitive on which end of a directed PEERS_WITH relationship the provider and on which end the customer should be).

Anyways, to be consistent with the BGPKIT crawler, you can parse the lines in their current order. The direction in case of a provider-customer relationship is:

(Provider:AS)-[:PEERS_WITH {rel: -1}]->(Customer:AS)

romain-fontugne commented 1 year ago

There is at least one example where we use the PEERS_WITH direction: https://github.com/InternetHealthReport/internet-yellow-pages/blob/main/documentation/iij.md#iijs-main-competitors

Yes, anyways, let's just be consistent with whatever we are doing with the BGPKIT crawler.
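
The convention settled on in this thread (keep the file order, keep rel unchanged, so that the provider is the start node when rel is -1) could look roughly like this. It is only a sketch: the dictionary keys, the function names, and the `asn` node property are assumptions made for illustration, not the crawler's actual interface.

```python
from typing import Dict, Iterable, List, Tuple


def to_peers_with_links(rels: Iterable[Tuple[int, int, int]]) -> List[Dict]:
    """Turn (as1, as2, rel) tuples into directed PEERS_WITH link descriptions.

    The file order is preserved, so for rel == -1 the start node is the
    provider and the end node is the customer. The rel value is not normalized.
    """
    return [
        {'start_asn': as1, 'end_asn': as2, 'type': 'PEERS_WITH', 'props': {'rel': rel}}
        for as1, as2, rel in rels
    ]


def cypher_pattern(link: Dict) -> str:
    """Render a link as a Cypher pattern string, for inspection only."""
    return (
        f"(:AS {{asn: {link['start_asn']}}})"
        f"-[:{link['type']} {{rel: {link['props']['rel']}}}]->"
        f"(:AS {{asn: {link['end_asn']}}})"
    )


# Two lines quoted earlier in this thread: 1|5467|0|bgp and 1|50377|-1|bgp
for link in to_peers_with_links([(1, 5467, 0), (1, 50377, -1)]):
    print(cypher_pattern(link))
```

For peer-to-peer lines (rel 0) the direction carries no meaning, so following the file order is simply a convenient and reproducible choice.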