Closed by romain-fontugne 1 week ago
The link you provided states:
"The as-rel files contain p2p and p2c relationships.
The format is:
<provider-as>|<customer-as>|-1
<peer-as>|<peer-as>|0|<source>
"
But I was able to see only data in the latter format (p2p) in the latest .txt file (20230801.as-rel2.txt.bz2):
1|5467|0|bgp
It would be helpful if you could clarify this, @romain-fontugne.
Thanks @roopeshsn for looking at that. I just checked the latest file (20230801.as-rel2.txt.bz2) and the first few lines (after the long comments) seem OK to me:
1|5467|0|bgp
1|8641|0|bgp
1|50377|-1|bgp
1|51705|0|bgp
1|51728|0|bgp
1|59572|0|bgp
2|3999|-1|bgp
I think the README is wrong; the format is:
<provider-as>|<customer-as>|-1|<source>
<peer-as>|<peer-as>|0|<source>
I will report that to CAIDA, thanks!
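For reference, a minimal Python sketch of a parser for the format discussed above. It is not the crawler's actual code: the names `ASRel` and `parse_as_rel` are made up for illustration, comment lines are assumed to start with `#`, and both the three-field and four-field variants are accepted because the README and the file disagree.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class ASRel(NamedTuple):
    as1: int               # provider when rel == -1, otherwise a peer
    as2: int               # customer when rel == -1, otherwise a peer
    rel: int               # -1 = provider-to-customer, 0 = peer-to-peer
    source: Optional[str]  # e.g. "bgp"; None if the line has only three fields


def parse_as_rel(lines: Iterable[str]) -> Iterator[ASRel]:
    """Yield one ASRel per data line, skipping comments and blank lines."""
    for line in lines:
        line = line.strip()
        # Assumption: header comments start with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split("|")
        if len(fields) not in (3, 4):
            raise ValueError(f"unexpected line: {line!r}")
        source = fields[3] if len(fields) == 4 else None
        yield ASRel(int(fields[0]), int(fields[1]), int(fields[2]), source)


# Lines taken from the excerpt above, plus one three-field variant.
sample = ["# comment", "1|5467|0|bgp", "1|50377|-1|bgp", "2|3999|-1"]
rels = list(parse_as_rel(sample))
```

For a downloaded file, the same function should work on the handle returned by `bz2.open(path, "rt")`.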
I heard back from CAIDA: we should use the data in https://publicdata.caida.org/datasets/as-relationships/serial-1/ (not serial-2).
These are the blockages right now: <provider-as>|<customer-as>|-1 and <peer-as>|<peer-as>|0. So the relationship will look like (:AS {asn: xxxx})-[:PEERS_WITH {rel: -1}]-(:AS {asn: xxxx}) and (:AS {asn: xxxx})-[:PEERS_WITH {rel: 0}]-(:AS {asn: xxxx}), right?
Ideally the crawler should fetch the latest version of the *.as-rel.txt.bz2 file, yes.
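One possible way to pick the latest file, sketched in Python. This assumes the file names start with a YYYYMMDD date, as in 20230801.as-rel2.txt.bz2, and that a directory listing is already available as a list of names. `latest_as_rel` and the regular expression are hypothetical, not part of any existing crawler.

```python
import re
from typing import Iterable, Optional

# Assumption: serial-1 files are named <YYYYMMDD>.as-rel.txt.bz2.
AS_REL_NAME = re.compile(r"^(\d{8})\.as-rel\.txt\.bz2$")


def latest_as_rel(names: Iterable[str]) -> Optional[str]:
    """Return the matching file name with the most recent date, or None."""
    best_date = ""
    best_name = None
    for name in names:
        match = AS_REL_NAME.match(name)
        # YYYYMMDD strings sort chronologically, so string comparison is enough.
        if match and match.group(1) > best_date:
            best_date = match.group(1)
            best_name = name
    return best_name


listing = [
    "20230701.as-rel.txt.bz2",
    "20230801.as-rel.txt.bz2",
    "20230801.as-rel2.txt.bz2",  # serial-2 style name, ignored by the pattern
    "README.txt",
]
newest = latest_as_rel(listing)
```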
Your relationships are correct, although you will have to assign a direction when creating them, but this can be arbitrary as we always fetch them without direction.
I wonder if we should normalize the rel format with BGPKIT, since they use rel: 1 for customer-provider relationships instead of rel: -1. Any thoughts @romain-fontugne?
Actually, now I think we should not change the source data, because if you then compare it with the corresponding README, it gets confusing. I propose leaving the rel property as-is for now and maybe creating a new "parallel", but directed, relationship for the customer-provider case at some point (not now).
Yes, I think we can keep the data as it is. But note that this data contains directed links: for provider-customer relationships the direction is important.
Is it though if we add it as a PEERS_WITH relationship? As far as I am aware we always match these without direction (and it is also not intuitive on which end of a directed PEERS_WITH relationship the provider and on which end the customer should be).
Anyway, to be consistent with the BGPKIT crawler, you can parse the lines in their current order. The direction in case of a provider-customer relationship is:
(Provider:AS)-[:PEERS_WITH {rel: -1}]->(Customer:AS)
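A small Python sketch of how the file order could map onto that direction. This is an illustration only, not the BGPKIT or CAIDA crawler code: `MERGE_QUERY` and `edge_params` are hypothetical names, and the Cypher is kept as a plain string so the example does not depend on a database driver.

```python
from typing import Dict

# The start node is the first field of the line, so for rel == -1 the edge
# points from provider to customer. For rel == 0 the direction is arbitrary.
MERGE_QUERY = (
    "MERGE (a:AS {asn: $as1}) "
    "MERGE (b:AS {asn: $as2}) "
    "MERGE (a)-[:PEERS_WITH {rel: $rel}]->(b)"
)


def edge_params(line: str) -> Dict[str, int]:
    """Build query parameters from one data line, keeping the file's field order."""
    # Only the first three fields are used; a trailing <source> field is ignored.
    as1, as2, rel = line.strip().split("|")[:3]
    return {"as1": int(as1), "as2": int(as2), "rel": int(rel)}


# Provider 1 -> customer 50377, taken from the file excerpt in this thread.
params = edge_params("1|50377|-1|bgp")
```

The unchanged rel value is stored on the relationship, so the data still matches the README.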
There is at least one example where we use the PEERS_WITH direction:
https://github.com/InternetHealthReport/internet-yellow-pages/blob/main/documentation/iij.md#iijs-main-competitors
Yes, anyway, let's just be consistent with whatever we are doing with the BGPKIT crawler.
Import CAIDA AS relationship data. The crawler should be very similar to the BGPKIT as2rel crawler.
The data is available here:
https://publicdata.caida.org/datasets/as-relationships/serial-1/ (replaces https://publicdata.caida.org/datasets/as-relationships/serial-2/)