InternetHealthReport / internet-yellow-pages

A knowledge graph for the Internet
https://iyp.iijlab.net
GNU General Public License v3.0

Emile's crowdsourced ASnames added #46

Closed emileaben closed 1 year ago

emileaben commented 1 year ago

Explain the dataset you want to add and how it would contribute to the Internet Yellow Pages.

I'm "crowdsourcing" ASnames. Currently this is just a single file that I accept edits on. It is at: https://github.com/emileaben/asnames

Provide the name of the organization providing the data and the url to the dataset

Space-delimited file with 2 fields: field 1 is the ASN, field 2 is the name. From the commit history you might be able to derive who submitted the name (or I could add it explicitly).

Very experimental at the moment, but it should provide short and useful AS names. Short because screen space is precious, especially for visualizations.
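
A minimal sketch of reading a file in the layout described above, assuming each line holds an ASN and a name separated by the first run of whitespace. That a name may itself contain spaces is an assumption, as are the function name `parse_asnames` and the sample lines (the ASNs come from the range reserved for documentation).

```python
from typing import Iterable, Iterator, Tuple


def parse_asnames(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (asn, name) pairs from space-delimited lines.

    Hypothetical helper: lines that do not look like "<asn> <name>"
    are skipped rather than raising an error.
    """
    for line in lines:
        # Split only once so that a name containing spaces stays intact.
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2:
            continue
        asn_text, name = parts
        if not asn_text.isdigit():
            continue
        yield int(asn_text), name


# Made-up sample input, not taken from the real file.
sample = ["64496 Example Net", "", "64497 Foo"]
for asn, name in parse_asnames(sample):
    print(asn, name)
```

Skipping malformed lines silently is a design choice for a hand-edited file; a stricter loader could log or reject them instead.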

If possible, describe how you would like to model the dataset in the Yellow Pages

roopeshsn commented 1 year ago

Hi, @emileaben! Currently the relationship between an ASN and the AS name is modeled like this,

MATCH (a:AS {asn: 15378})-[r:NAME]-(b:Name) RETURN a,b

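[image: graph (10)] https://user-images.githubusercontent.com/70762571/238511849-ef8039ab-cce7-42de-ab49-d62fdceb4537.png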

As you can see, the AS has multiple NAME relationships to variants of the same name. Is it intentional to keep these variants of the name, @romain-fontugne? We can follow the same approach for your dataset, @emileaben. Also,

emileaben commented 1 year ago

Yes, an AS can have multiple names (depending on who recorded the name). Just like Farrokh Bulsara (his official name), who is better known as Freddie Mercury ( https://en.wikipedia.org/w/index.php?title=Farrokh_Bulsara ).

e

emileaben commented 1 year ago

Oh, and my dataset can make this (multiple names for the same ASN) even worse: it can have multiple names for the same ASN within the data source. I've also added a 'contributor' ID. For now that can be a property on the link (or initially you can leave it out?).
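
To illustrate what this means for anyone loading the data, here is a sketch that keeps every (name, contributor) pair per ASN instead of letting a later name overwrite an earlier one. The `(asn, name, contributor)` record layout and the `group_names` helper are assumptions; the thread does not say how the contributor ID is stored in the file.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed record layout: (asn, name, contributor or None).
Record = Tuple[int, str, Optional[str]]
NameEntry = Tuple[str, Optional[str]]


def group_names(records: Iterable[Record]) -> Dict[int, List[NameEntry]]:
    """Collect all distinct (name, contributor) entries for each ASN."""
    names: Dict[int, List[NameEntry]] = defaultdict(list)
    for asn, name, contributor in records:
        entry = (name, contributor)
        # Keep several names per ASN, but drop exact repeats.
        if entry not in names[asn]:
            names[asn].append(entry)
    return dict(names)


# Made-up sample records for illustration only.
records = [
    (64496, "ExNet", "alice"),
    (64496, "Example", "bob"),
    (64496, "ExNet", "alice"),
    (64497, "Foo", None),
]
print(group_names(records))
```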

e

romain-fontugne commented 1 year ago

Indeed, it is intentional to have multiple names for the same AS.

@roopeshsn do you want to work on this? I guess the code will be similar to the crawler you just implemented.

m-appel commented 1 year ago

I would also be in favor of storing the contributor on the relationship. So basically keep reference_url: https://raw.githubusercontent.com/emileaben/asnames/main/asnames.csv and annotate the NAME relationship with a contributor property, like Emile said.
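
A rough sketch of what that annotation could look like as plain data, before anything is written to the graph. The dictionary layout, the `build_name_links` helper, and the sample records are illustrative assumptions and not the IYP crawler API; only the `reference_url` value and the `contributor` property name come from the comment above.

```python
from typing import Dict, Iterable, List, Optional, Tuple

REFERENCE_URL = (
    "https://raw.githubusercontent.com/emileaben/asnames/main/asnames.csv"
)


def build_name_links(
    records: Iterable[Tuple[int, str, Optional[str]]]
) -> List[Dict]:
    """Describe one NAME relationship per (asn, name, contributor) record."""
    links = []
    for asn, name, contributor in records:
        props = {"reference_url": REFERENCE_URL}
        # The contributor is optional, so only set it when it is known.
        if contributor:
            props["contributor"] = contributor
        links.append({"asn": asn, "type": "NAME", "name": name, "props": props})
    return links


# Made-up sample records for illustration only.
for link in build_name_links([(64496, "ExNet", "alice"), (64497, "Foo", None)]):
    print(link)
```

Keeping the contributor as a relationship property, rather than a separate node, means two contributors proposing the same name for an ASN would still appear as two distinct NAME relationships.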

roopeshsn commented 1 year ago

Yep, I'll work on this @romain-fontugne!

romain-fontugne commented 1 year ago

thanks!