romain-fontugne closed this issue 1 year ago
Do we want to create the DomainName chain we talked about in a new post-processing script? I think that would actually perform better, since we can fetch all existing DomainName nodes in one go and then just fill in the gaps.
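A minimal sketch of the "fetch once, fill the gaps" idea. It assumes the names of all existing DomainName nodes have already been read from the database in a single query into a Python set. `parent_names` and `find_missing_parents` are hypothetical helpers, not part of IYP:

```python
def parent_names(name: str) -> list[str]:
    """Return every proper parent of a dotted name, nearest first.

    'g.doubleclick.net' -> ['doubleclick.net', 'net']
    """
    labels = name.rstrip('.').split('.')
    return ['.'.join(labels[i:]) for i in range(1, len(labels))]


def find_missing_parents(existing: set[str]) -> set[str]:
    """Hypothetical helper: parent names that have no DomainName node yet.

    `existing` is assumed to hold every DomainName name fetched in one go,
    so checking a parent is a set lookup instead of a database query.
    """
    missing = set()
    for name in existing:
        for parent in parent_names(name):
            if parent not in existing:
                missing.add(parent)
    return missing


# Made-up example names, standing in for the result of the single fetch.
existing = {'g.doubleclick.net', 'www.example.com', 'example.com'}
print(sorted(find_missing_parents(existing)))
```

Only the missing names would then need to be created, instead of checking every parent against the database one by one.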
For completeness' sake: we are now thinking of modelling each part of the DNS name, for example:
(:DomainName {name: 'g.doubleclick.net'})-[:PART_OF]->
(:DomainName {name: 'doubleclick.net'})-[:PART_OF]->
(:DomainName {name: 'net'})
because there are some DomainNames that do not resolve to an IP, but their subdomains do.
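To illustrate the modelling above, the PART_OF links for a single name can be derived by dropping the leftmost label one step at a time. This is a sketch that assumes names are plain dotted strings (no wildcards). `part_of_links` is a hypothetical name:

```python
def part_of_links(name: str) -> list[tuple[str, str]]:
    """Return (child, parent) pairs for the DomainName -[:PART_OF]-> chain.

    Names are lowercased because DNS names are case-insensitive.
    """
    labels = name.rstrip('.').lower().split('.')
    # All suffixes, longest first: a.b.c, b.c, c
    chain = ['.'.join(labels[i:]) for i in range(len(labels))]
    # Pair each name with the next, shorter one.
    return list(zip(chain, chain[1:]))


for child, parent in part_of_links('g.doubleclick.net'):
    print(f"(:DomainName {{name: '{child}'}})-[:PART_OF]->(:DomainName {{name: '{parent}'}})")
```

This naive split treats every label boundary as a parent, so `example.co.uk` would also yield `co.uk`. Whether public suffixes need special handling is not settled here.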
But long story short, should I create a separate issue for this?
Yes, we should make a separate issue for this. It will be a different (post-processing) script.
Hi @m-appel @romain-fontugne! Could you give me more context on this issue and the post-processing script? My understanding is that multiple domain names can belong to one IP. After crawling and pushing the data to IYP, the post-processing script groups the domain names, right?
Hey, I will open a separate issue today with more details and will tag you there!
Import Cisco's Umbrella top domain name list. The data is available here: https://umbrella-static.s3-us-west-1.amazonaws.com/index.html
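A rough parsing sketch. It assumes the list is a headerless CSV of `rank,domain` rows, which is how the published `top-1m.csv` is laid out as far as we know, so check the downloaded file before relying on this. Downloading and unzipping are left out. `parse_umbrella_top` is a hypothetical name, and the sample rows are made up:

```python
import csv
import io


def parse_umbrella_top(text: str) -> list[tuple[int, str]]:
    """Parse 'rank,domain' CSV rows into (rank, domain) tuples.

    Assumes there is no header row; blank lines are skipped.
    """
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # csv.reader yields [] for an empty line
        rank, domain = record
        rows.append((int(rank), domain.strip().lower()))
    return rows


# Made-up sample rows in the assumed format.
sample = "1,example.com\n2,www.example.com\n3,example.org\n"
for rank, domain in parse_umbrella_top(sample):
    print(rank, domain)
```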
The added relationships should look like this: