This PR implements the IANA root zone file crawler and closes #82. Because this is the first crawler that adds multi-label nodes, the OpenINTEL crawlers also had to be changed to prevent conflicts.
Description
The IANA root zone file contains NS records for the top-level domains, as well as A/AAAA records for the authoritative name servers.
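As a rough illustration of what the crawler extracts, here is a minimal sketch, not the crawler's actual code. It assumes master-file style lines of the form `name TTL class type rdata...` and uses a hypothetical helper `parse_root_zone`. It collects NS delegations per TLD and A/AAAA addresses per name server. The set of NS targets is exactly the set of names that are both domain names and authoritative name servers.

```python
from collections import defaultdict


def parse_root_zone(lines):
    """Hypothetical helper: extract NS delegations and name server addresses.

    Assumes whitespace-separated master-file lines: name TTL class type rdata...
    Returns (delegations, addresses, nameservers).
    """
    delegations = defaultdict(set)  # TLD -> names of its authoritative name servers
    addresses = defaultdict(set)    # name server -> its A/AAAA addresses
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 5:
            continue  # not a complete resource record
        name, _ttl, _cls, rtype, rdata = fields[:5]
        name = name.rstrip(".").lower() or "."  # keep the root as "."
        if rtype == "NS":
            delegations[name].add(rdata.rstrip(".").lower())
        elif rtype in ("A", "AAAA"):
            addresses[name].add(rdata)
        # Other types (SOA, DS, RRSIG, ...) are ignored in this sketch.
    # Every NS target is itself a domain name, so each one would become a
    # node carrying both labels: DomainName:AuthoritativeNameServer.
    nameservers = set().union(*delegations.values())
    return delegations, addresses, nameservers
```

Usage with illustrative records (documentation IP addresses, not real root zone data):

```python
sample = [
    "; comment line",
    "com.\t172800\tIN\tNS\ta.gtld-servers.net.",
    "a.gtld-servers.net.\t172800\tIN\tA\t192.0.2.1",
    "a.gtld-servers.net.\t172800\tIN\tAAAA\t2001:db8::1",
]
delegations, addresses, nameservers = parse_root_zone(sample)
```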
This is the first crawler that introduces multi-label nodes: name servers are now DomainName:AuthoritativeNameServer nodes, since every name server is identified by a domain name. Accordingly, this PR also updates the other crawler that creates AuthoritativeNameServer nodes, the OpenINTEL crawler. Without this change, the two crawlers would define conflicting constraints.
As part of changing the OpenINTEL crawler, this PR also reduces the execution time of the crawler's link-computation phase by a factor of 10. The previous implementation iterated over the data inefficiently.
How Has This Been Tested?
These changes were tested as part of a full database creation, and the tests were also repeated independently.
Types of changes
[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
Checklist:
[x] My code follows the code style of this project.
[x] My change requires a change to the documentation.