cds-snc / pulse

Archived: [Project has been split out into two components, @ https://github.com/cds-snc/tracker and https://github.com/cds-snc/track-web ] Check whether a Government of Canada domain is adhering to best security practices.

Domain ownership rework #111

Closed buckley-w-david closed 6 years ago

buckley-w-david commented 6 years ago

This PR addresses https://github.com/cds-snc/pulse/issues/109

Since the owner of a parent domain is not necessarily the owner of its subdomains (canada.ca vs digital.canada.ca), a refactor/rewrite of the scanning and processing portions is needed.

The solution is to eliminate the distinction between scanning "parents" and "subdomains": all domains are packed into a single list, every entry of which is scanned, and a new concept, the "owners" list, is introduced.

An "owner" is a lot like what parents used to be, but has no constraints on the domain level, so an owner could be canada.ca or digital.canada.ca, or a.b.c.d.ca.

When processing scan results, instead of parsing out the base domain of each subdomain and using that to find the organization that owns it, the domain is put through a process that progressively strips the leftmost level of the domain (sub.digital.canada.ca -> digital.canada.ca -> canada.ca etc) until it finds a match in the owners list. Once a match is found, the organization information of that owner is used for the domain.

If no level of the domain appears in the owners list, it is given the default organization of "Government of Canada".
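The lookup described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the names `find_owner`, `organization_for`, and `OWNERS`, and the organization names in the mapping, are hypothetical, and the owners list is assumed to be a dict from owner domain to organization name.

```python
from typing import Dict, Optional

DEFAULT_ORGANIZATION = "Government of Canada"

# Hypothetical owners list: owner domain -> organization name (names are made up).
OWNERS: Dict[str, str] = {
    "canada.ca": "Hypothetical Organization A",
    "digital.canada.ca": "Hypothetical Organization B",
}


def find_owner(domain: str, owners: Dict[str, str]) -> Optional[str]:
    """Return the closest owner domain for `domain`, or None if no level matches."""
    labels = domain.lower().strip(".").split(".")
    # Try the full domain first, then drop the leftmost label on each pass:
    # sub.digital.canada.ca -> digital.canada.ca -> canada.ca -> ca
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in owners:
            return candidate
    return None


def organization_for(domain: str, owners: Dict[str, str]) -> str:
    """Use the matched owner's organization, falling back to the default."""
    owner = find_owner(domain, owners)
    return owners[owner] if owner is not None else DEFAULT_ORGANIZATION
```

Because the full domain is tried first, the most specific owner wins: sub.digital.canada.ca resolves to digital.canada.ca rather than canada.ca.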

With the current implementation, only "owner" domains are processed so that they can have subdomains on the "domains" page of the Flask app. Domains that never match an owner and fall back to the default are always displayed as top-level domains, even if they are technically subdomains of other domains in the dataset.

An example of this for clarity is as follows:

| domains | owners |
| --- | --- |
| canada.ca | canada.ca |
| sub.canada.ca | digital.canada.ca |
| digital.canada.ca | |
| sub.digital.canada.ca | |
| test.ca | |
| sub.test.ca | |

In this example, the domains page of the website would have 4 entries: canada.ca, which could be expanded to also show sub.canada.ca; digital.canada.ca, which could be expanded to also show sub.digital.canada.ca; test.ca; and sub.test.ca.
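To make the grouping concrete, here is a rough sketch that reproduces the 4 entries for this example. It is illustrative only: `closest_owner`, `group_for_domains_page`, `DOMAINS`, and `OWNERS` are hypothetical names, the owners list is reduced to a plain set of domains, and every owner is assumed to also appear in the scanned domains list.

```python
from typing import Dict, Iterable, List, Optional, Set

# Example data from the description above.
DOMAINS: List[str] = [
    "canada.ca",
    "sub.canada.ca",
    "digital.canada.ca",
    "sub.digital.canada.ca",
    "test.ca",
    "sub.test.ca",
]
OWNERS: Set[str] = {"canada.ca", "digital.canada.ca"}


def closest_owner(domain: str, owners: Set[str]) -> Optional[str]:
    """Strip leftmost levels until a level of the domain is in `owners`."""
    labels = domain.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in owners:
            return candidate
    return None


def group_for_domains_page(
    domains: Iterable[str], owners: Set[str]
) -> Dict[str, List[str]]:
    """Map each top-level page entry to the subdomains listed beneath it."""
    entries: Dict[str, List[str]] = {}
    for domain in domains:
        owner = closest_owner(domain, owners)
        if owner is None:
            # No owner at any level: shown as its own top-level entry.
            entries.setdefault(domain, [])
        else:
            entries.setdefault(owner, [])
            if domain != owner:
                entries[owner].append(domain)
    return entries


entries = group_for_domains_page(DOMAINS, OWNERS)
```

Note that sub.test.ca stays a separate entry instead of being nested under test.ca, because neither domain matches an owner.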

This decision was made partly because, with no owner listed, we have no idea whether there is a different owner at any level, and partly because of an implementation detail that makes it difficult to know what the "highest" level version of the domain chain is: when processing any given domain "a.b.c.d", the domain "b.c.d" could either

  1. not have been encountered yet, or
  2. not be in the list in the first place