This PR is a major refactor of the Pulse data backend, import/processing pipeline, and HTTPS display system. The major goals are:
Have the top-line HTTPS reporting show a % of all known services, not parent domains.
Have the top-line HTTPS reporting show the % complying with federal policy, not just "Uses HTTPS".
Support BOD 18-01 (cyber.dhs.gov) by adding a column measuring support for RC4/3DES/SSLv2/SSLv3, and including BOD 18-01 compliance in the top-line measurement.
Support subdomains in a native, non-confusing way, and provide a clear (and performant) way to see all of them and download them to CSV.
Replace SSL Labs with SSLyze for TLS/SSL protocol-level analysis.
Eliminate confusion among agencies by fully matching the presentation of DHS HTTPS reporting, so that we show the same numbers and data, aggregated in the same way.
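As a rough illustration of the top-line number described in the goals above, here is a minimal sketch of a per-service compliance percentage. The `Service` class, its field names, and the roll-up rule (HTTPS policy met and none of RC4/3DES/SSLv2/SSLv3 supported) are assumptions made for this example, not the actual Pulse schema or the exact policy logic.

```python
from dataclasses import dataclass


@dataclass
class Service:
    # Hypothetical record for one known service (parent domain or subdomain).
    hostname: str
    meets_https_policy: bool  # assumed roll-up of the HTTPS policy checks
    rc4: bool = False
    triple_des: bool = False
    sslv2: bool = False
    sslv3: bool = False

    def bod_crypto_ok(self) -> bool:
        # Passes only if none of the weak ciphers/protocols are supported.
        return not (self.rc4 or self.triple_des or self.sslv2 or self.sslv3)

    def compliant(self) -> bool:
        # Full compliance: HTTPS policy plus the BOD 18-01 crypto check.
        return self.meets_https_policy and self.bod_crypto_ok()


def percent_compliant(services):
    # Percentage of all known services (not just parent domains) that comply.
    if not services:
        return 0.0
    passing = sum(1 for s in services if s.compliant())
    return round(100 * passing / len(services), 1)
```

The point of the sketch is the denominator: every known service counts once, so a parent domain with many non-compliant subdomains pulls the top-line number down.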
To do so, this makes significant internal adjustments:
Adjusts the scanners and gatherers to use the new multi-gatherer in domain-scan, which consolidates gather results from multiple sources into a single CSV file that dedupes hostnames and notes which sources each hostname appears in.
Refactors processing.py to make use of the consolidated subdomain data, and to load every subdomain into the Domain table.
Refactors the Domain table to hold both parent domains and subdomains, and refactors the filtering and exporting code to account for this.
Redesigns the HTTPS table to drop the Uses column, to add a BOD 18-01 crypto column, and to add a full compliance column. (The table shows one column for BOD 18-01; the CSV also breaks it out into individual columns.)
Rewires the HTTPS table to show data for each subdomain, inline, when expanded per-parent domain. This is meant to be helpful for both usability and performance.
Adds CSV downloads for the entire HTTPS table (including all subdomains) or for all subdomains of a particular parent domain. CSVs are linked within the table.
Adds gzip compression, because OBVIOUSLY, but also because we're now transferring more data due to the increased subdomain data stored and served.
Updates the HTTPS guidance page to document all the changes and link to additional information.
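To make the consolidation step in the list above concrete, here is a minimal sketch of merging hostnames from several gather sources into one deduplicated CSV that records which sources each hostname appeared in. This is not the domain-scan implementation: the function names, the normalization rules, the source names, and the column layout are all assumptions for illustration.

```python
import csv
import io


def consolidate(sources):
    """Merge hostnames from multiple sources, deduplicating as we go.

    `sources` maps a (hypothetical) source name to an iterable of hostnames.
    Returns a dict of normalized hostname -> set of source names.
    """
    seen = {}
    for source, hostnames in sources.items():
        for raw in hostnames:
            # Assumed normalization: trim, lowercase, drop a trailing dot.
            host = raw.strip().lower().rstrip(".")
            if not host:
                continue
            seen.setdefault(host, set()).add(source)
    return seen


def write_csv(seen, source_names, out):
    # One row per hostname, one True/False column per source (assumed layout).
    writer = csv.writer(out)
    writer.writerow(["Domain"] + list(source_names))
    for host in sorted(seen):
        writer.writerow([host] + [str(name in seen[host]) for name in source_names])
```

A short usage example, with made-up source names:

```python
sources = {
    "source_a": ["www.example.gov", "API.example.gov."],
    "source_b": ["api.example.gov"],
}
buf = io.StringIO()
write_csv(consolidate(sources), ["source_a", "source_b"], buf)
print(buf.getvalue())
```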