TL;DR: It looks like the handling of non-domains changed between `publicsuffix2` and `publicsuffixlist`.
Recommend: check for `.` before calling `get_base_domain`.
Upon release 8.6.0, tests against our corpus of data started failing. Looking into the issue, we found a set of reports from Yahoo that were causing the elasticsearch searches to contain `None`.
This was resulting in an error from elasticsearch, raised in `parsedmarc/elastic.py`, line 331, in `save_aggregate_report_to_elasticsearch`.
The report in question was from Yahoo:
(domain and IP address changed)
Note the `<org_name>Yahoo</org_name>`: `parsedmarc/__init__.py:254` checks for `org_name is not None and " " not in org_name` before calling `get_base_domain`.
`privatesuffix` in PublicSuffixList returns `None` for `psl.privatesuffix('Yahoo')` (current use).
`get_public_suffix` in publicsuffix2 returns `yahoo` for `psl.get_public_suffix('Yahoo')` (prior use).
Adding a check for `"." in org_name` in `parse_aggregate_report_xml` as well should resolve this.