domainaware / parsedmarc

A Python package and CLI for parsing aggregate and forensic DMARC reports
https://domainaware.github.io/parsedmarc/
Apache License 2.0

Reports with org_name having no space and no '.' fail after change to PublicSuffixList #410

Closed gaige closed 1 year ago

gaige commented 1 year ago

TL;DR: Looks like the behavior in handling non-domains changed between publicsuffix2 and publicsuffixlist.

Recommend: add a check for "." in org_name in parse_aggregate_report_xml (details at the end).

After release 8.6.0, tests against our corpus of data started failing. Looking into the issue, we found a set of reports from Yahoo that caused the Elasticsearch search queries to contain org_name=None:

    INFO:elastic.py:295:Saving aggregate report to Elasticsearch
    query = Bool(must=[MatchPhrase(org_name=None), MatchPhrase(report_id='1681434617.68896'), MatchPhrase(published_policy__domain='OURDOMAIN.COM'), Match(date_begin=datetime.datetime(2023, 4, 13, 0, 0, tzinfo=datetime.timezone.utc)), Match(date_end=datetime.datetime(2023, 4, 13, 23, 59, 59, tzinfo=datetime.timezone.utc))])

This resulted in an error from Elasticsearch:

elasticsearch.exceptions.RequestError: RequestError(400, 'x_content_parse_exception', '[1:59] [bool] failed to parse field [must]')

raised in parsedmarc/elastic.py, line 331, in save_aggregate_report_to_elasticsearch

The report in question was from Yahoo:

<?xml version="1.0"?>   
<feedback>  
  <report_metadata> 
    <org_name>Yahoo</org_name>  
    <email>dmarchelp@yahooinc.com</email>   
    <report_id>1681434617.68896</report_id> 
    <date_range>    
      <begin>1681344000</begin> 
      <end>1681430399</end> 
    </date_range>   
  </report_metadata>    
  <policy_published>    
    <domain>OURDOMAIN.COM</domain>  
    <adkim>r</adkim>    
    <aspf>r</aspf>  
    <p>none</p> 
    <pct>100</pct>  
  </policy_published>   
  <record>  
    <row>   
      <source_ip>192.0.0.1</source_ip>  
      <count>1</count>  
      <policy_evaluated>    
        <disposition>none</disposition> 
        <dkim>pass</dkim>   
        <spf>pass</spf> 
      </policy_evaluated>   
    </row>  
    <identifiers>   
      <header_from>OURDOMAIN.COM</header_from>  
    </identifiers>  
    <auth_results>  
      <dkim>    
        <domain>OURDOMAIN.COM</domain>  
        <selector>m3-4</selector>   
        <result>pass</result>   
      </dkim>   
      <spf> 
        <domain>OURDOMAIN.COM</domain>  
        <result>pass</result>   
      </spf>    
    </auth_results> 
  </record> 
</feedback> 

(domain and IP address changed)

Note the <org_name>Yahoo</org_name>: the value contains neither a space nor a '.', so it is not a domain name.

Adding a check for "." in org_name to parse_aggregate_report_xml, in addition to the check for a space, should resolve this.
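A minimal sketch of what such a check might look like. This is not the actual parsedmarc code: normalize_org_name and fake_get_base_domain are hypothetical names used only for illustration. The stand-in resolver just mimics the behavior reported here (returning None for input without a dot), and the sketch assumes that org_name is passed to a base-domain lookup only when it looks like a domain.

```python
from typing import Callable, Optional


def normalize_org_name(
    org_name: Optional[str],
    get_base_domain: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the base domain for domain-like values, else org_name unchanged."""
    if org_name is None:
        return None
    # Treat the value as a domain only if it has no space AND contains a dot.
    if " " not in org_name and "." in org_name:
        base = get_base_domain(org_name)
        # Keep the original value if the lookup yields nothing.
        if base is not None:
            return base
    return org_name


def fake_get_base_domain(domain: str) -> Optional[str]:
    # Hypothetical stand-in for a public-suffix lookup: None for dotless input,
    # otherwise a naive "last two labels" result (not a real PSL lookup).
    if "." not in domain:
        return None
    return ".".join(domain.lower().split(".")[-2:])


print(normalize_org_name("Yahoo", fake_get_base_domain))            # Yahoo
print(normalize_org_name("mail.google.com", fake_get_base_domain))  # google.com
```

With this guard, a bare name such as "Yahoo" is passed through unchanged, so org_name never becomes None in the query. The fallback to the original value is an extra safety net for lookups that return nothing.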