aholzel / TA-dmarc

Splunk app for the processing and ingestion of DMARC RUA reports
4 stars 2 forks source link

dmarc-parser.py doesn't reset record values on each record read #5

Closed fraserhess closed 3 years ago

fraserhess commented 4 years ago

My colleague @geekusa and I have discovered a bug in dmarc-parser.py where it does not reset record values after each record reading. The result is that records that do not have certain fields, such as DKIM selector, get that information populated from the last record to have a value for that field.

This problem occurs because in line 142 report_recorddata = report_defaultdata report_recorddata is not a new dictionary object but a reference to report_defaultdata. Therefore, all changes to report_recorddata also change report_defaultdata. Our solution was to import copy in the header and then rewrite 142 as report_recorddata = copy.deepcopy(report_defaultdata)

A deep copy appears to be necessary in order to copy all of the recursive dictionaries in report_defaultdata.

aholzel commented 4 years ago

Hello @fraserhess

Thanks for use the app, hope it helps you!

I don't see how this can happen I tested it with different values and it just gets the info from that row, can you provided a xml that has this problem so I can test it.

The line you say that has the problem in it is a line to make sure that after each row of info the report_recorddata variable is reset with the default info that applies to each line. the content of the report_defaultdata is the info between the <report_medata> tags and the <policy_published> tags:

  <report_metadata>
    <org_name>ORGINAZATION</org_name>
    <email>MAILER-DAEMON@ORGINAZATION.nl</email>
    <extra_contact_info>done call us</extra_contact_info>
    <report_id>1a8b$b2ceb=066556cc37f18@ORGANIZATION.nl</report_id>
    <date_range>
      <begin>1566943204</begin>
      <end>1567029604</end>
    </date_range>
  </report_metadata>
  <policy_published>
    <domain>recipient-domain.nl</domain>
    <adkim>r</adkim>
    <aspf>r</aspf>
    <p>quarantine</p>
    <sp></sp>
    <pct>100</pct>
  </policy_published>

After the above the info about the sending systems and the actions they took is in the XML and that is different for each <record> :

<record>
    <row>
      <source_ip>12.34.56.78</source_ip>
      <count>1</count>
      <policy_evaluated>
        <disposition>none</disposition>
        <dkim>fail</dkim>
        <spf>pass</spf>
      </policy_evaluated>
    </row>
    <identifiers>
      <header_from>recipient-domain.nl</header_from>
      <envelope_from>recipient-domain.nl</envelope_from>
    </identifiers>
    <auth_results>
      <spf>
        <domain>recipient-domain.nl</domain>
        <scope>mfrom</scope>
        <result>pass</result>
      </spf>
    </auth_results>
  </record>
fraserhess commented 4 years ago

I do have a file that demonstrates the problem. It is attached and is an actual DMARC report from Google truncated to 4 records for testing purposes. The first two records (1 and 2) are DMARC passes and have DKIM auth results including a selector. The second two records (3 and 4) are DMARC quarantines and do not have DKIM auth results in the XML but they do in the json output in Splunk. In fact, they have the DKIM auth results from record 2.

google.com!pinnacol.com!1598140800!1598227199-trim.xml.zip

fraserhess commented 4 years ago

For more information see example 2 on this page: https://www.programiz.com/python-programming/methods/dictionary/copy

When you copy by assignment (=) changes are made to both the copy and the original. Using the (shallow) copy function also isn't sufficient because the data structure here has dictionaries inside other dictionaries. A deep copy is required.

aholzel commented 4 years ago

cool thanks I didn't know and didn't run into that problem before (or I just didn't see it....) I will look over the code and see where I need to change things. I will probably have a new version tomorrow.. Thanks for the info and the new insight

aholzel commented 4 years ago

I created a new release (v3.7.0) with the suggested fix, thanks again for that. I also found another problem by accident when I processed the provided zip file. You apparently created that on a Mac and there was a "__MACOS" folder inside the zip, that caused an exception.