domainaware / parsedmarc

A Python package and CLI for parsing aggregate and forensic DMARC reports
https://domainaware.github.io/parsedmarc/
Apache License 2.0

Invalid reports from mimecastreport.com due to trailing \r\n on gzip data #429

Closed by 0xabu 12 months ago

0xabu commented 1 year ago

I've recently received some reports from Mimecast that parsedmarc flags as invalid. The root cause appears to be that the attachment contains valid gzipped report data followed by a spurious CRLF, which causes Python's GzipFile to reject it here:

File .../python3.9/site-packages/parsedmarc/__init__.py:392, in extract_xml(input_)
    391 elif header.startswith(MAGIC_GZIP):
--> 392     xml = GzipFile(fileobj=file_object).read().decode(errors='ignore')
    393 elif header.startswith(MAGIC_XML):

File .../lib/python3.9/gzip.py:300, in GzipFile.read(self, size)
    299     raise OSError(errno.EBADF, "read() on write-only GzipFile object")
--> 300 return self._buffer.read(size)

File .../lib/python3.9/gzip.py:487, in _GzipReader.read(self, size)
    486 self._init_read()
--> 487 if not self._read_gzip_header():
    488     self._size = self._pos

File .../lib/python3.9/gzip.py:435, in _GzipReader._read_gzip_header(self)
    434 if magic != b'\037\213':
--> 435     raise BadGzipFile('Not a gzipped file (%r)' % magic)
    437 (method, flag,
    438  self._last_mtime) = struct.unpack("<BBIxx", self._read_exact(8))

BadGzipFile: Not a gzipped file (b'\r\n')

However, the command-line gzip tools extract them without issue, and presumably other parsers handle these too.

I found some discussion of the limitations of GzipFile here -- maybe one of these workarounds would make sense for parsedmarc?
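For illustration, one workaround of this kind is to decompress with `zlib` directly instead of `GzipFile`. This is only a sketch, not necessarily what parsedmarc ended up doing: the helper name `gunzip_ignoring_trailing_bytes` is made up, and it assumes the attachment holds a single gzip member (any further members would be dropped along with the junk).

```python
import gzip
import zlib


def gunzip_ignoring_trailing_bytes(data: bytes) -> bytes:
    """Decompress the first gzip member in `data`, ignoring what follows it.

    Hypothetical helper, not part of parsedmarc.
    """
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    payload = decompressor.decompress(data)
    payload += decompressor.flush()
    # Bytes after the end of the gzip stream (e.g. b"\r\n") are left in
    # decompressor.unused_data rather than raising an error.
    return payload


# Synthetic stand-in for a report attachment: gzip data plus a stray CRLF.
sample_xml = b"<feedback><report_metadata/></feedback>"
attachment = gzip.compress(sample_xml) + b"\r\n"
recovered = gunzip_ignoring_trailing_bytes(attachment)
```

The trade-off is that trailing garbage is silently accepted, so a caller that cares could inspect `unused_data` and log it.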

These reports came from:

<org_name>Mimecast</org_name>
<email>no-reply@au-1.mimecastreport.com</email>
<extra_contact_info>https://community.mimecast.com/s/knowledge</extra_contact_info>

Kuzuto commented 1 year ago

I opened pull request #430, which should fix the problem.

nhairs commented 9 months ago

It may be worth adding tests for this, either with a sample email, if you're willing to provide one @0xabu, or by generating our own sample with spurious trailing bytes.
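As a rough idea of what generating our own sample could look like, here is a sketch using only the standard library. The addresses, subject, filename and helper names are all made up for illustration, and this is not taken from the parsedmarc test suite.

```python
import gzip
import io
from email import message_from_bytes, policy
from email.message import EmailMessage


def make_sample_eml(xml: bytes) -> bytes:
    """Build a synthetic report email whose gzip attachment ends in CRLF."""
    attachment = gzip.compress(xml) + b"\r\n"  # the spurious trailing bytes
    msg = EmailMessage()
    msg["From"] = "reports@example.com"  # placeholder sender
    msg["To"] = "dmarc@example.org"  # placeholder recipient
    msg["Subject"] = "Report Domain: example.org"
    msg.set_content("Synthetic DMARC aggregate report.")
    msg.add_attachment(attachment, maintype="application", subtype="gzip",
                       filename="report.xml.gz")
    return msg.as_bytes()


def attachment_bytes(eml: bytes) -> bytes:
    """Return the decoded bytes of the first attachment in the email."""
    msg = message_from_bytes(eml, policy=policy.default)
    return next(msg.iter_attachments()).get_content()


def gzipfile_rejects(data: bytes) -> bool:
    """Report whether GzipFile raises while reading `data` to the end."""
    try:
        gzip.GzipFile(fileobj=io.BytesIO(data)).read()
    except OSError:  # gzip.BadGzipFile is a subclass of OSError
        return True
    return False


eml = make_sample_eml(b"<feedback/>")
data = attachment_bytes(eml)
```

A test could then assert that `gzipfile_rejects(data)` is true for the raw attachment while the report extraction path in parsedmarc still returns the XML.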

0xabu commented 9 months ago

Sure: mimecast-bad-dmarc.eml