domainaware / parsedmarc

A Python package and CLI for parsing aggregate and forensic DMARC reports
https://domainaware.github.io/parsedmarc/
Apache License 2.0

Error "Invalid archive" in extract_report() when using a path as input_ #475

Open kawaegle opened 4 months ago

kawaegle commented 4 months ago

I ran into a strange error after updating the parsedmarc library to 8.7.0:

file_path = "/tmp/xml_path.xml"
parse_aggregate_report_file(file_path, offline=True, ip_db_path=None)

This raises:

Invalid archive file: Not a valid zip, gzip, json, or xml file

After investigating, it seems to be an error from https://github.com/domainaware/parsedmarc/blob/77132b3fc57358da6ea01fa9bb54034254c99c5e/parsedmarc/__init__.py#L564: at that point file_object is not None, so the check that opens the path at https://github.com/domainaware/parsedmarc/blob/77132b3fc57358da6ea01fa9bb54034254c99c5e/parsedmarc/__init__.py#L570 is skipped.

A simple hack is to set file_object to None before the pass.

As a test, this is the fix I made for my project (as a patch):

--- __init__.py 2024-02-22 16:58:11.395563315 +0100
+++ init_fix.py 2024-02-22 16:53:27.555565639 +0100
@@ -566,6 +566,7 @@
             try:
                 file_object = BytesIO(b64decode(input_))
             except binascii.Error:
+                file_object = None
                 pass
             if file_object is None:
                 file_object = open(input_, "rb")
@@ -588,6 +589,7 @@
             report = file_object.read().decode(errors='ignore')
         else:
             file_object.close()
+            print(f"header: {header} XML: {MAGIC_XML}")
             raise ParserError("Not a valid zip, gzip, json, or xml file")

         file_object.close()

Hope it helps.
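
For anyone hitting the same thing in their own code, here is a minimal sketch of a different ordering: check whether the string is an existing file first, and only then try strict base64 decoding. The helper name open_report_input is hypothetical and is not part of parsedmarc; this is only an illustration of the idea, not the library's implementation or a proposed upstream fix.

```python
import binascii
import os
from base64 import b64decode
from io import BytesIO


def open_report_input(input_):
    """Hypothetical helper (not part of parsedmarc).

    Return a binary file object for a string that is either a path to an
    existing file or base64-encoded report data.
    """
    # Check the filesystem first, so a real path is never treated as base64.
    if os.path.isfile(input_):
        return open(input_, "rb")
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        return BytesIO(b64decode(input_, validate=True))
    except binascii.Error:
        raise ValueError("input is neither an existing file nor valid base64")
```

With this ordering, a string such as "/tmp/xml_path.xml" is opened as a file whenever that file exists, and base64 decoding is only attempted as a fallback.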

kawaegle commented 4 months ago

As proof:

I added print(f"==={file_object}===") to see whether file_object is really None, and I get ===<_io.BytesIO object at 0x7efccc7a7560>===. So yes, the mistake really is just that the variable file_object is created as a BytesIO.
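
One possible way a path string can end up as a BytesIO: by default, base64.b64decode discards characters outside the base64 alphabet (such as "." or "_") before decoding, and "/" is itself a valid base64 character. So some path-like strings decode without raising binascii.Error. The snippet below is a small standalone sketch of that behaviour; the path "/tmp/abcd.xml" and the helper try_base64 are made up for illustration, and whether a given real path decodes depends on its characters and length.

```python
import binascii
from base64 import b64decode
from io import BytesIO


def try_base64(text, validate=False):
    """Return a BytesIO if text decodes as base64, otherwise None."""
    try:
        return BytesIO(b64decode(text, validate=validate))
    except binascii.Error:
        return None


path = "/tmp/abcd.xml"  # hypothetical path, chosen for illustration

# Default (lenient) mode: the "." is dropped and the rest decodes.
lenient = try_base64(path)

# Strict mode: the "." is rejected, so decoding fails.
strict = try_base64(path, validate=True)

print(lenient)  # a BytesIO object, even though the input was a path
print(strict)   # None
```

In the lenient case the file is never opened, and the decoded bytes do not start with any of the expected magic headers, which would lead to the "Not a valid zip, gzip, json, or xml file" error.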