domainaware / parsedmarc

A Python package and CLI for parsing aggregate and forensic DMARC reports
https://domainaware.github.io/parsedmarc/
Apache License 2.0

parsedmarc.cli hanging with large report #507

Closed rhykw closed 1 month ago

rhykw commented 3 months ago

parsedmarc.cli hangs when it is given a large report. (sample is here)

Output after pressing Ctrl+C:

root@0f2a11061a15:/src/github.com/domainaware/parsedmarc# python3 parsedmarc/cli.py -c parsedmarc.ini -s -t 1  -o outdir test.xml
  0%|                                                                                                                                                            | 0/1 [00:00<?, ?it/s]

^CProcess Process-1:
Traceback (most recent call last):
  File "/src/github.com/domainaware/parsedmarc/parsedmarc/cli.py", line 1374, in <module>
Traceback (most recent call last):
    _main()
  File "/src/github.com/domainaware/parsedmarc/parsedmarc/cli.py", line 1202, in _main
  File "/usr/local/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/src/github.com/domainaware/parsedmarc/parsedmarc/cli.py", line 61, in cli_parse
    conn.send([file_results, file_path])
  File "/usr/local/lib/python3.9/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/usr/local/lib/python3.9/multiprocessing/connection.py", line 405, in _send_bytes
    self._send(buf)
  File "/usr/local/lib/python3.9/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
KeyboardInterrupt
    proc.join()
  File "/usr/local/lib/python3.9/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/usr/local/lib/python3.9/multiprocessing/popen_fork.py", line 43, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/usr/local/lib/python3.9/multiprocessing/popen_fork.py", line 27, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
  0%|                                                                                                                                                            | 0/1 [00:05<?, ?it/s]

root@0f2a11061a15:/src/github.com/domainaware/parsedmarc#

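This problem seems to occur when the size of the file_results object exceeds net.core.wmem_default. As the traceback shows, the child process is then blocked in conn.send() while the parent is waiting in proc.join().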
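The blocking pattern in the traceback can be reproduced and avoided outside parsedmarc. The following is a minimal sketch, not the project's actual fix: `worker` and `parse_in_child` are hypothetical names standing in for `cli_parse` and its caller, and the "fork" start method is assumed (Unix only). It shows the parent reading from the connection before it joins the child, so a payload larger than the socket buffer cannot stall the sender.

```python
import multiprocessing as mp

# Assumption: the "fork" start method, which keeps this sketch
# self-contained. It is available on Unix only.
ctx = mp.get_context("fork")


def worker(conn, n_records):
    # Hypothetical stand-in for cli_parse: build a large result object
    # and send it back to the parent.
    results = [{"source_ip": "192.0.2.1", "count": i} for i in range(n_records)]
    # Once the connection's buffer is full, send() blocks until the
    # parent reads from the other end.
    conn.send(results)
    conn.close()


def parse_in_child(n_records):
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=worker, args=(child_conn, n_records))
    proc.start()
    child_conn.close()  # the parent does not use the child's end

    # Receive BEFORE joining. Calling proc.join() first would wait on a
    # child that is itself waiting in conn.send() for the parent to read.
    results = parent_conn.recv()
    proc.join()
    return results
```

Swapping the `recv()` and `join()` calls in `parse_in_child` and passing a large `n_records` should reproduce the hang described above.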

Additional information

I ran strace on the Python process:

strace: Process 49264 attached
16:47:21.125152 write(6, "\r12.20.126.127\224h0\214\2US\224h2Nh3Nh4Nh"..., 139430

Environment

docker on ubuntu 22.04.4 amd64 kernel 5.15.0-101-generic
image python:3.9-slim
Python 3.9.19

parsedmarc.ini

[general]
save_aggregate = True
always_use_local_files = True
local_reverse_dns_map_path = ./parsedmarc/resources/maps/base_reverse_dns_map.csv
offline = True
n_procs = 2

rhykw commented 3 months ago

Workaround

Set the net.core.wmem_default parameter to a larger value.

Example:
sysctl -w net.core.wmem_default=1824214
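To judge how large that value needs to be, the pickled size of the result can be compared with the current setting. This is a rough sketch under stated assumptions: `pickled_size`, `wmem_default`, and `fits_in_send_buffer` are hypothetical helpers, the `/proc` path exists on Linux only (and may be absent inside some containers), and the kernel's own buffer accounting adds overhead, so the comparison is an estimate rather than an exact threshold.

```python
import pickle
from pathlib import Path

# Assumption: Linux exposes the sysctl value at this path.
WMEM_DEFAULT = Path("/proc/sys/net/core/wmem_default")


def pickled_size(obj):
    # Approximate number of bytes conn.send(obj) has to write.
    return len(pickle.dumps(obj))


def wmem_default():
    # Current setting in bytes, or None where it is not exposed.
    try:
        return int(WMEM_DEFAULT.read_text())
    except (OSError, ValueError):
        return None


def fits_in_send_buffer(obj):
    # True/False when the limit is known, None when it is not.
    limit = wmem_default()
    if limit is None:
        return None
    return pickled_size(obj) <= limit
```

Calling `pickled_size` on the object that is passed to `conn.send()` gives a lower bound for the value to set.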

Another workaround

https://github.com/domainaware/parsedmarc/commit/c8478d25b6728c7017da11273f2a84752ca43fee

rhykw commented 2 months ago

Has anyone else encountered this issue?

alutkevich commented 2 months ago

Yes, I've seen it happen. Increasing net.core.wmem_default stopped it from hanging on the Ubuntu system I run it on. When I ran it in a Windows environment it hung consistently, which forced me to switch to running it under Ubuntu.

rhykw commented 2 months ago

The hang in a Windows environment may also be resolved by applying this pull request.