domainaware / parsedmarc

A Python package and CLI for parsing aggregate and forensic DMARC reports
https://domainaware.github.io/parsedmarc/
Apache License 2.0

Running parsedmarc with pypy causes "Too many open files" exception when reading gzipped reports from a mailbox #321

Open seanthegeek opened 2 years ago

seanthegeek commented 2 years ago

While running parsedmarc in a pypy3.9-7.3.9 virtualenv on Rocky Linux 8.4, a "Too many open files" exception occurs when attempting to parse gzipped DMARC aggregate reports retrieved from a Microsoft 365 mailbox via Microsoft Graph. This does not occur in a standard CPython virtualenv.

Install procedure

sudo dnf install libxml2-devel libxslt-devel python3-devel

wget https://downloads.python.org/pypy/pypy3.9-v7.3.9-linux64.tar.bz2
tar -pxf pypy3.9-v7.3.9-linux64.tar.bz2
mv pypy3.9-v7.3.9-linux64 pypy3

# virtualenv needs to be installed this way because the version of virtualenv included in RHEL/CentOS/Rocky Linux repositories fails to create a pypy virtualenv 
./pypy3/bin/pip3 install -U pip setuptools wheel virtualenv
sudo chown -R root:root pypy3
sudo mv pypy3 /opt
sudo ln -s /opt/pypy3/bin/pypy3 /usr/local/bin/pypy3

sudo -u parsedmarc /usr/local/bin/pypy3 -m venv --upgrade /opt/parsedmarc/venv
sudo -u parsedmarc /usr/local/bin/pip install -U parsedmarc

Log output

WARNING:__init__.py:1121:Message with subject "Report-ID: REDACTED" is not a valid aggregate DMARC report: Unexpected error: [Errno 24] Too many open files
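
Whether descriptors are really accumulating can be checked from inside the process. The sketch below is not taken from parsedmarc: `open_fd_count`, `fd_soft_limit`, and `read_with_context_manager` are hypothetical helper names, and it assumes a Linux host with `/proc` mounted. It counts the descriptors the process holds, reads the soft `RLIMIT_NOFILE` limit that `[Errno 24]` is raised against, and shows a `with` block releasing a descriptor without waiting for garbage collection. That last point matters on PyPy, which does not use reference counting and may finalize an unclosed file much later than CPython would. Whether unclosed handles are the actual cause here is an assumption, not something established in this thread.

```python
import os
import resource
import tempfile


def open_fd_count():
    """Count descriptors open in this process (Linux only, needs /proc)."""
    # Each entry in /proc/self/fd is one open descriptor. The listing itself
    # typically holds one descriptor while it runs, so the absolute value can
    # be one too high; differences between two calls are still meaningful.
    return len(os.listdir("/proc/self/fd"))


def fd_soft_limit():
    """Return the soft limit on open files for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def read_with_context_manager(path):
    """Read a file and close its descriptor as soon as the block exits."""
    with open(path, "rb") as handle:
        return handle.read()


if __name__ == "__main__":
    # Hypothetical demo: create an empty temporary file and read it back.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        before = open_fd_count()
        read_with_context_manager(path)
        after = open_fd_count()
        print(f"soft limit: {fd_soft_limit()}, open before: {before}, open after: {after}")
    finally:
        os.remove(path)
```

Logging `open_fd_count()` before and after each message is parsed would show whether the count climbs toward `fd_soft_limit()` under PyPy while staying flat under CPython.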

seanthegeek commented 2 years ago

@nathanthorpe Can you look into this a bit? If it's not a bug in pypy itself, I'd like to get this fixed in the same release as your #320 PR.

nathanthorpe commented 2 years ago

I'm not able to reproduce this with the same instance of pypy running on Ubuntu 20.04, but I could just have a smaller gzip file that doesn't trigger that.

seanthegeek commented 2 years ago

This just got weirder. I was able to reproduce this bug again, but this time I tried moving the exact same emails from the invalid folder back to the inbox without doing anything else, and all 200+ emails processed correctly.

seanthegeek commented 2 years ago

Actually, I have my numbers wrong. Still happening. This is why I shouldn't do debugging in the middle of the night. 😅 Oh well, at least I have some samples that consistently fail now.