domainaware / parsedmarc

A Python package and CLI for parsing aggregate and forensic DMARC reports
https://domainaware.github.io/parsedmarc/
Apache License 2.0
986 stars, 214 forks

Processing of accumulated reports is intransparent and a bit error-prone #299

Closed. volter closed this issue 6 months ago

volter commented 2 years ago

We had issues with our installation of parsedmarc and didn't get around to fixing them for a long time, so a large number of reports had accumulated in the mailbox it is configured to process.

After we had parsedmarc up and running again, it started processing, but seemingly created no output. Some strace-ing revealed that it only seems to contact Elasticsearch once all reports have been processed, which might take hours, and I'm not sure whether the resulting single request to ES could be problematic. I would recommend introducing some batching instead of waiting until the very end.
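The batching suggested here could look roughly like the sketch below. It is not parsedmarc's actual code: `batched`, `process_in_batches`, and the `parse` and `save` callables are hypothetical names used only to illustrate saving after every batch instead of once at the very end.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter batch
        yield batch


def process_in_batches(
    messages: Iterable[T],
    parse: Callable[[T], object],
    save: Callable[[list], None],
    batch_size: int = 10,
) -> int:
    """Parse and save messages batch by batch; return how many were saved."""
    saved = 0
    for batch in batched(messages, batch_size):
        reports = [parse(message) for message in batch]
        save(reports)  # output is written after every batch, not only at the end
        saved += len(reports)
    return saved
```

With this shape, an interrupted run loses at most one unsaved batch, and each request to the output backend stays bounded in size.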

What is worse, if you stop parsedmarc midway, nothing is written to ES, yet the reports have already been moved to the IMAP folder for processed reports. I think some signal handling is missing here.
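One way to read the signal-handling suggestion is sketched below, under stated assumptions: `StopRequested`, `run`, and the `parse`, `save`, and `archive` callables are hypothetical and do not reflect how parsedmarc is implemented. The idea is that a stop request only sets a flag, the current batch is allowed to finish, and a batch is archived only after it has been saved.

```python
import signal


class StopRequested:
    """Records that SIGINT or SIGTERM arrived so the loop can stop cleanly."""

    def __init__(self):
        self.flag = False

    def install(self):
        # Must be called from the main thread of the process.
        signal.signal(signal.SIGINT, self._handle)
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame):
        self.flag = True


def run(batches, parse, save, archive, stop):
    """Process batches in order; return the number of messages completed."""
    done = 0
    for batch in batches:
        if stop.flag:
            break  # stop between batches, never in the middle of one
        reports = [parse(message) for message in batch]
        save(reports)   # write the output first ...
        archive(batch)  # ... and only then move the messages away
        done += len(batch)
    return done
```

A caller would create a `StopRequested`, call `install()` once at startup, and pass the object to `run`. Because archiving follows saving, messages that were parsed but not saved stay in the inbox and are picked up again on the next run.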

davidande commented 2 years ago

I had the same thing happen with 3500 reports in my inbox. You can either wait until the integration process finishes (which takes a very long time), or you can add batch_size = 10 to the IMAP section of your parsedmarc.ini, so that mails are parsed 10 at a time. Run sudo parsedmarc -c /etc/parsedmarc.ini --debug to check on the process.
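As a rough illustration of the setting described in this comment, the sketch below embeds a hypothetical parsedmarc.ini fragment and reads it back with Python's standard configparser. The `[imap]` section name is taken from the comment; depending on the parsedmarc version the option may belong to a different section, so check the documentation for your release. `read_batch_size` is a hypothetical helper, not part of parsedmarc.

```python
import configparser

# Hypothetical fragment of parsedmarc.ini; the section name follows the
# comment above and may differ between parsedmarc versions.
EXAMPLE_INI = """
[imap]
batch_size = 10
"""


def read_batch_size(ini_text, section="imap", default=None):
    """Return batch_size from the given section, or default if it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getint(section, "batch_size", fallback=default)
```

Calling `read_batch_size(EXAMPLE_INI)` returns the integer 10, while a fragment without the option yields the default.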

volter commented 2 years ago

Ah, OK, that's good to know, thanks. The whole thing is a bit of a corner case, of course. Maybe that could be a sensible default setting.

davidande commented 2 years ago

By the way, if you have started the integration process without batch_size, just wait for the process to end. In my case it took hours!

volter commented 2 years ago

Yes, sure, you could do that, but to an unknowing new user it looks like it's just not working and if they cancel it, having no data seems to confirm that.

davidande commented 2 years ago

Yes, and if you cancel or reboot, the whole integration process starts again from scratch.

volter commented 2 years ago

That's not exactly the behaviour I think I witnessed: parsed (but not saved) messages had already been moved to a folder, so I think they would not be considered again.

taddev commented 2 years ago

I don't know how, but this needs to be more clearly defined. We just spent the last 4 days trying to figure out why our reports are not processing, because we have over 8k in the mailbox right now.

bendem commented 1 year ago

volter wrote: "That's not exactly the behaviour I think I witnessed: parsed (but not saved) messages had already been moved to a folder, so I think they would not be considered again."

This is #242