domainaware / parsedmarc

A Python package and CLI for parsing aggregate and forensic DMARC reports
https://domainaware.github.io/parsedmarc/
Apache License 2.0

parallel processing #147

Open bondr007 opened 4 years ago

bondr007 commented 4 years ago

I am having issues getting parallel processing to work. I have the following in my parsedmarc.ini file:

[general]
save_aggregate = True
save_forensic = True
chunk_size = 10
n_procs = 8
#debug = True

[elasticsearch]
hosts = localhost:9200
ssl = False

When I run it with debug turned on, it does process the messages, but only one at a time. Sadly, I have over 20K of them, which would take quite some time. When debug is not enabled, it just sits there doing nothing, with the following output:

0it [00:00, ?it/s]

Any ideas?

Thanks!

zscholl commented 4 years ago

Have you tried running it on a smaller subset of files? It could be stuck processing and not updating the counters so the progress bar is stuck.

bondr007 commented 4 years ago

Looks like the parallel processing is only used if the script is passed a directory of .eml files. I was giving it an .mbox file containing thousands of emails. It also uses a ton of RAM, since it does not send anything to Elasticsearch until it is complete.
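
One possible workaround, not confirmed by the maintainers, is to split the .mbox file into individual .eml files first, so that the input matches the case where parallel processing is reported to kick in. The sketch below uses only Python's standard-library mailbox module. The function name split_mbox and the zero-padded output file names are hypothetical choices for illustration, not part of parsedmarc.

```python
import mailbox
from pathlib import Path


def split_mbox(mbox_path, out_dir):
    """Write each message in an mbox file to its own .eml file.

    Hypothetical helper (not part of parsedmarc). Returns the number
    of messages written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # create=False raises mailbox.NoSuchMailboxError if the file is
    # missing, instead of silently creating an empty mailbox.
    box = mailbox.mbox(str(mbox_path), create=False)
    count = 0
    try:
        for key in box.iterkeys():
            # get_bytes() returns the message bytes without the mbox
            # "From " separator line, which is what an .eml file holds.
            raw = box.get_bytes(key)
            (out / f"{count:06d}.eml").write_bytes(raw)
            count += 1
    finally:
        box.close()
    return count


if __name__ == "__main__":
    import sys

    # Usage: python split_mbox.py reports.mbox eml_out/
    written = split_mbox(sys.argv[1], sys.argv[2])
    print(f"wrote {written} messages")
```

The resulting .eml files can then be passed to parsedmarc instead of the .mbox file. Whether this also reduces memory use is untested, since results are reportedly held back until parsing is complete.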

axelocz commented 2 years ago

It doesn't seem like this was actually resolved. Parallel processing still does not seem to work for messages pulled using IMAP. Is that intentional (in which case it should be documented), or is it an oversight?

Taoquitok commented 2 years ago

I'm getting the same behaviour as @axelocz on this. When pulling messages using IMAP, it only processes them one at a time.