Closed tlazaris closed 4 years ago
Hmmm. That configuration should work. The data should get saved in Elasticsearch and show up in Kibana as soon as the script is done reading the report files.
The only difference between your config and my production config is the n_procs option. Try removing n_procs and let me know if that works. You might have found a bug in multiprocessing, which I didn't write and don't usually use.
I am also running on Ubuntu and have far fewer DMARC emails, but I also find no data in Kibana. I don't have the n_procs option set.
I noticed the emails were moved from Inbox to Archive/Invalid, which probably explains why there is no data in my case.
The invalid emails were DMARC aggregate reports from Google. I saw no examples of valid reports because I have very few reports to work with at this time, just Google's.
@JSDA123 That's odd. I've never seen reports from Google fail to parse. Can you provide a sample report so I can debug the issue?
@seanthegeek sure, how do I do that? Do you need the xml or the email or what? Can I anonymize the details?
@JSDA123 Just the XML file is fine. You can anonymize the XML by replacing your domain name with example.com. You can attach the example file by dragging and dropping it into the comment field here.
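For anyone else who needs to share a sample, here is a minimal sketch of that anonymization step. The function name `anonymize_report` and the sample XML are made up for illustration; this is not part of parsedmarc. It only swaps the domain string, so other identifying details (IP addresses, report IDs) would still need a manual check.

```python
import re


def anonymize_report(xml_text: str, real_domain: str,
                     placeholder: str = "example.com") -> str:
    """Replace every occurrence of real_domain with placeholder.

    Matching is case-insensitive, and subdomains such as
    mail.<real_domain> become mail.<placeholder>.
    """
    # re.escape keeps the dots in the domain from acting as wildcards.
    pattern = re.compile(re.escape(real_domain), re.IGNORECASE)
    return pattern.sub(placeholder, xml_text)


# Hypothetical fragment, not taken from a real report.
sample = "<domain>MyCompany.org</domain><header_from>mail.mycompany.org</header_from>"
print(anonymize_report(sample, "mycompany.org"))
```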
Disabled n_procs over the weekend - This resolved the too many files open issue.
Messages that hit the "too many files open" issue were flagged as invalid despite being valid; putting them back into the inbox causes them to process normally.
I think my main issue was that I'm starting with a 200k message backlog, and the script seemed to error and die occasionally without ever submitting any of the data.
If possible I think the best fix (for my use case) would be a way to tell it to occasionally pause and submit what it's already processed, so that it can chug through that backlog without losing a ton of data anytime there's a hiccup.
The script did eventually finish. Once it whittled the volume down to an amount it could handle in one go, it got through that, sent the data to Elasticsearch, and it shows up on the dashboard.
Now that the inbox is empty, it's working as intended and grabbing new single emails as they come in.
Any thoughts on a change that'd enable it to save/send data every like 100 messages or so? (And maybe something to prevent random errors from causing it to think the message is invalid?)
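To make the request concrete, here is a rough sketch of the batching idea, not how parsedmarc actually works. `parse` and `save` are hypothetical callables standing in for "parse one message" and "send results to the output"; the sketch assumes that a genuinely invalid report raises `ValueError`, while transient problems (such as an `OSError` for too many open files) raise something else and are allowed to propagate, so they are never mistaken for invalid reports.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def process_in_batches(messages: Iterable[T],
                       parse: Callable[[T], R],
                       save: Callable[[List[R]], None],
                       batch_size: int = 100) -> int:
    """Parse messages and save results after every batch.

    Returns the number of results saved. If a transient error stops the
    run, every batch saved before that point is already persisted.
    """
    saved = 0
    for batch in batched(messages, batch_size):
        results: List[R] = []
        for message in batch:
            try:
                results.append(parse(message))
            except ValueError:
                # Assumption: ValueError means the report really is invalid.
                continue
        if results:
            save(results)
            saved += len(results)
    return saved
```

With a batch size of 100, a crash on message 850 would lose at most the 50 messages parsed since the last save, rather than all 850.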
Also, what happens if I point the script at Archive/Invalid to get it to re-process those messages? (Or any other suggestions on how to get it to reprocess those messages that got marked invalid?)
@seanthegeek Here is the anonymized xml, was in a zip attachment
`<?xml version="1.0" encoding="UTF-8" ?>
none
@tlazaris , @JSDA123 please try the latest release I published today, 6.7.0.
I updated to 6.7.0 with sudo -H pip3 install -U parsedmarc
Unfortunately no improvement in my case. I tested by moving emails from Invalid back to Inbox and then started parsedmarc service. Emails were moved back to Invalid and of course still no data displayed in Kibana.
@seanthegeek
While updating I got an error 24 again, but this time it was from the update process, and it said something about being an OS error. So maybe that error is an OS issue where it's got too many files open locally?
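If it is the OS limit, error 24 is usually `EMFILE`, the per-process cap on open file descriptors. As a quick check, the sketch below reads that limit with Python's standard `resource` module (Unix only) and, if needed, raises the soft limit toward the hard limit. The function name and the target of 4096 are arbitrary choices for illustration, and this is a diagnostic aid rather than a fix for whatever is leaving files open.

```python
import resource


def raise_open_file_limit(target: int = 4096):
    """Raise the soft RLIMIT_NOFILE toward `target`; return (soft, hard)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # An unprivileged process cannot go above the hard limit.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)

    # Only raise the limit; never lower an already generous or unlimited one.
    if soft != resource.RLIM_INFINITY and soft < target:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))

    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft}, hard={hard}")
```

The change only applies to the current process and its children, so it would have to run inside whatever launches the script; running `ulimit -n` in the shell shows the same soft limit.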
To be safe I rebooted. I also did update to the latest version as you instructed. It completed successfully.
I have it pointed at a temp directory right now to reprocess some messages - there's 2300 messages in there right now. So far it's gotten through 800 without an issue.
(Not sure if you did something to improve the way it's processing or what but it seems like it's going smoother this time around)
So it regrettably errored out right at the finish line.
As shown in the second screenshot, when I restart it, it has lost all progress. 3 messages were moved because they were invalid, so the remaining 2289 are being processed, it would seem, from the start.
(I'm actually hoping to move chunks larger than 2k into this temp folder for re-processing... but I'm concerned that minor hiccups like shown above will make that unlikely to work)
Sorry, this is a new one, but since it happened during the course of testing, figured I'd share it.
@tlazaris Parsedmarc works in three steps
INBOX
by default)

So if the IMAP connection is interrupted, the entire process stops. It looks like your IMAP server can't maintain the connection for that many emails for some reason.
So process smaller chunks like you mentioned; then, once all of the initial bulk is processed, parsedmarc can monitor messages as they come in.
Any way to increase the timeout and disconnection tolerance? This is dialing into an o365 mailbox, and the internet connection the machine is on is stable without interruptions, so I'm imagining the blips are very small/short.
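To illustrate what "disconnection tolerance" could look like, here is a generic retry-with-backoff sketch. It is not an existing parsedmarc option; `with_retries` and its parameters are hypothetical, and `operation` stands for any callable that talks to the mail server.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

R = TypeVar("R")


def with_retries(operation: Callable[[], R],
                 attempts: int = 5,
                 base_delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
                 sleep: Callable[[float], None] = time.sleep) -> R:
    """Call `operation`, retrying on the given exceptions.

    The wait doubles after each failure (base_delay, 2x, 4x, ...).
    The last failure is re-raised so the caller still sees the error.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

A caller using the standard `imaplib` module would probably want to add `imaplib.IMAP4.abort` to `retry_on`, and would need to reconnect inside `operation`, since a dropped IMAP session cannot simply be resumed.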
Either way, the responses and assistance have been greatly appreciated, I think even without any changes I'll be able to eventually feed the backlog through. :)
Let me know how it goes. I'll keep this issue open until then.
> @JSDA123 That's odd. I've never seen reports from Google fail to parse. Can you provide a sample report so I can debug the issue?
I've seen a few aggregate reports from Google that failed to parse, and in almost every case it was caused by a Salesforce domain name with invalid characters that Google included in the report. These reports are what led me to develop the workarounds that got merged into 6.7.0 (#122).
@michaeldavie thanks for the information ... in my case I found the problem was geoip not working, causing all reports to be moved to Invalid. After resolving a geoipupdate issue, the reports have been parsing correctly. So it turns out my issue wasn't a parsing issue after all.
Running on Ubuntu. Followed the guide; Elasticsearch and Kibana are running, parsedmarc is running, and it's gotten through about 20k messages. I have silent = false and debug = true so I can see what it's doing in the terminal window.
I can't find any files that appear to be results from the DMARC emails it's processing. I'm just searching for literally any file created in the last day that isn't tiny (a few KB).
The dashboard is empty, with no data. I set the timeframe back several years in case parsedmarc is reading older emails in the inbox first.
Sometimes I'm getting Errno 24 - too many open files, some files are saying they're not valid reports, but most of the messages appear to be processing just fine with no errors.
This is my .ini
The email address that DMARC reports are going to is set up as a shared mailbox with an associated disabled account, so it can't be logged into. A service account was created and granted access to that mailbox. It works in the username format shown above: it connects successfully via IMAP when tested outside of parsedmarc, and appears to work fine with parsedmarc. However, I figured I'd point out that it's connecting this way in case this is somehow part of what's causing my issues.
Any guidance/help greatly appreciated, please let me know if there's additional information you need to be able to point me in the right direction.