domainaware / parsedmarc

A Python package and CLI for parsing aggregate and forensic DMARC reports
https://domainaware.github.io/parsedmarc/
Apache License 2.0

Not showing any data in Kibana dashboard after ~20k messages processed #121

Closed tlazaris closed 4 years ago

tlazaris commented 4 years ago

Running on Ubuntu. I followed the guide; Elasticsearch and Kibana are running, parsedmarc is running, and it has gotten through about 20k messages. I have silent = false and debug = true so I can see what it's doing in the terminal window.

I can't find any files that appear to be results from the DMARC emails it's processing. I searched for literally any file created in the last day that isn't tiny (more than a few KB) and found nothing.

The dashboard is empty, no data. I set the timeframe back several years in case parsedmarc is reading older emails in the inbox first.
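
A quick way to rule out a Kibana display problem is to ask Elasticsearch directly whether anything was saved at all. A minimal sketch, assuming the local cluster from the config below and parsedmarc's dmarc_aggregate / dmarc_forensic index naming (an assumption; confirm the real index names on your cluster):

```python
import json
import urllib.request

# Assumption: parsedmarc's default dmarc_aggregate / dmarc_forensic index
# naming; confirm the actual names with GET /_cat/indices on your cluster.
for pattern in ("dmarc_aggregate*", "dmarc_forensic*"):
    url = "http://127.0.0.1:9200/{}/_count".format(pattern)
    with urllib.request.urlopen(url) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    print(pattern, body.get("count"))  # 0 means nothing has been saved yet
```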

Sometimes I'm getting Errno 24 (too many open files), and some files are reported as not being valid reports, but most of the messages appear to process just fine with no errors.

This is my .ini:

```ini
# This is an example comment

[general]
save_aggregate = True
save_forensic = True
silent = False
debug = True
n_procs = 2

[imap]
host = outlook.office365.com
user = serviceaccount@company.com\sharedmailbox
password = **************
watch = True
test = False
skip_certificate_verification = True

[elasticsearch]
hosts = 127.0.0.1:9200
ssl = False
```

The email address the DMARC reports go to is set up as a shared mailbox with an associated disabled account, so it can't be logged into directly. A service account was created and granted access to that mailbox. It connects successfully over IMAP in the username format shown above, both outside of parsedmarc and, as far as I can tell, within parsedmarc. However, I figured I'd point it out in case this connection method is somehow part of what's causing my issues.
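
For reference, a minimal standalone check along the lines of what I did, using only the standard library (the values are the placeholders from the config above):

```python
import imaplib

# Placeholders from the config above; the backslash joins the service
# account to the shared mailbox in the Office 365 login format.
HOST = "outlook.office365.com"
USER = "serviceaccount@company.com\\sharedmailbox"
PASSWORD = "**************"

conn = imaplib.IMAP4_SSL(HOST, 993)
conn.login(USER, PASSWORD)
status, data = conn.select("INBOX", readonly=True)
print(status, data)  # e.g. ('OK', [b'20000']) -> message count in INBOX
conn.logout()
```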

Any guidance/help greatly appreciated, please let me know if there's additional information you need to be able to point me in the right direction.

seanthegeek commented 4 years ago

Hmmm. That configuration should work. The data should get saved in Elasticsearch and show up in Kibana as soon as the script is done reading the report files.

The only difference between your config and my production config is the n_procs option. Try removing n_procs and let me know if that works. You may have found a bug in the multiprocessing code, which I didn't write and don't usually use.
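
That would leave your [general] section as:

```ini
[general]
save_aggregate = True
save_forensic = True
silent = False
debug = True
```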

JSDA123 commented 4 years ago

I am also running on Ubuntu and have far fewer DMARC emails, but I also find no data in Kibana. I don't have the n_procs option set.

I noticed the emails were moved from Inbox to Archive/Invalid, which probably explains why there is no data in my case.

The invalid emails were DMARC aggregate reports from Google. I saw no examples of valid reports because I have very few reports to work with at this time, and they're all from Google.

seanthegeek commented 4 years ago

@JSDA123 That's odd. I've never seen reports from Google fail to parse. Can you provide a sample report so I can debug the issue?

JSDA123 commented 4 years ago

@seanthegeek sure, how do I do that? Do you need the xml or the email or what? Can I anonymize the details?

seanthegeek commented 4 years ago

@JSDA123 Just the XML file is fine. You can anonymize the XML by replacing your domain name with example.com. You can attach the example file by dragging and dropping it into the comment field here.

tlazaris commented 4 years ago

I disabled n_procs over the weekend. This resolved the "too many open files" issue.

Messages that hit the "too many open files" error were flagged as invalid despite being valid; putting them back into the inbox causes them to process normally.

I think my main issue was that I'm starting with a 200k message backlog, and the script seemed to error and die occasionally without ever submitting any of the data.

If possible I think the best fix (for my use case) would be a way to tell it to occasionally pause and submit what it's already processed, so that it can chug through that backlog without losing a ton of data anytime there's a hiccup.

The script did eventually finish. Once it whittled the volume down to an amount it could handle in one go, it got through that, sent the data to Elasticsearch, and it shows up on the dashboard.

Now that the inbox is empty, it's working as intended and grabbing individual new emails as they come in.

Any thoughts on a change that would enable it to save/send data every 100 messages or so? (And maybe something to prevent random errors from causing it to think a message is invalid?) Roughly what I have in mind is sketched below.
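
A rough sketch of that idea, assuming parsedmarc's documented Python API (parse_report_file, the InvalidDMARCReport exception, and the elastic helpers; exact names and signatures may vary by version) and a hypothetical backlog directory of saved .eml files:

```python
from pathlib import Path

from parsedmarc import InvalidDMARCReport, parse_report_file
from parsedmarc import elastic

elastic.set_hosts(["127.0.0.1:9200"])  # assumption: helper to point at the cluster
BATCH_SIZE = 100
batch = []

def flush(reports):
    # Persist everything parsed so far, so a later crash can't lose this batch
    for report in reports:
        elastic.save_aggregate_report_to_elasticsearch(report)
    reports.clear()

for path in sorted(Path("/tmp/dmarc-backlog").glob("*.eml")):  # hypothetical dir
    try:
        result = parse_report_file(str(path))
    except InvalidDMARCReport:
        continue  # skip genuinely invalid messages instead of dying
    if result["report_type"] == "aggregate":
        batch.append(result["report"])
    if len(batch) >= BATCH_SIZE:
        flush(batch)

flush(batch)  # save the remainder
```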

Also, what happens if I point the script at Archive/Invalid to get it to re-process those messages? (Or any other suggestions on how to get it to reprocess those messages that got marked invalid?)

JSDA123 commented 4 years ago

@seanthegeek Here is the anonymized XML; it was in a zip attachment.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<feedback>
  <report_metadata>
    <org_name>google.com</org_name>
    <email>noreply-dmarc-support@google.com</email>
    <extra_contact_info>https://support.google.com/a/answer/2466580</extra_contact_info>
    <report_id>17430453885546495203</report_id>
    <date_range>
      <begin>1572912000</begin>
      <end>1572998399</end>
    </date_range>
  </report_metadata>
  <policy_published>
    <domain>example.com</domain>
    <adkim>r</adkim>
    <aspf>r</aspf>
    <p>none</p>
    <sp>none</sp>
    <pct>100</pct>
  </policy_published>
  <record>
    <row>
      <source_ip>xxx.xxx.xxx.xxx</source_ip>
      <count>1</count>
      <policy_evaluated>
        <disposition>none</disposition>
        <dkim>pass</dkim>
        <spf>pass</spf>
      </policy_evaluated>
    </row>
    <identifiers>
      <header_from>example.com</header_from>
    </identifiers>
    <auth_results>
      <dkim>
        <domain>example.com</domain>
        <result>pass</result>
        <selector>mail</selector>
      </dkim>
      <spf>
        <domain>example.com</domain>
        <result>pass</result>
      </spf>
    </auth_results>
  </record>
</feedback>
```
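
(For anyone who wants to reproduce: a quick sketch to parse this sample directly, assuming parse_report_file from parsedmarc's Python API and that the XML above is saved locally as sample.xml:)

```python
import json

from parsedmarc import parse_report_file  # raises InvalidDMARCReport on failure

result = parse_report_file("sample.xml")  # the anonymized report saved locally
print(result["report_type"])              # expected: "aggregate"
print(json.dumps(result["report"], indent=2))
```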
seanthegeek commented 4 years ago

@tlazaris , @JSDA123 please try the latest release I published today, 6.7.0.

JSDA123 commented 4 years ago

I updated to 6.7.0 with `sudo -H pip3 install -U parsedmarc`.

Unfortunately, no improvement in my case. I tested by moving emails from Invalid back to Inbox and then starting the parsedmarc service. The emails were moved back to Invalid, and of course still no data displayed in Kibana.

tlazaris commented 4 years ago

@seanthegeek

While updating I got an Errno 24 again, but this time it came from the update process itself, and it was reported as an OS error. So maybe that error is an OS-level issue where too many files are open locally?

To be safe I rebooted. I also updated to the latest version as you instructed. It completed successfully.

I have it pointed at a temp directory right now to reprocess some messages; there are 2,300 messages in there. So far it's gotten through 800 without an issue.

(Not sure if you did something to improve the way it's processing or what, but it seems like it's going smoother this time around.)

tlazaris commented 4 years ago

(Two screenshots: RDCMan_2019-11-06_16-45-11, RDCMan_2019-11-06_16-47-03)

So it regrettably errored out right at the finish line.

As shown in the second screenshot, when I restart it, it has lost all progress: 3 messages were moved because they were invalid, so the remaining 2,289 appear to be processing from the start.

(I'm actually hoping to move chunks larger than 2k into this temp folder for re-processing... but I'm concerned that minor hiccups like the one shown above will make that unlikely to work.)

tlazaris commented 4 years ago

(Screenshot: RDCMan_2019-11-06_17-40-41)

Sorry, this is a new one, but since it happened during the course of testing, I figured I'd share it.

seanthegeek commented 4 years ago

@tlazaris Parsedmarc works in three steps:

  1. Reads all mail in the designated folder (INBOX by default)
  2. Moves all processed mail into the archive
  3. Saves results into Elasticsearch

So if the IMAP connection is interrupted, the entire process stops. It looks like your IMAP server can't maintain the connection for that many emails for some reason.

So process smaller chunks like you mentioned; once all of the initial bulk is processed, parsedmarc can monitor messages as they come in.
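
If you want to automate the chunking outside of parsedmarc, something like this standard-library sketch could stage batches (credentials are the placeholders from earlier, and the staging folder name is an assumption; it must already exist):

```python
import imaplib

CHUNK = 100  # move at most this many messages per run

conn = imaplib.IMAP4_SSL("outlook.office365.com", 993)
conn.login("serviceaccount@company.com\\sharedmailbox", "**************")
conn.select("INBOX")

status, data = conn.search(None, "ALL")
for num in data[0].split()[:CHUNK]:
    # COPY + \Deleted + EXPUNGE is the portable IMAP "move"
    conn.copy(num, "DMARC-Chunk")          # assumed staging folder; create it first
    conn.store(num, "+FLAGS", "\\Deleted")
conn.expunge()  # expunging last keeps the sequence numbers stable above
conn.logout()
```

parsedmarc could then be pointed at the staging folder with the reports_folder option in the [imap] section.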

tlazaris commented 4 years ago

Any way to increase the timeout and disconnection tolerance? This is dialing into an O365 mailbox, and the internet connection the machine is on is stable without interruptions, so I imagine the blips are very small/short.

Either way, the responses and assistance have been greatly appreciated, I think even without any changes I'll be able to eventually feed the backlog through. :)

seanthegeek commented 4 years ago

Let me know how it goes. I'll keep this issue open until then.

michaeldavie commented 4 years ago

> @JSDA123 That's odd. I've never seen reports from Google fail to parse. Can you provide a sample report so I can debug the issue?

I've seen a few aggregate reports from Google that failed to parse, and in almost every case it was caused by a Salesforce domain name with invalid characters that Google included in the report. These reports are what led me to develop the workarounds that were merged into 6.7.0 (#122).

JSDA123 commented 4 years ago

@michaeldavie thanks for the information. In my case I found the problem was GeoIP not working, which caused all reports to be moved to Invalid. After resolving a geoipupdate issue, the reports have been parsing correctly. So it turns out my issue wasn't a parsing issue after all.