domainaware / parsedmarc

A Python package and CLI for parsing aggregate and forensic DMARC reports
https://domainaware.github.io/parsedmarc/
Apache License 2.0

Elasticsearch 8.7.1 - failed to parse field [must] #419

Closed michael-markevich closed 1 year ago

michael-markevich commented 1 year ago

I've installed the most recent parsedmarc script and Elasticsearch 8.7.1, and have the following issue:

    Jun 13 11:16:29 myserver parsedmarc[351265]: INFO:elastic.py:295:Saving aggregate report to Elasticsearch ...
    Jun 13 11:16:29 myserver parsedmarc[351265]: elasticsearch.exceptions.RequestError: RequestError(400, 'x_content_parse_exception', '[1:59] [bool] failed to parse field [must]')

error.txt

What could be a fix?

danyelex commented 1 year ago

Same error here. :(

It worked fine for 4 days; the error only started yesterday.

amaiman commented 1 year ago

I ran into this as well today. ~I think I was able to fix it; I'm running another test now to make sure. It appears to fail when org_name is 'None'. If my fix (basically, don't use org_name in the query if it's null) works, I'll submit a pull request.~

amaiman commented 1 year ago

~Ok, pull request submitted; this change fixed it for me.~

~Issue was with this:~

  File "/opt/parsedmarc/venv/lib/python3.10/site-packages/parsedmarc/elastic.py", line 332, in save_aggregate_report_to_elasticsearch
    existing = search.execute()

~To find where it was getting stuck, I temporarily added a line that logs the search query. It fails on the query where org_name is 'None':~

    INFO:elastic.py:295:Saving aggregate report to Elasticsearch
    INFO:elastic.py:330:Bool(must=[MatchPhrase(org_name='Yahoo! Inc.'), MatchPhrase(report_id='1541639311.437651'), MatchPhrase(published_policy__domain='maiman.net'), Match(date_begin=datetime.datetime(2018, 11, 7, 0, 0, tzinfo=datetime.timezone.utc)), Match(date_end=datetime.datetime(2018, 11, 7, 23, 59, 59, tzinfo=datetime.timezone.utc))])
    WARNING:cli.py:100:An aggregate report ID 1541639311.437651 from Yahoo! Inc. about maiman.net with a date range of 2018-11-07 00:00:00Z UTC to 2018-11-07 23:59:59Z UTC already exists in Elasticsearch
    INFO:elastic.py:295:Saving aggregate report to Elasticsearch
    INFO:elastic.py:330:Bool(must=[MatchPhrase(org_name=None), MatchPhrase(report_id='maiman.net_1510963200'), MatchPhrase(published_policy__domain='maiman.net'), Match(date_begin=datetime.datetime(2017, 11, 17, 0, 0, tzinfo=datetime.timezone.utc)), Match(date_end=datetime.datetime(2017, 11, 18, 0, 0, tzinfo=datetime.timezone.utc))])

~This was just a quick fix. I'm not familiar with the whole codebase of parsedmarc so this may just be a band-aid on a different underlying issue. I went back and looked at the .XML file for the one that it failed with an org_name of 'None' and there does appear to be a valid org_name in the file, so I'm not entirely sure why it ended up as 'None'. Maybe someone who has time to take a deeper look at the code can figure it out.~
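For readers following along, the struck-out workaround above (leave `org_name` out of the duplicate check when it is null) can be sketched roughly as follows. This is a stdlib-only illustration that builds an Elasticsearch-style bool query as a plain dict. The helper name `build_dedup_query` and the dict layout are assumptions made for this example. It is not parsedmarc's code and not the fix that was eventually shipped.

```python
from datetime import datetime, timezone


def build_dedup_query(fields):
    """Build a bool/must query dict, skipping fields whose value is None.

    Hypothetical helper for illustration; not part of parsedmarc.
    """
    must = []
    for name, value in fields.items():
        if value is None:
            # The logged query that failed contained org_name=None,
            # so this sketch simply omits clauses with a null value.
            continue
        if isinstance(value, datetime):
            # Dates are serialized to ISO 8601 strings for the query body.
            must.append({"match": {name: value.isoformat()}})
        else:
            must.append({"match_phrase": {name: value}})
    return {"bool": {"must": must}}


# Values taken from the failing log line in this thread.
query = build_dedup_query({
    "org_name": None,
    "report_id": "maiman.net_1510963200",
    "published_policy.domain": "maiman.net",
    "date_begin": datetime(2017, 11, 17, tzinfo=timezone.utc),
    "date_end": datetime(2017, 11, 18, tzinfo=timezone.utc),
})
print(query)
```

Dropping the clause only hides the symptom. As noted in the comment, the report file did contain a valid org_name, so the null value itself was the real problem.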

amaiman commented 1 year ago

Never mind. I took a closer look this morning and found that the changelog (https://github.com/domainaware/parsedmarc/blob/master/CHANGELOG.md) shows this issue was already fixed (correctly, not in the hacky way that I tried yesterday) in https://github.com/domainaware/parsedmarc/issues/410

PyPI doesn't have version 8.6.1, which is why I wasn't seeing the fixed version. Running this updates to the GitHub version, after which it works properly:

    sudo -u parsedmarc /opt/parsedmarc/venv/bin/pip install -U git+https://github.com/domainaware/parsedmarc.git
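To check which version a given virtualenv actually has, a small sketch like the one below may help. It assumes that 8.6.1 is the first release containing the fix (as the comment above implies) and that the version string is plain dotted integers. The helper names `parse_version` and `has_fix` are made up for this example. Run it with the same interpreter as the install, e.g. `/opt/parsedmarc/venv/bin/python`.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '8.6.1' into (8, 6, 1); assumes plain dotted integers."""
    return tuple(int(part) for part in text.split("."))


def has_fix(installed, minimum="8.6.1"):
    """Return True if `installed` is at least `minimum` (assumed fix version)."""
    return parse_version(installed) >= parse_version(minimum)


try:
    installed = version("parsedmarc")
    print(installed, "ok" if has_fix(installed) else "older than 8.6.1")
except PackageNotFoundError:
    print("parsedmarc is not installed in this environment")
except ValueError:
    # Raised by parse_version for strings such as '8.6.1.dev0'.
    print("unrecognized version format")
```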

michael-markevich commented 1 year ago

Confirmed that the fix is working!