ddbnl / office365-audit-log-collector

Collect / retrieve Office365, AzureAD and DLP audit logs and output to PRTG, Azure Log Analytics Workspace, SQL, Graylog, Fluentd, and/or file output.
https://ddbnl.github.io/office365-audit-log-collector/
MIT License
106 stars 40 forks source link

No new logs since first collection / initial run #26

Closed flotpg closed 2 years ago

flotpg commented 2 years ago

Dear @ddbnl

I noticed that since the initial collection run, I don't get any new logs.

Making API request using URL: "https://manage.office.com/api/v1.0/TENANT-ID/activity/feed/subscriptions/list"
Starting new HTTPS connection (1): login.microsoftonline.com:443
https://login.microsoftonline.com:443 "POST /TENANT-ID/oauth2/token HTTP/1.1" 200 1510
Logged in
Starting new HTTPS connection (1): manage.office.com:443
https://manage.office.com:443 "GET /api/v1.0/TENANT-ID/activity/feed/subscriptions/list HTTP/1.1" 200 342
Starting run @ 2022-05-03 09:39:11.194626. Content: deque(['Audit.General', 'Audit.AzureActiveDirectory', 'Audit.Exchange', 'Audit.SharePoint', 'DLP.All']).
Audit.General - resuming from: 2022-05-03 07:30:03+00:00
Getting available content for type: "Audit.General"
Audit.AzureActiveDirectory - resuming from: 2022-05-03 07:30:03+00:00
Retrieving Audit.General. Start time: 2022-05-03T07:30:03+00:00. End time: 2022-05-03T07:39:11.
Getting available content for type: "Audit.AzureActiveDirectory"
Making API request using URL: "https://manage.office.com/api/v1.0/TENANT-ID/activity/feed/subscriptions/content?contentType=Audit.General&startTime=2022-05-03T07:30:03+00:00&endTime=2022-05-03T07:39:11"
Audit.Exchange - resuming from: 2022-05-03 07:30:03+00:00
Retrieving Audit.AzureActiveDirectory. Start time: 2022-05-03T07:30:03+00:00. End time: 2022-05-03T07:39:11.
Starting new HTTPS connection (1): login.microsoftonline.com:443
Getting available content for type: "Audit.Exchange"
Audit.SharePoint - resuming from: 2022-05-03 07:30:03+00:00
Making API request using URL: "https://manage.office.com/api/v1.0/TENANT-ID/activity/feed/subscriptions/content?contentType=Audit.AzureActiveDirectory&startTime=2022-05-03T07:30:03+00:00&endTime=2022-05-03T07:39:11"
Retrieving Audit.Exchange. Start time: 2022-05-03T07:30:03+00:00. End time: 2022-05-03T07:39:11.
Starting new HTTPS connection (1): login.microsoftonline.com:443
Getting available content for type: "Audit.SharePoint"
Making API request using URL: "https://manage.office.com/api/v1.0/TENANT-ID/activity/feed/subscriptions/content?contentType=Audit.Exchange&startTime=2022-05-03T07:30:03+00:00&endTime=2022-05-03T07:39:11"
DLP.All - resuming from: 2022-05-03 07:30:03+00:00
Retrieving Audit.SharePoint. Start time: 2022-05-03T07:30:03+00:00. End time: 2022-05-03T07:39:11.
Starting new HTTPS connection (1): login.microsoftonline.com:443
https://login.microsoftonline.com:443 "POST /TENANT-ID/oauth2/token HTTP/1.1" 200 1510
Getting available content for type: "DLP.All"
Making API request using URL: "https://manage.office.com/api/v1.0/TENANT-ID/activity/feed/subscriptions/content?contentType=Audit.SharePoint&startTime=2022-05-03T07:30:03+00:00&endTime=2022-05-03T07:39:11"
https://login.microsoftonline.com:443 "POST /TENANT-ID/oauth2/token HTTP/1.1" 200 1510
Logged in
Retrieving DLP.All. Start time: 2022-05-03T07:30:03+00:00. End time: 2022-05-03T07:39:11.
Starting new HTTPS connection (1): login.microsoftonline.com:443
Logged in
Starting new HTTPS connection (1): manage.office.com:443
Making API request using URL: "https://manage.office.com/api/v1.0/TENANT-ID/activity/feed/subscriptions/content?contentType=DLP.All&startTime=2022-05-03T07:30:03+00:00&endTime=2022-05-03T07:39:11&PublisherIdentifier=TENANT-ID"
https://login.microsoftonline.com:443 "POST /TENANT-ID/oauth2/token HTTP/1.1" 200 1510
Starting new HTTPS connection (1): manage.office.com:443
Starting new HTTPS connection (1): manage.office.com:443
Logged in
https://login.microsoftonline.com:443 "POST /TENANT-ID/oauth2/token HTTP/1.1" 200 1510
Starting new HTTPS connection (1): manage.office.com:443
Logged in
https://manage.office.com:443 "GET /api/v1.0/TENANT-ID/activity/feed/subscriptions/content?contentType=Audit.General&startTime=2022-05-03T07:30:03+00:00&endTime=2022-05-03T07:39:11 HTTP/1.1" 400 82
https://manage.office.com:443 "GET /api/v1.0/TENANT-ID/activity/feed/subscriptions/content?contentType=DLP.All&startTime=2022-05-03T07:30:03+00:00&endTime=2022-05-03T07:39:11&PublisherIdentifier=TENANT-ID HTTP/1.1" 400 82
Got 1 content blobs of type: "Audit.General"
Starting new HTTPS connection (1): manage.office.com:443
Got 1 content blobs of type: "DLP.All"
https://manage.office.com:443 "GET /api/v1.0/TENANT-ID/activity/feed/subscriptions/content?contentType=Audit.AzureActiveDirectory&startTime=2022-05-03T07:30:03+00:00&endTime=2022-05-03T07:39:11 HTTP/1.1" 400 82
Got 1 content blobs of type: "Audit.AzureActiveDirectory"
https://manage.office.com:443 "GET /api/v1.0/TENANT-ID/activity/feed/subscriptions/content?contentType=Audit.Exchange&startTime=2022-05-03T07:30:03+00:00&endTime=2022-05-03T07:39:11 HTTP/1.1" 400 82
Got 1 content blobs of type: "Audit.Exchange"
https://manage.office.com:443 "GET /api/v1.0/TENANT-ID/activity/feed/subscriptions/content?contentType=Audit.SharePoint&startTime=2022-05-03T07:30:03+00:00&endTime=2022-05-03T07:39:11 HTTP/1.1" 400 82
Got 1 content blobs of type: "Audit.SharePoint"
Finished. Total logs retrieved: 0. Total logs with errors: 0. Run time: 0:00:01.596500.
GraylogInterface reports: 0 successfully sent, 0 errors

My settings:

log:  # Log settings. Debug will severely decrease performance
  path: '/var/log/office365-audit-log-collector.log'
  debug: True
collect:  # Settings determining which audit logs to collect and how to do it
  contentTypes:
    Audit.General: True
    Audit.AzureActiveDirectory: True
    Audit.Exchange: True
    Audit.SharePoint: True
    DLP.All: True
  maxThreads: 50
  retries: 3  # Times to retry retrieving a content blob if it fails
  retryCooldown: 3  # Seconds to wait before retrying retrieving a content blob
  autoSubscribe: True  # Automatically subscribe to collected content types. Never unsubscribes from anything.
  skipKnownLogs: True  # Remember retrieved log ID's, don't collect them twice
  resume: True  # Remember last run time, resume collecting from there next run
  hoursToCollect: 24 # Look back this many hours for audit logs (can be overwritten by resume)
filter:  # Only logs that match ALL filters for a content type are collected. Leave empty to collect all
  Audit.General:
  Audit.AzureActiveDirectory:
  Audit.Exchange:
  Audit.SharePoint:
  DLP.All:
output:
  graylog:
    enabled: true
    address: 127.0.0.1
    port: 5555

last_run_times {"Audit.General": "2022-05-03T07:39:11Z", "Audit.AzureActiveDirectory": "2022-05-03T07:39:11Z", "Audit.Exchange": "2022-05-03T07:39:11Z", "Audit.SharePoint": "2022-05-03T07:39:11Z", "DLP.All": "2022-05-03T07:39:11Z"}%

known_content is empty... normal?

known_logs has content but not up2date (nothing from today or yesterday for example)

Any hints how to debug this?

First thing I will try is backup known_* & last_run_times files, remove them and run again.

Finished. Total logs retrieved: 15074. Total logs with errors: 0. Run time: 0:00:14.717247. GraylogInterface reports: 15074 successfully sent, 0 errors

This works and I get new logs.

ddbnl commented 2 years ago

I'll try to reproduce this with your settings. I suspect it has to do with the Resume setting, which I personally don't use in production. The reason is that I don't mind runs overlapping, because the 'skipKnownLogs' setting should prevent duplicates; so when runs overlap you can pick up anything that for whatever reason wasn't available before (maybe Microsoft was slow in updates the logs for example).

Unless you have a specific reason to use resume, could you try running with 'resume: False'? In the meantime I will see if there's a bug with it and try to fix it if so.

flotpg commented 2 years ago

Awesome, that works perfectly. I didn't set resume:True intentionally. I just copied ConfigExamples/fullConfig.yaml ;) Thanks a lot and have a great day.

ddbnl commented 2 years ago

No worries. I can see that Resume is working as intended. The problem is that the intention was wrong in the first place.

Resume retrieves everything until the current time initially using hoursToCollect (e.g. last 24 hours) and sets the last run time to the current time. Let's say the current time is 10:00 AM. In my test I decided to run every 5 minutes. Between 10:00AM and 10:05AM logs are generated (users are doing things), but Microsoft has probably not published all the logs yet since they're not instantly available, so 0 logs are retrieved. The last run time is saved to 10:05, so all the logs between 10:00 and 10:05 are skipped permanently. This repeats over and over again, never retrieving anything.

Could you let me know if your CRON schedule was relatively short, at least shorter than multiple hours? That would confirm my suspicion.

The resume parameter may work but only if it's not scheduled too often. I'm going to think about removing this parameter, perhaps the intention behind it was misguided. I'm definitely going to disable it as a default parameter.

Thanks for creating the issue and letting me know.

flotpg commented 2 years ago

My cron job runs every 15 minutes. Thanks for the detailed explanation!

Gill-Bates commented 2 years ago

This was the answere where I was looking for a long time.