ddbnl / office365-audit-log-collector

Collect / retrieve Office365, AzureAD and DLP audit logs and output to PRTG, Azure Log Analytics Workspace, SQL, Graylog, Fluentd, and/or file output.
https://ddbnl.github.io/office365-audit-log-collector/
MIT License

Data stored in SQL is getting deleted #33

Open nikhiltalati opened 2 years ago

nikhiltalati commented 2 years ago

Hello,

I have saved the output to SQL, but I see that data is getting deleted from the tables. Also, no last_run file is getting created under Windows. I have a scheduled task that runs the script every hour.

Thanks
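
A possible angle on the missing last_run file, worth checking with any scheduled task: if the collector's workingDir is a relative path such as './', cache files resolve against the process's current working directory, which for a Windows scheduled task is often not the executable's folder. A small illustrative sketch (not the collector's code) to see where a relative path actually lands:

```python
# Illustrative only: shows where a relative cache path such as './last_run'
# resolves when the process is launched by a scheduled task.
import os

print(os.getcwd())                    # the task's "start in" directory
print(os.path.abspath("./last_run"))  # where a relative cache file would land
```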

nikhiltalati commented 2 years ago

Hello,

Now facing this issue:

```
Starting run @ 2022-08-23 14:38:48.625658. Content: deque(['Audit.General', 'Audit.AzureActiveDirectory', 'Audit.Exchange', 'Audit.SharePoint', 'DLP.All']).
Traceback (most recent call last):
  File "AuditLogCollector.py", line 712, in <module>
  File "AuditLogCollector.py", line 71, in run
  File "AuditLogCollector.py", line 84, in run_once
  File "AuditLogCollector.py", line 125, in receive_results_from_rust_engine
  File "AuditLogCollector.py", line 448, in _handle_retrieved_content
TypeError: string indices must be integers
[6256] Failed to execute script 'AuditLogCollector' due to unhandled exception!
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: SendError { .. }', src\api_connection.rs:234:57
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
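
For context, the final Python error here is what you get when a string is indexed like a dict, e.g. when an API response that should have been parsed into a list of dicts arrives as a raw string. A minimal sketch of the failure mode; `blob` and `contentUri` are hypothetical names, not taken from the collector's code:

```python
# Illustrative only; not the collector's actual code.
blob = '{"contentUri": "https://example.invalid"}'  # raw string, not parsed JSON

try:
    print(blob["contentUri"])  # indexing a str with a str key
except TypeError as exc:
    print(exc)  # string indices must be integers
```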

ddbnl commented 2 years ago

I'll take a look at this issue, seems to be somewhere in the Rust engine. Could you post your config file so I can try to reproduce this?

Also just to confirm, are you using the latest release?

nikhiltalati commented 2 years ago

Yes, I am using the latest release. The config file is as below:

```yaml
log: # Log settings. Debug will severely decrease performance
  path: 'collector.log'
  debug: True
collect: # Settings determining which audit logs to collect and how to do it
  workingDir: ./ # Directory to save cache files in (known_logs, known_content, last_run). Default is dir where executable is located
  contentTypes:
    Audit.General: True
    Audit.AzureActiveDirectory: False
    Audit.Exchange: False
    Audit.SharePoint: True
    DLP.All: False
  rustEngine: True # Use False to revert to the old Python engine. If running from python instead of executable, make sure to install the Rust engine python wheel in the RustEngineWheels folder
  schedule: 0 1 0 # How often to run in days/hours/minutes. Program will never exit and run on the schedule. Uncomment to use.
  maxThreads: 50 # Maximum number of simultaneous threads retrieving logs
  globalTimeout: 59 # Number of minutes before the process is forced to exit if still running (0 = no timeout). If you run e.g. every hour you could set this to 59, ensuring there will only be 1 active process.
  retries: 3 # Times to retry retrieving a content blob if it fails
  retryCooldown: 3 # Seconds to wait before retrying retrieving a content blob
  autoSubscribe: True # Automatically subscribe to collected content types. Never unsubscribes from anything.
  skipKnownLogs: True # Remember retrieved log IDs, don't collect them twice
  resume: False # DEPRECATED, recommended to keep 'False'. Remember last run time, resume collecting from there next run
  hoursToCollect: 72 # Look back this many hours for audit logs (can be overwritten by resume)
  filter: # Only logs that match ALL filters for a content type are collected. Leave empty to collect all
    Audit.General:
    Audit.AzureActiveDirectory:
    Audit.Exchange:
    Audit.SharePoint:
    DLP.All:
output:
  file: # CSV output
    enabled: False
    separateByContentType: True # Creates a separate CSV file for each content type, using file name from 'path' as a prefix
    path: 'output.csv'
    separator: ';'
    cacheSize: 500000 # Amount of logs to cache until each CSV commit, larger=faster but eats more memory
  azureLogAnalytics:
    enabled: False
    workspaceId:
    sharedKey:
    maxThreads: 50 # Maximum simultaneous threads sending logs to workspace
  azureTable: # Provide connection string to executable at runtime with --table-string
    enabled: False
    tableName: AuditLogs # Name of the table inside the storage account
    maxThreads: 10 # Maximum simultaneous threads sending logs to Table
  azureBlob: # Write CSV to a blob container. Provide connection string to executable at runtime with --blob-string
    enabled: False
    containerName: AuditLogs # Name of the container inside storage account
    blobName: AuditLog # When separateByContentType is true, this is used as file prefix and becomes e.g. AuditLog_AuditExchange.csv
    tempPath: './output'
    separateByContentType: True
    separator: ';'
    cacheSize: 500000 # Amount of logs to cache until each CSV commit, larger=faster but eats more memory
  sql: # Provide connection string to executable at runtime with --sql-string
    enabled: True
    cacheSize: 500000 # Amount of logs to cache until each SQL commit, larger=faster but eats more memory
    chunkSize: 500 # Amount of rows to write simultaneously to SQL; in most cases just set it as high as your DB allows. COUNT errors = too high
  graylog:
    enabled: False
    address:
    port:
  prtg:
    enabled: False
    channels:
  fluentd:
    enabled: False
    tenantName:
    address:
    port:
```
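
For anyone comparing configs: a quick way to confirm the YAML parses the way the collector will read it is to load it with PyYAML. This is a hypothetical sanity check, not part of the collector; the key names are taken from the config above:

```python
# Hypothetical sanity check for the config above; not part of the collector.
# Requires PyYAML (pip install pyyaml).
import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

# Confirm the output and content types that should be active in this setup.
assert cfg["output"]["sql"]["enabled"] is True
enabled_types = [name for name, on in cfg["collect"]["contentTypes"].items() if on]
print("Collecting:", enabled_types)  # ['Audit.General', 'Audit.SharePoint']
```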

The error is now as below, with debug enabled:

```
Starting new HTTPS connection (1): login.microsoftonline.com:443
https://login.microsoftonline.com:443 "POST /xxxxxxxxxxxxxxxxxx/oauth2/token HTTP/1.1" 200 1510
Logged in
Starting new HTTPS connection (1): manage.office.com:443
https://manage.office.com:443 "GET /api/v1.0/xxxxxxxxxxxxxxxxxxxxxxxxx/activity/feed/subscriptions/list HTTP/1.1" 200 342
Starting run @ 2022-08-25 13:07:38.330209. Content: deque(['Audit.General', 'Audit.SharePoint']).
Exception in thread Thread-4:
Traceback (most recent call last):
  File "threading.py", line 932, in _bootstrap_inner
  File "threading.py", line 870, in run
  File "Interfaces/SqlInterface.py", line 198, in _process_cache
  File "pandas/core/frame.py", line 721, in __init__
  File "pandas/core/internals/construction.py", line 519, in nested_data_to_arrays
  File "pandas/core/internals/construction.py", line 875, in to_arrays
  File "pandas/core/internals/construction.py", line 960, in _list_of_dict_to_arrays
  File "pandas/_libs/lib.pyx", line 403, in pandas._libs.lib.fast_unique_multiple_list_gen
  File "pandas/core/internals/construction.py", line 958, in <genexpr>
RuntimeError: deque mutated during iteration
Interfaces/SqlInterface.py:101: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Traceback (most recent call last):
  File "AuditLogCollector.py", line 712, in <module>
  File "AuditLogCollector.py", line 71, in run
  File "AuditLogCollector.py", line 84, in run_once
  File "AuditLogCollector.py", line 125, in receive_results_from_rust_engine
  File "AuditLogCollector.py", line 448, in _handle_retrieved_content
TypeError: string indices must be integers
[1755938] Failed to execute script 'AuditLogCollector' due to unhandled exception!
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: SendError { .. }', src/api_connection.rs:254:57
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
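
The `RuntimeError: deque mutated during iteration` above is CPython's generic error when a `collections.deque` changes while something iterates over it, e.g. pandas building a DataFrame from a deque that another thread is still appending to. A minimal, single-threaded reproduction of the message (illustrative only, not the collector's code):

```python
from collections import deque

# Minimal reproduction of the error above; not the collector's code.
logs = deque([{"Id": 1}, {"Id": 2}])

try:
    for entry in logs:
        logs.append({"Id": 3})  # mutating the deque mid-iteration
except RuntimeError as exc:
    print(exc)  # deque mutated during iteration
```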

ddbnl commented 2 years ago

I've located and fixed the crashing issue in the Rust engine.

I'll set up a SQL DB this weekend to try to reproduce that issue. Are you working with an Azure SQL instance or running your own server?

nikhiltalati commented 2 years ago

I have my own server.

nikhiltalati commented 2 years ago

Hello,

Any update on when you will release the patched version?

Thanks