duosecurity / duo_log_sync

MIT License

Log streaming #22

Open aclease opened 3 years ago

aclease commented 3 years ago

This is a fairly large deployment with about 26,000 active users, so I expected a lot of data to catch up on when I first started this process. I am running Python 3.8.8 on Windows, and I started with an offset of 10 days; my thinking was that if something were to bug out on the Windows box, I would have 10 days to find it and start the DLS script again. I started the DLS script last night, and it began pulling in logs from ten days ago, so I let it run overnight to catch up. When I checked this morning, the DLS logs showed it pulling in 1-7 logs at each interval. Sweet, so it had caught up, or so I thought. I made a small change to the log file placement and restarted the DLS script, and I was surprised to see it pulling in 1000 logs at a time again. This is a larger implementation, so perhaps some mass user edits were happening. I checked the logs being brought in, and it turned out it was bringing in new logs from ten days ago and working its way back to the current time.

I have checked a couple of the auth messages, and their timestamps are not repeats, and I have not removed or edited the checkpoint files. That leaves me wondering whether I need to restart the process regularly, say every 8 hours, to ensure I am capturing all the logs from the Duo cloud, or whether I need to downgrade Python to 3.7 or 3.6, or move to a Linux box, so that it processes everything and gathers logs reliably. I just happened to have a spare Windows box lying around in my VM space, so I figured I would use it.

As a side note, I was hoping that commenting out the duologsync portion of the config would stop the log file from being written, but instead it just started a new file in the c:/tmp folder. Some suggestions on log rotation for non-Windows specialists would be helpful, or the script could handle rotation itself and let the user specify how big the files get and how many to keep.
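To illustrate the kind of rotation I mean, here is a minimal sketch using `RotatingFileHandler` from Python's standard library. It is not how DLS configures its logging today. The function name `build_rotating_logger`, the logger name, and the default sizes are made up for the example.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_rotating_logger(log_path, max_bytes=10 * 1024 * 1024, backup_count=5):
    """Return a logger that caps the log file size and the number of old files kept.

    Hypothetical helper for illustration only; the name and defaults are not from DLS.
    """
    logger = logging.getLogger("dls_rotation_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep messages out of the root logger

    # Drop handlers from earlier calls so messages are not written twice.
    for old_handler in list(logger.handlers):
        logger.removeHandler(old_handler)
        old_handler.close()

    # When the file would exceed max_bytes it is renamed to <name>.1, <name>.2, ...
    # and only backup_count old files are kept.
    handler = RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=backup_count
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

With something like this, the two values a user would set in the config are the maximum file size and the number of old files to keep.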

aclease commented 3 years ago

Update: I watched the task complete a second time.

I had a very small number of logs duplicated. I wanted to see if the script would actually pick up where it had left off and if it would again attempt to pull in the same set of logs for the last set of 10 days.

I restarted the DLS script, and it's back to 10 days ago, and it's duplicating some log entries as well as pulling in unique log entries.

I may have to try this on a Linux box running Python 3.7, in case this is something specific to Windows or to 3.8.8.

aclease commented 3 years ago

I moved the script to a Linux box running Python 3.8 and put a freeware syslog server on a local machine, just to rule out drops or network issues. It appeared to work pretty well: the log count differed from the web admin console by only 2 logs. So I might have a Windows problem, or the syslog server I was originally sending the logs to may have been a little overworked. I will have to test from a Linux box on prem to see whether the problem is the Windows box or the syslog server.

One thing I would like to point out from my testing: the Windows Python script complained about the checkpointing piece of the configuration when I attempted to use it to point the files elsewhere. Letting it go to the default meant I had to create a C:/tmp folder, but that is fine. On the Linux box I left it at the default, so it is writing to the /tmp folder.

What I have noticed is that after stopping duologsync, the checkpoint files are ignored and the offset starts over. With an offset of the last 1 day, that meant I had to pull all the auth and adminactions logs again even though I already had them. I was hoping the system would handle this more gracefully in the event a system reboot was needed, or the script needed to be restarted for some reason or another. Perhaps a feature request, or an unintended bug in the code?
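To make the behavior I was expecting concrete, here is a rough sketch: on startup, resume from the checkpoint if one exists, and fall back to the configured offset only when there is none. This is not the actual DLS code or its checkpoint file format. The function names, the JSON layout, and the `last_timestamp` key are all made up for illustration.

```python
import json
import os
import time


def save_checkpoint(path, last_timestamp):
    """Record the timestamp of the newest log already forwarded (hypothetical format)."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"last_timestamp": last_timestamp}, f)
    # Replace the old file in one step so a crash never leaves a half-written checkpoint.
    os.replace(tmp_path, path)


def starting_timestamp(path, offset_days, now=None):
    """Resume from the checkpoint if present, otherwise start offset_days in the past."""
    now = time.time() if now is None else now
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)["last_timestamp"]
    except (FileNotFoundError, ValueError, KeyError):
        # No usable checkpoint: this is the only case where the offset should apply.
        return int(now - offset_days * 86400)
```

With this logic, a restart after a reboot would continue from the last saved timestamp instead of pulling the whole offset window again.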