Icinga / icingaweb2-module-vspheredb

The easiest way to monitor a VMware vSphere environment.
https://icinga.com/docs/vsphere/latest
GNU General Public License v2.0

Event sync stuck on specific day #517

Open xeiss opened 1 year ago

xeiss commented 1 year ago

Expected Behavior

Event sync can sync all events

Current Behavior

The event sync is stuck on a specific day; 10.01.2023 is the last day with events in our database.

The daemon log shows every 2 seconds: "Task 'Events' is already running, skipping". The daemon debug log output doesn't show anything else about events. Running "service icinga-vspheredb restart" doesn't change anything; after 4 seconds it is back to "Task 'Events' is already running, skipping".

I also flushed the whole icinga-vspheredb database and started with a fresh setup; the sync stops at the same last event / day.

Possible Solution

Ignore an event when it isn't possible to apply it to the database, and print the error to the daemon log. That way we would also be able to identify the problematic event, and the other events could still be applied.
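To illustrate the idea, here is a minimal sketch in Python (not the module's actual PHP code). The names `apply_events`, `store_event`, and the logger name are hypothetical, and the event dictionaries are made-up stand-ins for whatever the daemon receives from vCenter. It only shows the pattern: try each event, log the failure with the event's identifier, and continue with the next one.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("vspheredb.events")


def apply_events(events, store_event):
    """Apply every event; log and skip the ones that cannot be stored.

    `store_event` is a hypothetical callable that writes one event to the
    database and raises an exception if that fails.
    """
    applied = 0
    skipped = []
    for event in events:
        try:
            store_event(event)
            applied += 1
        except Exception as exc:
            # Log enough detail to identify the problematic event later.
            logger.error("Skipping event %r: %s", event.get("key"), exc)
            skipped.append((event, exc))
    return applied, skipped


def demo_store(event):
    # Stand-in for a database write that rejects one kind of event.
    if event["message"] is None:
        raise ValueError("message must not be empty")


events = [
    {"key": 1, "message": "Task: Create virtual machine snapshot"},
    {"key": 2, "message": None},  # the "bad" event in this made-up example
    {"key": 3, "message": "Changed custom field"},
]
applied, skipped = apply_events(events, demo_store)
```

In this made-up run, the second event is logged and skipped while the first and third are still applied, so a single bad event would no longer block the whole sync.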

Steps to Reproduce (for bugs)

I can reproduce it with a fresh database plus the config of my vCenter. It only syncs 1168 events. But I think this is only reproducible with my vCenter. I also exported the events with VMware PowerCLI and looked at the events following the last one in the vspheredb history, but there was no strange event, just some "Task: Create virtual machine snapshot", "Task: Find rules for VM" or "Changed custom field". But these events happen every day.

Your Environment

skupjoe commented 8 months ago

@xeiss did you ever find a solution for this? I am also stuck in a "Task 'Events' is already running, skipping" loop, and restarting the agent or DB didn't fix it.

xeiss commented 8 months ago

@skupjoe no, sorry. The module has a lot of potential, but there have been no commits for 6 months, and I also can't use it for anything real.

skupjoe commented 8 months ago

Hi @xeiss thanks for the response back. I was actually successfully able to fix vSphereDB stuck in this events check loop by resetting the vCenter events database via the following steps: https://kb.vmware.com/s/article/89245

Hopefully this will work for you as well!

But the underlying question of what type of bad event causes the vspheredb agent to choke still needs to be investigated. And we still need additional debug/trace logs as highlighted in the OP, at the very least. Ideally, the agent would also catch these scenarios and report an exception in its status when this occurs.

xeiss commented 8 months ago

Your link is broken, but sure, I could reset the vCenter events and it would work for some days. But I can't reset my events in VMware because I don't want to lose this history. Also, I think the "bad" event will surely happen again some days later, and then the sync will be stuck again. So it isn't a workaround for me.

skupjoe commented 8 months ago

Whoops- fixed. That's understandable that you'd want to keep your events. This issue definitely still needs more investigation and additional trace logging, at the very least, so we'll see what @Thomas-Gelf thinks.

And yeah, there's a good chance that this bug / "bad" event will reappear. I'll post back if it happens.