Cacti / cacti

Cacti ™
http://www.cacti.net
GNU General Public License v2.0

Automation Network Scan Stuck "In Progress" #3220

Closed danfiscus closed 4 years ago

danfiscus commented 4 years ago

Describe the bug: One of my automated network entries has been stuck as "Running" for a few days straight. I've tried selecting it and using the Cancel Discovery action, but it does not seem to affect the status. I cannot find any references in the log to the scan still actively doing anything, nor does anything appear in the log when I attempt to Cancel Discovery. (My log level is set to High; perhaps that message would not appear at that verbosity, but I'm afraid to set it any higher lest my logs take up even more space.)

To Reproduce: I'm not sure how it got stuck, but if you can reproduce the issue, it seems cancelling discovery does not help.

Expected behavior: I expect the scan to return to Idle status when there are 0 devices left to scan or process, or to be able to cancel an "active" scan from the Cancel Discovery action.

Screenshots: [image: Frozen Scan]

Desktop Client:

Server Host:

Additional context: The Cacti server is on a virtual machine which, after two days of this issue, was shut down over the weekend and restarted a few hours ago. The problem persists after a clean system reboot.

bmfmancini commented 4 years ago

Hey! I was having this issue. There were a couple of bug fixes, but since you're on 1.2.8 you should have them.

Do you have CereusReporting installed?

Run ps -ef | grep -i php to see which PHP processes are running; if any are stuck, try killing the PID.

Check the autom8 process table: SELECT * FROM automation_processes;

If you see a bunch of rows, you can truncate the table to remove the stuck jobs.

If you have the CereusReporting plugin installed, that's what I found to be causing my issues. I sent this to the dev to fix it.
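As a rough illustration of the process check suggested above, the sketch below filters `ps -ef`-style output for PHP processes whose command line mentions automation. The function name `find_automation_procs` and the match on the word "automation" are assumptions made for illustration; the exact process name may differ on a given install.

```shell
# Hypothetical helper: keep only lines that look like PHP automation
# processes. Reads ps -ef style output on stdin, so it can also be
# tried against saved output. The "automation" pattern is an assumption.
find_automation_procs() {
    grep -i 'php' | grep -i 'automation' | grep -v 'grep'
}

# Usage:
#   ps -ef | find_automation_procs
```

The second column of each matching line is the PID, which can then be passed to kill if the process is confirmed to be hung.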

danfiscus commented 4 years ago

@bmfmancini I just checked the plugins list and the only plugin installed is Hmib, which, to my knowledge, is not being used for anything at the moment. No mention of CereusReporting anywhere that I can find.

bmfmancini commented 4 years ago

OK, cool, so maybe it's just a hung PHP process. You can kill the process either in the DB or, if it's a PHP process, kill it directly.

danfiscus commented 4 years ago

How would I determine which of the two is running the process?

bmfmancini commented 4 years ago

It would be an autom8 process, which would have the network ID in the process name.

danfiscus commented 4 years ago

Is there a command to list active autom8 processes?

bmfmancini commented 4 years ago

Yes, via the DB: SELECT * FROM automation_processes;

danfiscus commented 4 years ago

Noob question, but how does one access the database via the CLI? I tried mysql cacti.sql but the command didn't do anything. Again, apologies, I'm very new to this.

bmfmancini commented 4 years ago

mysql -u root -p
use cacti; (or your DB name)
SELECT * FROM automation_processes;
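The same query can also be run without entering the interactive prompt. The sketch below is a hypothetical wrapper (the name `list_automation_processes`, the `RUN` switch, and the default database name `cacti` are all assumptions) that prints the command for review by default and only executes it when `RUN=1` is set.

```shell
# Hypothetical helper: print the one-shot query command, or run it
# when RUN=1. "cacti" is an assumed default database name; "root" is
# the user named in the thread.
list_automation_processes() {
    db="${1:-cacti}"
    sql='SELECT * FROM automation_processes;'
    if [ "${RUN:-0}" = "1" ]; then
        # -p prompts for the password; -e runs the statement and exits
        mysql -u root -p "$db" -e "$sql"
    else
        printf 'mysql -u root -p %s -e "%s"\n' "$db" "$sql"
    fi
}

# Usage:
#   list_automation_processes          # print the command only
#   RUN=1 list_automation_processes    # actually query the database
```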

danfiscus commented 4 years ago

Okay, I did that, and if I'm reading this right, there are multiple running scans all for that network? [image: SQL Process List]

netniV commented 4 years ago

I think you have 10 set against your automation, though three are running a single host each.

danfiscus commented 4 years ago

Strange, I'm not sure how that happened. Speaking of automation, I also can't find entries in any of my cron files for Cacti, is this something I should be concerned about?

Edit: not trying to derail the thread here, sorry, I am still sitting with the DB window open if anyone has any advice on what (if anything) to kill

bmfmancini commented 4 years ago

You can truncate the table to kill the remaining running processes: TRUNCATE automation_processes;
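Since truncating removes every row, it may be worth saving a copy of the table first. The sketch below only prints a two-step sequence for review and does not touch the database; the function name `print_cleanup_steps`, the default database name `cacti`, and the backup file name are assumptions.

```shell
# Hypothetical helper: print a cautious cleanup sequence (backup, then
# truncate) without executing anything. Database name and backup path
# are assumed defaults; adjust them to the local install.
print_cleanup_steps() {
    db="${1:-cacti}"
    backup="${2:-automation_processes.sql}"
    # Step 1: dump only the automation_processes table to a file
    printf 'mysqldump -u root -p %s automation_processes > %s\n' "$db" "$backup"
    # Step 2: clear the stuck entries
    printf 'mysql -u root -p %s -e "TRUNCATE TABLE automation_processes;"\n' "$db"
}

# Usage:
#   print_cleanup_steps              # defaults
#   print_cleanup_steps mydb /tmp/ap_backup.sql
```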

danfiscus commented 4 years ago

Perfect, thank you all very much for your help! I now have no hanging scans, they are all idle. If there's anything else I can provide to help this issue be prevented in the future, let me know. If nothing else, a GUI override that asks "Are you sure?" and does the database truncation to avoid CLI tools would be a nice feature request, though certainly not a pressing one now that I know how to fix this manually.

netniV commented 4 years ago

There was another issue related to stuck automation processes. I thought that had been resolved, so the question is whether you are beyond that patch or not.

netniV commented 4 years ago

"When running automation, scan can fail when selecting remote pollers" was the one I was thinking of.

danfiscus commented 4 years ago

I read through that issue and it does seem similar in nature. As for what patch version I'm on, I don't know which version those fixes were implemented in. Would release 1.2.8 contain the patch?

netniV commented 4 years ago

I believe it should have done, so it would seem it's not fully resolved. Out of curiosity, are you using the CereusReporting plugin?

danfiscus commented 4 years ago

No, I am not, though it was mentioned earlier in this thread, so I double checked. The only plugin I have is HMIB. Would that be causing the problem? I doubt it, only because I don't think I've actually set that plugin up yet.

cigamit commented 4 years ago

It may, though we have corrected quite a few automation issues recently.

bmfmancini commented 4 years ago

It did happen to me again, however, just on one network scanning around 15 subnets. The weird thing that always gets me is that it passes the maximum scan time. Truncating the table fixed it.

TheWitness commented 4 years ago

@danfiscus, this issue should be resolved completely in the 1.2.11 release. Please upgrade to that release once available, and open a new ticket if the issue persists.