NagiosEnterprises / ncpa

Nagios Cross-Platform Agent

[Windows] NCPA 3.1.0 to NCPA 3.1.1 - ncpa.exe will not run #1210

Open Mojo-OG opened 2 months ago

Mojo-OG commented 2 months ago

Upgrading in place from 3.1.0 to 3.1.1 on Windows Server 2019 and Windows Server 2022 results in this error message (screenshot), which also prevents the service from starting.

sawft99 commented 2 months ago

Didn't have issues with Server 2019 Standard on mine; I just ran the normal exe. I seem to recall sometimes having issues with those services not stopping correctly, or something along those lines, when I did other upgrades. Maybe try stopping them first, then run the upgrade and see if that works.

markbevill811 commented 2 months ago

We had an issue with the upgrade not running because a file was locked. The only workaround we found was to uninstall NCPA, reboot, and reinstall.

sawft99 commented 2 months ago

I think that lock may have been caused by the service issue I mentioned.

Mojo-OG commented 2 months ago

Further to this, the application would not uninstall correctly afterwards either. Running the uninstaller removed uninstall.exe and nothing else. All other files could be deleted manually except .\lib\servicemanager.pyd, which was reportedly locked by Windows Event Log.

A reboot was not feasible on most machines. Instead, I used Process Explorer (procexp.exe) to find the svchost.exe processes locking servicemanager.pyd and killed them. That let me remove the locked file, and also cleanly uninstall NCPA on the other machines where I had not yet attempted an uninstall. This ultimately fixed the issue and allowed the service to run correctly after reinstalling, but it is frustrating because it seems like the Nagios team did not even test this.

sawft99 commented 2 months ago

I'd ask again whether you tried stopping the services before uninstalling on the problem machines. This also does not seem to be a problem for everyone, since I have been able to do in-place upgrades with no issues so far. That includes any extra steps like the ones I mentioned.

ne-bbahn commented 2 months ago

Has anyone seen anything related to this in their NCPA logs or Windows event logs?

Mojo-OG commented 2 months ago

Regarding stopping the services first: the updates for this application were being handled by PatchMyPC, integrated with SCCM. Conflicting program executables (like ncpa.exe) are configured to be stopped, but only if the installer returns an exit code indicating that the install/upgrade could not complete because a conflicting service is running.

BitKeeper82 commented 1 month ago

Having the same issue. We also install via PatchMyPC (PMPC), integrated with SCCM/Intune. I tested both the PMPC package and the installers directly, and can confirm it is a file locking issue involving the Windows Event Log service.

VM with v3.1.0 installed.

  1. Snapshot VM.
  2. Run the v3.1.1 installer with silent switches.
  3. Installer finishes without error.
  4. Nagios Cross-Platform Agent service does not start.
  5. Revert snapshot.
  6. Stop the Windows Event Log service.
  7. Run the v3.1.1 installer with silent switches.
  8. Start the Windows Event Log service.
  9. Nagios Cross-Platform Agent service now starts.

Can reproduce the same issue every time.
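Steps 6 to 8 above lend themselves to scripting. The sketch below is a hypothetical Python wrapper, not an official NCPA or PatchMyPC tool, and it rests on stated assumptions: the 3.1.1 installer is assumed to accept an NSIS-style `/S` silent switch, `INSTALLER` is a placeholder path, and the Windows Event Log service (service name `EventLog`) is driven through PowerShell's `Stop-Service`/`Start-Service`. By default it only prints the commands; it executes them only on Windows when `--apply` is passed, and it restarts Event Log even if the installer step fails.

```python
"""Hypothetical wrapper for the Event Log workaround (steps 6-8 above).

Assumptions, not confirmed in this thread:
- the NCPA 3.1.1 installer accepts an NSIS-style silent switch "/S";
- INSTALLER is a placeholder path; point it at your own package.
"""
import subprocess
import sys

INSTALLER = r"C:\temp\ncpa-3.1.1.exe"  # hypothetical path

PS = ["powershell", "-NoProfile", "-Command"]


def build_plan(installer: str) -> list[list[str]]:
    """Return the (stop, install, start) commands without executing anything."""
    return [
        # Step 6: stop Windows Event Log so it releases lib\servicemanager.pyd.
        # -Force also stops services that depend on EventLog.
        PS + ["Stop-Service -Name EventLog -Force"],
        # Step 7: run the 3.1.1 installer silently ("/S" is an assumption).
        [installer, "/S"],
        # Step 8: start Windows Event Log again.
        PS + ["Start-Service -Name EventLog"],
    ]


def _exec(cmd: list[str], dry_run: bool) -> None:
    print(" ".join(cmd))
    if not dry_run:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)


def run(installer: str, dry_run: bool = True) -> None:
    stop, install, start = build_plan(installer)
    _exec(stop, dry_run)
    try:
        _exec(install, dry_run)
    finally:
        # Restart Event Log even if the installer step fails.
        _exec(start, dry_run)


if __name__ == "__main__":
    # Execute for real only on Windows and only when "--apply" is passed.
    apply = sys.platform == "win32" and "--apply" in sys.argv
    run(INSTALLER, dry_run=not apply)
```

Run it once without `--apply` to review the commands, then from an elevated prompt with `--apply`. Because `-Force` also stops services that depend on Event Log, trying it on a snapshot first, as in the steps above, seems prudent.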

ne-bbahn commented 1 month ago

@BitKeeper82, thank you for figuring out what is going wrong. I can't devote any time to this for the next month, but I will investigate and get this solved as soon as I can.

sistemmsn commented 1 month ago

Your version of NCPA is failing on more than 200 of our servers. Do you know what that means, and the impact this problem has on us? Did you test the new version? My team and I are still downloading NCPA versions and working through the errors. Do you have a fix for this? For the record, the people complaining here are paying Nagios XI Business customers.

sistemmsn commented 1 month ago

I can't uninstall it because the uninstaller just dies, and it won't let me reinstall anymore either. My only option is to reinstall version 3.1.0 over the existing installation and ignore the error (screenshot).

ne-bbahn commented 1 month ago

@sistemmsn, Have you checked the NCPA logs or the Windows Event logs to try and figure out what is going wrong? What versions of Windows are you encountering this on? I haven't had any issues installing or upgrading NCPA on my machines.

sistemmsn commented 1 month ago

@ne-bbahn Windows Server 2019 and 2022. I reinstalled version 3.1.1 on a Server 2019 machine again, but I first had to remove NCPA 3.1.0 entirely. The problem seems to stem from version 3.1.0, because I installed 3.1.1 on another, fresh server and it worked. For it to work correctly, you must uninstall 3.1.0 and then install 3.1.1.

Matty-uk commented 1 month ago

I did 50 to 60 NCPA 3.1.0 to 3.1.1 upgrades on Windows Server 2019/2022 via silent install the Friday after 3.1.1 was released. For me, every single one worked without a problem.

sawft99 commented 1 month ago

I actually have this happening on most of mine too now, and I don't know why. I had this happen a few versions ago too. I'm trying to think of what may be special about my environment. This is the only thing I can think of:

Could anyone who has had issues, or success, say whether they have anything like this in their environment?

fdeyso commented 1 month ago

Hi, we have the same issue on most of our production servers. For some reason, it updated just fine on the test servers that we only use to validate backups, which run nothing beyond the most basic services. The only thing I can see in the logs is that the service didn't start. For now I have redeployed 3.1.0, but I'd appreciate some help. (screenshots)

EDIT: I also just got this error message (screenshot).

ne-bbahn commented 1 month ago

Thanks for the extra data. I can't investigate this for a couple more weeks, but this will be my first priority once I can start working on it. It seems to me that some of the issues are caused by the Python version upgrade (3.11 => 3.12). If you remove Python 3.11, that may resolve that particular issue. I will try to figure out what the sources of all these issues are and if they are related or not.

fdeyso commented 1 month ago

I'm not sure how I can uninstall just the Python packaged in the application. I tried uninstalling 3.1.0 and doing a clean install of 3.1.1, but that didn't help. The 3.1.1 uninstaller is also broken (I guess it gets hung up on the same libraries), but deploying the 3.1.0 installer over the broken version works and downgrades just fine (except that it deletes everything in the plugins folder, so those files have to be added back manually).

bramassendorp commented 1 month ago

Quoting ne-bbahn: "I can't investigate this for a couple more weeks"

This seems like a big issue that should have high priority now. We have multiple machines affected by this, and simply reinstalling the older version does not solve it.

I would expect more "enterprise-grade" support on this.

jvandermeulen commented 1 month ago

Thanks for adding the Priority label. We have actively reached out to our customers to keep them from upgrading to 3.1.1 on Windows.

HCR3333 commented 4 weeks ago

Yeah, we have experienced this as well. As noted above, we scripted it to uninstall, stop Windows Event Log, and then install. That seems to go OK unless you have VMware, in which case it somehow interrupts communication with the VMware servers. Still looking into that one.