anthonywebb / homebridge-cbus

CBus plugin for homebridge
MIT License
35 stars 20 forks

Homebridge goes into a boot loop if unable to find cgate server. #123

Closed OrigSorceror closed 2 years ago

OrigSorceror commented 2 years ago

I have noticed (particularly after a power interruption) on a RPi that the C-Gate server will not run because the C-Bus network (particularly the SHAC5500) thinks it is still connected to a previous instance of the C-Gate server. It literally reports that another instance of C-Gate server is running (even when there isn't one). Since no server exists when the plugin attempts to connect, Homebridge continually crashes.

Is there a way that we can put a start delay before running the C-Gate server, or check that the C-Gate server is actually up and running before starting the homebridge service, to prevent a homebridge restart loop?

DarylMc commented 2 years ago

Hi. There is a systemd timer in this setup to achieve that: https://github.com/greiginsydney/Homebridge-cbus-installer

DarylMc commented 2 years ago

It is a time delay to wait for cgate to sync the network after reboot

DarylMc commented 2 years ago

On a network with many CBus group addresses it can take a few minutes for CGate to sync so the time value in this example may also need to be increased.

OrigSorceror commented 2 years ago

That will solve half the issue, but it will not solve the C-Gate server not starting because it thinks another instance is running. So I need a check to see if the server is running; if it is not, wait 30 seconds and retry starting the server, then wait for the server to communicate with the network before starting the homebridge server.
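The wait-and-retry part of that request could be sketched as a small POSIX shell helper. This is only an illustration and not part of homebridge-cbus or the installer: `retry` is a hypothetical function name, and the 30-second delay is simply the figure mentioned above.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds. Returns 0 on the first success,
# or 1 once MAX_ATTEMPTS attempts have all failed.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        # No need to sleep after the final failed attempt.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

For example, `retry 10 30 nc -z 127.0.0.1 20023 && sudo systemctl start homebridge` would probe the C-Gate command port every 30 seconds, up to 10 times, and only start Homebridge once the port answers. That line assumes `nc` is installed, that C-Gate is listening on its default command port (20023), and that the service is named homebridge.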

My C-Bus network consists of 49 devices. (lights, roller shutters and ceiling fans)

DarylMc commented 2 years ago

Have a look at running Greig's script on a raspberry pi. If you set up Homebridge CBus some other way previously I think you will be quite impressed.

OrigSorceror commented 2 years ago

I am impressed, as it does almost everything I have done manually. However, I feel it will not work in my use case, as my C-Bus network is an entirely separate network from my computer network (physically separate, each with its own NAT router, not a VLAN) and the Pi bridges both networks (2 LAN ports). I have a 3rd router I use to bridge the two networks when I want to use Toolkit, but that is rare, so that bridge is left unplugged. The Pi is assigned a static IP by the router on the computer network side and uses DHCP on the C-Bus side. Only 1 PC on the computer network is permitted to access the C-Gate server to use Toolkit.

I may be overly security conscious in separating the 2 networks physically, but I wanted to make it as difficult as possible for anyone to hack into the system. C-Bus therefore has no direct connection to the internet except via the Pi bridge, which is heavily firewalled by 2 NAT routers as well as a software firewall.

DarylMc commented 2 years ago

I've always used an ethernet interface to CBus (CNI or device server) just because I thought it was easiest.

Lately I've been using an FTDI USB to RS232 cable to the CBus RS232 PC interface. Running Homebridge and CGate on the raspberry pi keeps the CBus network interface off your local network altogether. I've been really happy with the performance even while running the raspberry pi on wifi. I think this is due to having a wired USB connection from CGate to the PCI.

DarylMc commented 2 years ago

I haven't used any of the Wisers or SHAC. Is CGate running with Homebridge on a raspberry pi? Do you want to add a delay for CGate to wait for the SHAC after a power outage?

OrigSorceror commented 2 years ago

Yes, C-Gate is running on the same Pi as Homebridge. The SHAC has battery backup, so it is always on unless the battery drains (Tesla Powerwall; the SHAC is on a backup circuit but the rest of C-Bus is not). The Pi is also not on battery backup, although I do have a 2 kVA UPS that I will be adding to the comms room to power the modem, the switches and the 2 Pis (I also have a Pi running HASS.io, connected just to the computer network, for controlling my Actron AC, as it needs access to Actron's web portal).

Putting the UPS in place would solve the Pi/C-Gate issue for as long as the UPS is running (about 40 mins after mains cut-out).

The C-Gate init script is located in the init.d folder. I probably should run it from systemd, as init.d is the old method. When the Pi reboots after a power interruption, the init.d script is run to initialise C-Gate. Because the interruption was brief, C-Gate can't start: it complains that there is another instance of C-Gate server running on the network, as the SHAC has not yet released (timed out) the connection to the previous server. Since the C-Gate server never starts in this case, Homebridge goes into a boot loop when it tries to initialise its connection to C-Gate.

I note the systemd script for starting the C-Gate server has far fewer lines of code than the init.d script, and it also has restart set to always. Would that overcome the issue of it thinking there is another instance of the C-Gate server running?

OrigSorceror commented 2 years ago

As for the Homebridge delay, all I need to do is add in the OnBootSec= setting.
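For reference, a boot delay of this kind is set with OnBootSec= in the [Timer] section of a systemd timer unit. The helper below prints a minimal timer unit as a sketch. The function name `homebridge_timer_unit`, the unit name homebridge.service and the 120-second default are assumptions for illustration, not the contents of the installer's actual file.

```shell
# homebridge_timer_unit [DELAY_SECONDS]
# Prints a minimal systemd timer unit that starts homebridge.service
# DELAY_SECONDS after boot (hypothetical default: 120).
homebridge_timer_unit() {
    delay=${1:-120}
    cat <<EOF
[Unit]
Description=Delay Homebridge start until C-Gate has synced

[Timer]
OnBootSec=${delay}
Unit=homebridge.service

[Install]
WantedBy=timers.target
EOF
}
```

One way to use it would be `homebridge_timer_unit 190 | sudo tee /etc/systemd/system/homebridge.timer`, followed by `sudo systemctl daemon-reload`. For the delay to have any effect, the timer rather than the service needs to be the unit enabled at boot (`systemctl enable homebridge.timer` and `systemctl disable homebridge.service`).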

OrigSorceror commented 2 years ago

I suspect something else is going on, as the issue does not occur when the reboot command is issued to the Pi via the CLI. That command would shut down the C-Gate server tidily, whereas a power interruption would not.

DarylMc commented 2 years ago

@OrigSorceror I suggest giving Greig's setup a go. It has as up-to-date an OS and software as I've been able to get working. The init.d services are replaced with systemd. There is also a timer delay for the homebridge service start, which I have always found necessary. Greig's script has an automated method of creating the Homebridge config file, and I would definitely give that a try. The restart after power failure has been exceptional in my experience.

1. Don't remove your current bridge in HomeKit.
2. Update CBus Toolkit, connect to your network and save your project.
3. Copy your latest up-to-date XML file and put it somewhere safe.
4. Shut down the Homebridge/CGate Pi you are running now.
5. Install on another SD card following Greig's instructions.
6. Complete the setup, but don't add the bridge to HomeKit.
7. Test a power failure and see how it goes. Have a play around with the interface.
8. Swap back to your original SD card if you don't want to proceed.

OrigSorceror commented 2 years ago

Just built a new bridge with Greig's setup, and it does the same thing :( I even set the homebridge delay to 190 seconds, but it still continually loops. When I SSH into the Pi and check the status of cgate with systemctl status cgate, I get the following:

pi@raspberrypi:~ $ systemctl status cgate
● cgate.service - cgate
   Loaded: loaded (/etc/systemd/system/cgate.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2021-10-04 20:17:10 ACDT; 17min ago
  Process: 635 ExecStart=/usr/bin/java -Djava.awt.headless=true -jar -noverify /usr/local/bin/cgate/cgate.jar (co
 Main PID: 635 (code=exited, status=2)

Oct 04 20:17:10 raspberrypi systemd[1]: cgate.service: Service RestartSec=100ms expired, scheduling restart.
Oct 04 20:17:10 raspberrypi systemd[1]: cgate.service: Scheduled restart job, restart counter is at 5.
Oct 04 20:17:10 raspberrypi systemd[1]: Stopped cgate.
Oct 04 20:17:10 raspberrypi systemd[1]: cgate.service: Start request repeated too quickly.
Oct 04 20:17:10 raspberrypi systemd[1]: cgate.service: Failed with result 'exit-code'.
Oct 04 20:17:10 raspberrypi systemd[1]: Failed to start cgate.

If I then stop the cgate service and do a manual start of cgate, it comes back up.

DarylMc commented 2 years ago

@OrigSorceror Did you finish the setup? It will throw an error at boot if cgate and the cbus network is not there.

OrigSorceror commented 2 years ago

Yes, it finished the setup. If I issue a reboot command (sudo reboot) everything works as it should. It only fails after a power failure.

DarylMc commented 2 years ago

Sounds like you did. Well, I'm a bit stumped now. In raspi-config there is a setting to wait for network at boot. Give that a try. Just out of interest, are you power cycling the SHAC at the same time?

DarylMc commented 2 years ago

@OrigSorceror Thanks I will have a look in the next few days

DarylMc commented 2 years ago

@OrigSorceror Did you let the script create a new config.json or did you use your old one?

OrigSorceror commented 2 years ago

Not power cycling the SHAC at all, just the Pi.

This is from the syslog. Note the second line: it thinks another instance is running.

Oct 4 20:17:10 raspberrypi java[635]: Clipsal C-Gate(TM) v2.11.4 (build 3251)
Oct 4 20:17:10 raspberrypi java[635]: Cannot start C-Gate. A second instance of C-Gate is already running. Exit$
Oct 4 20:17:10 raspberrypi systemd[1]: cgate.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Oct 4 20:17:10 raspberrypi systemd[1]: cgate.service: Failed with result 'exit-code'.
Oct 4 20:17:10 raspberrypi systemd[1]: cgate.service: Service RestartSec=100ms expired, scheduling restart.
Oct 4 20:17:10 raspberrypi systemd[1]: cgate.service: Scheduled restart job, restart counter is at 5.
Oct 4 20:17:10 raspberrypi systemd[1]: Stopped cgate.
Oct 4 20:17:10 raspberrypi systemd[1]: cgate.service: Start request repeated too quickly.
Oct 4 20:17:10 raspberrypi systemd[1]: cgate.service: Failed with result 'exit-code'.
Oct 4 20:17:10 raspberrypi systemd[1]: Failed to start cgate.
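A quick way to check whether a particular boot hit this failure is to search the log for the message shown above. The following is a minimal sketch; `has_second_instance_error` is a hypothetical helper name, and it simply reads log text on standard input.

```shell
# has_second_instance_error
# Reads log text on stdin. Exits 0 if it contains C-Gate's
# "second instance" startup error, non-zero otherwise.
has_second_instance_error() {
    grep -q 'A second instance of C-Gate is already running'
}
```

For example, `journalctl -u cgate -b --no-pager | has_second_instance_error && echo "stale C-Gate instance reported"` checks the current boot, assuming the service is named cgate as in the status output in this thread.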

OrigSorceror commented 2 years ago

These errors occur even before Homebridge is started, so config.json has not been loaded at that point.

OrigSorceror commented 2 years ago

Looks like I have solved the issue by adding the line RestartSec=20 just before the Restart=always line in the cgate.service script. Looking at the syslog, it still detects the 2nd instance of cgate running and restarts the service after 20 seconds. After the 2nd attempt (restart count=2) cgate starts up and everything progresses normally.

OrigSorceror commented 2 years ago

Issue resolved by adding a RestartSec=20 line to the cgate.service script before the Restart=always line.
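A likely explanation, going by the log earlier in this thread: with RestartSec=100ms, all five restart attempts happen within a second, and systemd's default start rate limit (5 starts within 10 seconds) then stops further attempts, which is the "Start request repeated too quickly" message. Spacing restarts 20 seconds apart keeps the service under that limit and gives the SHAC time to drop the stale connection. The same two settings can also live in a drop-in file instead of editing cgate.service directly; the sketch below prints one. The function name `cgate_restart_dropin` and the file name restart.conf are assumptions for illustration.

```shell
# cgate_restart_dropin [RESTART_SECONDS]
# Prints a systemd drop-in that keeps restarting cgate, waiting
# RESTART_SECONDS between attempts (default 20, as in the fix above).
cgate_restart_dropin() {
    restart_sec=${1:-20}
    cat <<EOF
[Service]
Restart=always
RestartSec=${restart_sec}
EOF
}
```

To apply it, something like `sudo mkdir -p /etc/systemd/system/cgate.service.d`, then `cgate_restart_dropin | sudo tee /etc/systemd/system/cgate.service.d/restart.conf`, then `sudo systemctl daemon-reload` should work, given that the unit lives at /etc/systemd/system/cgate.service as shown in the status output.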

DarylMc commented 2 years ago

@OrigSorceror I re-read this issue and it seems like it might be specific to the SHAC. If it pops up with another CNI interface type I will know where to start looking. Thanks

DarylMc commented 2 years ago

@OrigSorceror I wasn't able to reproduce the fault after testing a power loss using a USB RS232 connection to a RS232PCI

OrigSorceror commented 2 years ago

Quite possibly it is the SHAC causing the issue. The SHAC has its own ethernet port that is used as a PCI, although it has a static IP address. The SHAC may think it is still connected to the C-Gate server after a power failure (because the server did not shut down tidily) and takes around 40 seconds to realise it is not connected.

So I am running C-Bus v3 with a SHAC5500 connected via the ethernet PCI to my network. I built my house specifically with C-Bus in mind, with networking ports all over the house, 2x 24-port patch panels and a 24-port switch under the staircase with the modem. The SHAC resides in the garage above the C-Bus panel.

The touch screen (essentially an Android tablet) connects to what appears to be a web server on the SHAC. Building the interface for that is a PITA. It is so damn clunky; I wish I could just do it as a series of HTML5 pages or PHP scripts rather than using the SHAC's web interface. But I got a preliminary touch display up that looks somewhat reminiscent of the LCARS computer terminals used in Star Trek ;) Wish I could add sounds :(

peterconn commented 2 years ago

I’m also using a SHAC. Not sure I’m experiencing the same issue; however, with mine, every time I power down my Pi I have to force start Homebridge, as my timer does not start it.

DarylMc commented 2 years ago

@peterconn Try the change @OrigSorceror posted.

DarylMc commented 2 years ago

@peterconn Also have a look at your logs in Homebridge to see if it is a similar problem.

peterconn commented 2 years ago

@DarylMc will do, cheers

OrigSorceror commented 2 years ago

@DarylMc will do, cheers

If you need a hand, happy to remote in and have a gander.

peterconn commented 2 years ago

@OrigSorceror cheers, thanks for that. I’ll give it a go. Need to look up the commands to change it as it’s been going well with no issues so I haven’t had to change anything.