Closed slominskir closed 5 years ago
Two additional points:
    public void contextVirtualCircuitException(ContextVirtualCircuitExceptionEvent ev) {
        LOGGER.log(Level.SEVERE,
                "EPICS CA Context Virtual Circuit Exception: Status: {0}, Address: {1}, Fatal: {2}",
                new Object[]{ev.getStatus(), ev.getVirtualCircuit(), ev.getStatus().isFatal()});
        Transport[] transports = context.getTransportRegistry().toArray();
        for (Transport t : transports) {
            // No port in getVirtualCircuit(). Hope there aren't multiple CA servers at that IP!
            if (ev.getVirtualCircuit().equals(t.getRemoteAddress().getAddress())) {
                CATransport cat = (CATransport) t;
                Channel[] channels = context.getChannels();
                for (Channel c : channels) {
                    CAJChannel cac = (CAJChannel) c;
                    if (cat.equals(cac.getTransport())) {
                        //TODO: Now that we know the channel, we can lookup the WebSocket(s)!
                    }
                }
            }
        }
    }
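To illustrate the caveat in the comment above (the event carries an IP address but no port), here is a minimal sketch using only java.net. The class name CircuitAddressMatch, the helper sameHost, the address 10.0.0.5 and the second port 5070 are hypothetical and chosen for illustration; 5064 is the default CA server port. It only demonstrates that two servers on the same host are indistinguishable when compared by address alone.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class CircuitAddressMatch {
    /** True when the transport's remote socket address is on the host reported by the event. */
    public static boolean sameHost(InetAddress circuit, InetSocketAddress remote) {
        // The port of 'remote' is ignored because the event side has no port to compare against.
        return circuit.equals(remote.getAddress());
    }

    public static void main(String[] args) throws UnknownHostException {
        // Hypothetical IOC host with two CA servers listening on different ports.
        InetAddress ioc = InetAddress.getByAddress(new byte[]{10, 0, 0, 5});
        InetSocketAddress serverA = new InetSocketAddress(ioc, 5064);
        InetSocketAddress serverB = new InetSocketAddress(ioc, 5070);

        // Both transports match the same event address, so the match is ambiguous.
        System.out.println(sameHost(ioc, serverA));
        System.out.println(sameHost(ioc, serverB));
    }
}
```

If multiple CA servers do share one IP, every transport to that host would be treated as affected, so the channels found this way are a superset of the truly affected ones.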
Opting to "do nothing". It seems the gov.aps.jca.event.ConnectionListener will notify ChannelMonitors when connectivity issues occur. The code that reset the whole context was removed.
We need to research how best to handle the scenario where an IOC becomes unresponsive and the client API (CAJ) throws ContextVirtualCircuitException with status 60:
EPICS CA Context Virtual Circuit Exception: Status: gov.aps.jca.CAStatus[UNRESPTMO=60,WARNING=0]=Virtual circuit connection unresponsive
Currently, epics2web attempts to reset the entire context in this scenario: context.destroy() is called, then a new CAJContext is created and all monitors are re-created.
However, the EPICS CA protocol specification (https://epics.anl.gov/docs/CAproto.html#secVCUnresponsive) says a disconnect should be avoided in this scenario. Empirically, I've observed that CAJ does not recover automatically from status 60, but more research is needed, as this could be due to bugs in epics2web, especially when multiple IOCs are rebooted simultaneously. Recreating this scenario has proven tricky: killing a running Java CAS server results in status 24, which the CAJ client can recover from automatically once the server is restarted. I'm seeing status 60 when an RTEMS IOC is restarted in production (maybe it crashes).
Note: an unresponsive IOC is different from a disconnected IOC. A disconnected IOC results in status 24:
EPICS CA Context Virtual Circuit Exception: Status: gov.aps.jca.CAStatus[DISCONN=24,WARNING=0]=Virtual circuit disconnect
In the case of CA status 24, epics2web defers to the underlying CAJ library to watch for the IOC to come back online and automatically retry the connection. In other words, epics2web does nothing and ignores a ContextVirtualCircuitException with status 24.
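The handling described in this issue can be summarized as a small decision table. The sketch below is an assumption-laden illustration, not the epics2web code: the class VirtualCircuitPolicy, the Action enum and the method actionFor are hypothetical names, and the LOG_ONLY default for other status values is an assumption. Only the numeric values 24 (DISCONN) and 60 (UNRESPTMO) and the two behaviors come from the description above.

```java
public class VirtualCircuitPolicy {
    // Numeric CA status values as they appear in the log lines quoted in this issue.
    public static final int DISCONN = 24;
    public static final int UNRESPTMO = 60;

    /** Hypothetical actions a client could take on a virtual circuit exception. */
    public enum Action { DEFER_TO_CAJ, RESET_CONTEXT, LOG_ONLY }

    /** Maps a status value to the handling described in the issue body. */
    public static Action actionFor(int statusValue) {
        switch (statusValue) {
            case DISCONN:
                // CAJ watches for the IOC and reconnects on its own, so do nothing.
                return Action.DEFER_TO_CAJ;
            case UNRESPTMO:
                // Behavior under question: destroy the context and re-create all monitors.
                return Action.RESET_CONTEXT;
            default:
                // Assumption: any other status is only logged.
                return Action.LOG_ONLY;
        }
    }

    public static void main(String[] args) {
        System.out.println(actionFor(DISCONN));
        System.out.println(actionFor(UNRESPTMO));
    }
}
```

Adopting the "do nothing" approach for status 60 as well would amount to returning DEFER_TO_CAJ from the UNRESPTMO branch.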