Implement suspend/resume mechanism

MichaelWedel commented 7 years ago

As outlined in #144 and #145, the way disconnecting/re-connecting the device worked as of 1.0 it was more or less a suspend/resume mode.

In PR #170, this is fixed, but now there is no suspend/resume mode anymore. We should think about introducing that mode explicitly, but it's probably not that urgent since we did not have this mode in mind when we did the original implementation.

MikeHart85 commented 7 years ago

I don't think this should be backlogged, since (while implemented accidentally) suspend/resume is a desirable feature that we want to have. It doesn't seem right to have a release that removes existing and useful features.

MichaelWedel commented 7 years ago

If we move this to Release 1.1, we need to have a somewhat better idea what suspend really means. We can't "implement" it in the way we did before, because that doesn't resolve #144.

Is this protocol specific? How do we model "hanging" devices?

MikeHart85 commented 7 years ago

We also already have pause and resume commands for hanging the simulation of the device itself. But that leaves the interface alive and responding normally. I wonder if this should be part of that, or separate functionality specific to the connection.

The implementation will definitely be protocol specific in any case. That shouldn't be a problem though, it's the same for connecting and disconnecting. From a Stream TCP perspective, I think the correct behaviour would be:

Current connections stay open
New connections are rejected or time out
Commands from current connections are flushed/discarded on resume

MichaelWedel commented 7 years ago

Good question regarding pause/resume...I haven't made up my mind about this yet. On the one hand it seems logical that a paused device would not respond to commands, on the other hand it could still be useful behavior (e.g. to pause the device before or after some failure and see what "it says").

Probably it's not only depending on the protocol, but also on the device. I think EpicsAdapter is a good example for what I'm trying to say here - suspending communication would be roughly equivalent to some problem in the IOC, while the underlying device would of course continue to function. In some cases of StreamAdapter that might be similar, if the device has some sort of higher level computer thing that manages a network connection (or is connected via a serial to ethernet adapter that can have its own problems).

Having written all that I begin to think that these are different things and should be separate: One aspect is the device simulation (for which we have pause and resume) and one is the communication, which would be suspend/resume.

Can we find another terminology for suspend/resume to make it more clear what we mean?

As a side note, I think for the EPICS case it is enough to stop calling handle.

MichaelWedel commented 7 years ago

As those will be methods of Simulation, we should maybe stick to the naming we've used so far. We have pause/resume for the simulation itself, while we have connect_device and disconnect_device for interacting with the device. Maybe it could be suspend_communication/resume_communication?

MikeHart85 commented 7 years ago

I don't know... it all seems very verbose and not very consistent. connect_device and disconnect_device really only affect the communication (adapter) layer as well. I can't recall... was there a reason we didn't call those connect and disconnect?

MichaelWedel commented 7 years ago

I'm not 100% sure but I think that was to make it clear that it's affecting the device, not the control client or any other thing that is running.

MikeHart85 commented 7 years ago

Another thing to think about (as if it's not enough) is that we keep floating the idea of supporting multiple interfaces simultaneously. Perhaps each interface (eventually; for now just the single one) should be a top level object in the control interface? So alongside device and simulation?

MichaelWedel commented 7 years ago

That would certainly resolve the ambiguity in the different suspend/resume things. What about disconnect/connect though? Given that this is supposed to be "physically cutting the cable" it might affect all interfaces. But then one device could be connected to multiple "boxes" that do communication via separate wires...for the sake of simplicity and ease of use we should probably allow both - one "big red button" that disconnects/suspends all interfaces (I think this is the most common use case although I can't provide evidence), but also allow doing that for each interface separately.

This means that we'd still have to have a method on Simulation to do this.

MikeHart85 commented 7 years ago

For completeness and convenience, each interface could have connect / disconnect and simulation could have connect_all / disconnect_all.

MichaelWedel commented 7 years ago

Yes, that's exactly what I meant - we have it as connect_device and disconnect_device now. What do we call the convenience suspend_all/resume_all?

MikeHart85 commented 7 years ago

Heh, back to the same problem.

Well, if we imagine we have support for multiple interfaces, how did we solve the naming problem in the control interface? device, simulation and...? Something based on the adapter type? What if there's two TCP interfaces? Based on the protocol attribute? Then you might have problems with devices that have different versions of interface (like julabo-version-1 OR julabo-version-2... would be inconvenient for scripting if you have to know which interface version is being used).

How about always:

device
    ...
simulation
    pause
    resume
    ...
interface
    connect
    disconnect
    suspend
    resume
    ...

Where all interface methods take an optional parameter that corresponds to the protocol attribute of Interfaces. If the optional parameter isn't given, all interfaces are affected. So you can do:

$ lewis-control interface disconnect epics
$ lewis-control interface disconnect stream
$ lewis-control interface connect

MichaelWedel commented 7 years ago

The beauty of circular discussions :)

I think your suggestion looks good. Instead of storing the adapter in Simulation directly, there would be some sort of InterfaceManager (I think I've already fallen in love with that name - it conveys nothing and everything) in Simulation that we expose as interface to the outside world and has the methods you suggest, plus documentation and some way to query what protocols are available. Introducing this additional layer will indeed make it easier to allow for multiple interfaces (in fact this might already be the largest part of the solution).

And to close the circle, we have to make a decision what we do with the existing "legacy" methods connect_device, disconnect_device, device_connected and device_documentation. Have we been around long enough to make it necessary to phase them out (deprecation warning, remove in release after next)?

MikeHart85 commented 7 years ago

I think we're early enough in development that it would be cleaner to remove those methods completely to avoid ambiguity and extra work. Might also want to rename simulation pause to simulation suspend for all-round consistency.

MichaelWedel commented 7 years ago

We'd have to be very clear in the release notes about this change and count on people actually reading them. Otherwise anyone trying to use those methods will just see an AttributeError with no further indication how to proceed. On the other hand we do show the interface of the exposed device, so maybe it's not so bad. Maybe we could consider displaying that documentation as well in some form.

ISISComputingGroup / lewis-ess

Implement suspend/resume mechanism #171