NSLS-II / nslsii

NSLS-II related devices
BSD 3-Clause "New" or "Revised" License
11 stars 23 forks source link

add a set of generic EPS two state devices #52

Closed awalter-bnl closed 6 months ago

awalter-bnl commented 5 years ago

This PR is designed to add some generic EPS two state ophyd.Devices (photon shutters, gate valves and pneumatic actuators). As these are common NSLS-II items we would prefer to have a common source for these, instead of the scattered large number we currently have.

These are based on what already exists on the floor, but are un-tested. I am opening the PR to get feedback on the overall structure of this. Tests will be added once the DAMA group decides how we should implement them (in particular the IOC's to test against).

prjemian commented 5 years ago

Just gone through a revision of our shutter infrastructure with the goal of making it uniform regardless of the hardware implementation. For any shutter, shutter.open(), shutter.close() are viable commands. The code relies on (hardware-dependent) subclass implementation of a state method that tells if open, close, or unknown. Also, any shutter can be used in a plan such as: yield from bps.mv(shutter, "open"). There are properties for shutter.isOpen and shutter.isClosed and some other useful stuff.

You might take a look at the source code:

The shutter simulator is a great tool. We use it like this in our ipython startups so we can test our control software during maintenance periods:

if aps.inOperations:
   safety_shutter = ApsPssShutterWithStatus("pv_prefix", name="safety_shutter")
else:
   print("!!!!! Note: using simulated safety_shutter !!!!")
   safety_shutter = SimulatedApsPssShutterWithStatus(name="safety_shutter")
prjemian commented 5 years ago

@awalter-bnl : Your analysis is correct. Operation of a valve looks like a shutter. That could be subclassed from ShutterBase or possibly used directly from EpicsOnOffShutter

awalter-bnl commented 5 years ago

Thanks for the feedback @prjemian I will take a look at your shutter implementation, and see what is there. This is actually heavily based on exisiting code we have which is seperated between beamlline profiles at NSLS-II at the moment and we are attempting to centralize it.

Unfortunately I am not yet convinced that sharing the code for facility dependent shutters is a good idea because:

  1. our shutters are quite anachronistic in that they sometimes need to be 'poked' multiple times to actually change state so we have some built-in mechanisms to take care of that.
  2. our shutters/ gate-valves and pneumatic actuators all share a similar IOC (based on our EPS systems plc controls) and so I think something facility specific which we can tune to this use case is better for us.

What do others think?

tacaswell commented 5 years ago

One thing that I think should be given a bit of thought here is the error handling around trying to start a while another set is in motion. Currently, the status object stashed on the shutter object is cleared on the success of the motion, but if you tell it to open and, for what ever reason, in never actually opens the status object will never be cleared and we won't be able to use the shutter again without restarting bluesky (or reaching in and clearing it by hand). This should probably be cleaned up a bit, but I am not sure the best way. If I recall correctly, the original version of this class was written while a beamline scientist was watching me so I took the "fail hard, sort it out later" approach.

I think our options here are:

[1] https://github.com/NSLS-II/ophyd/blob/204c8b8bfb3d907670c2a083b7f5c2dbe885c7f5/ophyd/positioner.py#L185

awalter-bnl commented 5 years ago

add logic else where to make sure we clear this, but leave the hard fail

I am not sure where this logic will live, or what it will look like but it seems like the most likely of the @tacaswell 's 3 options to work.

unconditionally fail the previous move like Positioner does [1]. I am not sure that this will work with all of the devices that this will control as the IOC can not be interrupted cleanly / will not take new direction while it is actively moving. I am also not sure that it will provide clear feedback that it ignored you

I suspect @tacaswell is correct here, in that although they share an IOC template each of these cases has very different hardware.

as we only have 2 states, detect if we are going to the same place as we were just told to and simply return the old status objects, if going the other way do one of the first two.

This may also work as a solution but the second part causes problems. For instacne there is no way for any of these item to be 'stopped' mid-way between opening and closing (as they are all pneumatic actuators). So if we are not going in the same direction as we previously asked we must wait until that move is done before requesting the next move.

tacaswell commented 5 years ago

There are three places we need to keep a reference to the DeviceStatus object on the this device

  1. on the instance its self to keep track off if there is current a set in progress
  2. the returned reference so the ultimate caller can wait for the device to finish
  3. in the callback that gets handed to the status object so it can notify the rest of the world when it is done.

The epics callback runs on a thread (actually a c-thread that libca owns) so we need to worry about concurrency between the various parts.

If we have the simplest verison of this code it looks something like

def set():
    if self._st is not None:
        raise

    self._st = Status()

    def cb(value, **kwargs):
       if value:
           self._st._finished()
           self._st = None
           self.pv.remove_cb(cb)

   self.pv.subscribe(cb)
   return self._st

which will work most of the time, but there are a few gotcha's in the timing. If the callback fires before the return statement is executed, self._st will be None which will break the RE (as it expects a Status object to be returned). Another way this can fail is say you have the plan

yield from bps.mov(shutter,' 'Open')
yield from bps.mov(shutter, 'Close')

you can get the timing where

  1. the open callback sets the status objects as done
  2. the wait in the RE plan returns, the next message is processed and we try to set the shutter colsed
  3. that set blows up because the callback has not set self._st to None

I third race condition is that if the callback fires a second time before the first one finishes (which I don't think should be possible, but that depends on the details of how both libca and ophyd deal with dispatching the callbacks bubbling out of the monitors) where you could hit the self._st = None from the first callback before you hit the self._st._finished() from the second.

Both of these timings will be relatively rare.

The following is a bit more defensive

def set():
    if self._st is not None:
        raise

    st = self._st = Status()

    def cb(value, **kwargs):
       if value:
           self._st = None
           st._finished()
           self.pv.remove_cb(cb)

   self.pv.subscribe(cb)
   return st

by keeping it's own reference to the Status object (both to return and to have the callback close over) makes it robust to all three of the race conditions above.

tacaswell commented 5 years ago

We also need to make sure that calling stop on the device correctly cleans it's self up (failing the status object, pulling the callback off, clearing things so the device can be moved again).

It is possible that we should be using threading locks here (rather than relying on careful ordering) and that the in-line callback is not the correct choice to make sure we can reliably remove the callback (by subscribing a method) or we should be tracking the id returned by subscribe rather than using the callable.

awalter-bnl commented 5 years ago

Thanks for the detailed response @tacaswell , and for picking up on this initially @mrakitin. I have made the changes as requested by @mrakitin and added the custom stop method as requested by @tacaswell .

awalter-bnl commented 5 years ago

thanks for the review @danielballan

awalter-bnl commented 5 years ago

latest commit adds tests againsts the new caproto sim IOC (thanks @malitsky for the work on that !-) ).

awalter-bnl commented 5 years ago

@mrakitin can you take another pass over this? I think it is ready

awalter-bnl commented 5 years ago

I am guessing this is dead due to the lack of responses, if others feel differently feel free to reopen.

awalter-bnl commented 5 years ago

re-opening at the request of @mrakitin

codecov-io commented 5 years ago

Codecov Report

Merging #52 into master will decrease coverage by 2.36%. The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #52      +/-   ##
=========================================
- Coverage   52.16%   49.8%   -2.37%     
=========================================
  Files          11      13       +2     
  Lines         692     759      +67     
=========================================
+ Hits          361     378      +17     
- Misses        331     381      +50
Impacted Files Coverage Δ
nslsii/__init__.py 4.76% <0%> (-1.23%) :arrow_down:
nslsii/tests/test_logutils.py 100% <0%> (ø)
nslsii/common/ipynb/logutils.py 100% <0%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update bd076a7...4469d28. Read the comment docs.