Closed ThomasLohnert closed 5 years ago
I think someone needs to inspect the code in SECI to see what it does and why it might be faster. Also, we need to compare the steps in SECI and IBEX, to see if any steps differ in time.
What if you don't do any data collection? (i.e. move to the next point in the scan, stop (just for a moment), and then move to the next point). How does that compare between SECI & IBEX?
The long waits could be due to the duration of moves. See whether https://github.com/ISISComputingGroup/IBEX/issues/4541 impacts the performance
Summary of my findings so far. The results presented here were collected using Python's cProfile module with the command:

```python
cProfile.run('scan("HEIGHT", -0.8, 0.8, count=3, frames=100)')
```

using the scan implementation from SURF. Where a component's cost was not obvious, I measured it by factoring the relevant bits of code out into their own methods and looking at the cumulative time (including sub-method calls) of those methods; the overhead this adds is negligible.
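As a rough illustration of the method, here is a minimal, self-contained sketch of profiling by cumulative time - the scan body is a stand-in using sleeps, not the SURF implementation:

```python
import cProfile
import io
import pstats
import time

def move_to_point():
    # Stand-in for the motion part of a scan step (waitfor_move etc.)
    time.sleep(0.01)

def collect_data():
    # Stand-in for the per-point detector measurement
    time.sleep(0.02)

def scan(points=3):
    for _ in range(points):
        move_to_point()
        collect_data()

profiler = cProfile.Profile()
profiler.enable()
scan()
profiler.disable()

# Sort by cumulative time (time including sub-method calls), which is the
# figure reported in the tables below
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats()
report = stream.getvalue()
print(report)
```

Factoring a suspect bit of code into its own named method makes it show up as its own row in this cumulative-time listing.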
Component | Duration (s) | Duration (%) |
---|---|---|
Total | 64.6 | 100 |
detector | 44.7 | 69.3 |
iter | 7.6 | 11.8 |
enter | 9.9 | 15.3 |
misc | 2.3 | 3.6 |
- `detector()` is essentially the per-instrument implementation of the individual scan steps, which processes and returns the data to plot.
- `__iter__()` is the code from the scans library that coordinates motion (i.e. calculating and moving to the points at which to take data).
- `__enter__()` is the context manager bit, which is how the detector gets instantiated, i.e.:

```python
with open(self.defaults.log_file(), "w") as logfile, \
        detector(self, save=save, **kwargs) as detect:
    ...
```
I don't think there is much we can do about the context manager bit, but it is only called once per scan, i.e. the duration does not scale with more iterations, so it should not be a huge factor. In this example I used 3 iterations, but in practice the standard is more like 11-21.
Here is the breakdown for the other two major components, starting with `__iter__()`:
Component | Duration (s) | Duration (%) |
---|---|---|
Total | 7.8 | 100 |
waitfor_move | 3.3 | 42.4 |
action | 2.3 | 29.4 |
create_dictionary | 2.2 | 28.2 |

And for `detector()`:

Component | Duration (s) | Duration (%) |
---|---|---|
Total | 44.7 | 100 |
waitfor_frames | 5.7 | 12.8 |
get_spectrum | 16.4 | 36.6 |
waitfor_move | 6.5 | 14.6 |
waitfor_time | 6.0 | 13.4 |
resume | 2.9 | 6.6 |
pause | 2.6 | 5.9 |
misc | 3.2 | 7.3 |
(summed for 3 iterations)
Here is where I can see improvements being made:

1. `g.get_spectrum()` currently takes 500 ms per call as it contains 4 `get_pv` calls (~120 ms each). I'm not sure how, but it would be nice if we could reduce this. Note that the number of bins in the spectra makes no difference (I profiled it on different instruments with a range of 100-6000 bins, and it was consistently around 500 ms).
2. The `detector()` implementation contains a retry loop with 5 iterations for when zero-spectra are returned (i.e. one of the monitor and detector spectra does not contain any data). I believe this was implemented because `get_spectrum()` sometimes would not return the correct data on the first call. We should check whether this is still the case; if so, find out why and, if possible, fix it. This is where by far the largest gain can be made. I think we may need to test this on a real instrument with neutrons, as this may be mostly a problem with a simulated DAE, which would always return a zero-spectrum.
3. The `waitfor_move` in `detector()` seems to be completely redundant, as this is already done in the `__iter__` code. A call to `waitfor_move` with idle motors still takes 2 seconds, so removing it should be an easy gain.
4. There are `waitfor_time(1)` statements which I suspect were put in as safeguards. We should check whether these are actually necessary.

One bit I have not yet looked at in detail is the duration of motion in between scan points. If this turns out to be a major factor in practice, the only thing I can think of to improve it would be tweaking the backlash distances/speed. This would have to be handled at the motor settings level and is a decision the scientists would have to make per axis (i.e. it's a tradeoff between speed and precision/reliability).
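For illustration, here is a minimal sketch of what such a zero-spectrum retry loop typically looks like. The function shape and dictionary layout are hypothetical, with `get_spectrum` injected so the sketch is self-contained:

```python
MAX_RETRIES = 5

def read_spectra(get_spectrum, detector_spec=1, monitor_spec=2):
    """Re-read both spectra until neither comes back empty.

    get_spectrum is injected to keep the sketch self-contained; the real
    code calls g.get_spectrum(). Each retry re-reads both spectra
    (~0.5 s per read), which is why this loop is expensive whenever
    zero-spectra are common, e.g. against a simulated DAE.
    """
    det = mon = None
    for _ in range(MAX_RETRIES):
        det = get_spectrum(detector_spec)
        mon = get_spectrum(monitor_spec)
        if sum(det["signal"]) > 0 and sum(mon["signal"]) > 0:
            break
    return det, mon
```

Against a simulated DAE every attempt fails, so the loop always burns all 5 iterations, which matches the observation that this is where the largest gain can be made.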
I have implemented changes 2-4 from above, i.e. I have thrown out the redundant `waitfor_move`, the arbitrary `waitfor_time` and the zero-spectrum retry loop, as I could not recreate the bug where `get_spectrum` does not return data. I think we should optimistically run like this next time and just keep an eye on stability.
I have also looked at `g.get_spectrum`, which contains 4 `caget` instructions for a given spectrum/period: 2 for the x and y data, and 2 for the x and y lengths used for trimming them (since the data in the x and y data PVs is fixed length). The only way I see to make this faster is to get only the x or y length and use it to trim both the x and y data, which assumes they are always the same length. This is the case for the instruments I have checked, but I am not sure whether it is actually a general rule.
Here is a comparison of the performance for `scan("HEIGHT", -0.8, 0.8, count=11, frames=100)` (a typical scan command):

- `SECI` is the duration in SECI
- `IBEX curr` is IBEX as we used it for testing on SURF (before these changes)
- `IBEX trim` is with the changes above
- `IBEX trim no backlash` is as above but with the backlash distance of the scanned axis set to zero (instead of 0.05). MC said that longer scans due to slower backlash moves would be acceptable, but I wanted to check how much of a factor this is.

While these changes have greatly improved scan performance, it is still lagging far behind SECI, at almost twice the duration even without backlashing.
For the best IBEX case using the above command, the duration breaks down as follows:
Action | Duration total | Duration % | Comments |
---|---|---|---|
total | 90.7 | 100 | |
waitfor_frames | 18.6 | 20.5 | This cannot be reduced further |
pause | 15.5 | 17.0 | Time/call IBEX: ~1.4s / SECI: ~0.7s |
resume | 10.3 | 11.3 | Time/call IBEX: ~0.9s / SECI: ~0.15s |
get_spectrum | 10.0 | 11.1 | |
waitfor_move | 9.0 | 10.0 | This cannot be reduced further |
begin | 7.3 | 8.1 | Time/call in SECI: ~2.8s |
Scan.action | 4.4 | 4.8 | Checks whether a given motion step is possible |
Scan.create_dict | 3.8 | 4.2 | Updates list of outstanding motion steps of this scan |
Scan.plot | 2.3 | 2.5 | |
change_nperiods | 1.3 | 1.4 | |
get_period | 1.1 | 1.3 | This is done in parallel with counting and should not actually impact performance |
get_frames | 1.1 | 1.3 | |
misc | ~6 | ~6.5 | |
(SECI timing information taken from genie log for scans we did on SURF last cycle)
It's worth investigating whether we can make `begin`, `pause` and `resume` any quicker in genie_python, since there is quite a difference between SECI and IBEX. Other than that, there are possibly some minor gains to be made in the Scans code, but this is hitting a point of diminishing returns.
It seems to me the thing that generally makes everything slower compared to SECI is the reliance on Channel Access requests. We do have a couple of other options to speed scans up, e.g. reducing the number of `caget`s made each loop. These things may require larger changes to the scans library. I will discuss this with the main Scans developer if we cannot improve the performance to a level that is acceptable for the reflectometry scientists by other means.
I should also mention that scans in IBEX sometimes seem to stall, seemingly at random, for lengths in the region of 10s at a time. I have not yet been able to figure out the cause of this.
Thanks @ThomasLohnert. This is a good investigation and you have identified a number of things we can do to get better performance. I agree, we should look at `begin`, `pause` and `resume` to see if we can make more gains. Is there any more we can do to `get_spectrum`?
How do `waitfor_frames` and `waitfor_move` compare to their SECI equivalents?
If dist=True then x and y are the same length; if dist=False then x will be one point bigger (as it holds histogram bin boundaries). However, dist seems to be ignored and I proposed a fix in https://github.com/ISISComputingGroup/IBEX/issues/4014. So it is unnecessary to get both the X and Y array sizes if you know what dist should be (bearing in mind the broken behaviour mentioned in 4014). Channel Access does support variable array sizes, but I think not all clients are happy with this, so we would need to confirm it was OK everywhere.
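A tiny worked example of the length relationship described above:

```python
# For histogram-mode (dist=False) data the x array holds bin boundaries,
# so it has one more element than y; for dist=True data x holds bin
# centres and matches y in length.
edges = [0.0, 1.0, 2.0, 3.0]    # 3 bins -> 4 boundaries
counts = [5, 2, 7]              # one value per bin
assert len(edges) == len(counts) + 1

# Converting boundaries to centres restores equal lengths
centres = [(lo + hi) / 2 for lo, hi in zip(edges, edges[1:])]
assert len(centres) == len(counts)
```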
The IBEX begin does more than a SECI begin: it resyncs run control and also restarts (and checks it has restarted) the archive engine. These operations could be moved into separate DB monitoring logic and so run independently after a begin has started.
Given @FreddieAkeroyd's previous point, I think we should do two things:

- `begin` does more than a SECI `begin`
It looks like IBEX is 90s when SECI is about 55s. The lost time of 45s seems to be:

7 seconds pause, 7 seconds resume, 4 seconds begin.

How many are lost to get_spectrum? At most 9. Can you time the `s1=GET(1)` command in Open GENIE? That still leaves 20s to account for.
I'm not sure `waitfor_frames` and `waitfor_move` can be made faster, as they are waiting for external events (neutrons and motion respectively). I tried reducing the sleep time in the waitfor loop (i.e. how often it checks whether the waitfor condition has been met) from 0.3s to 0.01s, but it did not seem to make a noticeable difference.
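The waitfor loop in question is essentially a polling loop. A minimal sketch (not the genie_python implementation) showing where the sleep interval comes in:

```python
import time

def waitfor(condition, poll_interval=0.3, timeout=60.0):
    """Minimal sketch of a waitfor-style polling loop.

    poll_interval is the sleep between checks of the condition; on
    average about half an interval is wasted after the condition becomes
    true, so shrinking it (0.3 s -> 0.01 s) mainly helps short, frequent
    waits - at the cost of more frequent checks, each of which may be a
    Channel Access read in the real code.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

For long waits dominated by the external event (frames arriving, motors moving), the half-interval overhead is negligible, which would explain why reducing the sleep made no noticeable difference here.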
`s1=GET(1)` takes around 50 ms (vs 500 ms for `get_spectrum`).
I added a comment about reducing `get_pv` calls on https://github.com/ISISComputingGroup/IBEX/issues/4014
Ticket for speeding up DAE created here: https://github.com/ISISComputingGroup/IBEX/issues/4666
I will bring up the topic of scan performance with the scientists. The changes implemented & proposed above might get us close enough for it to be accepted. If not, I will talk to AW about ways of speeding up the scans code. After that I think it would get very fiddly to try and wring more speed out.
The size of the X array or Y array will not change for a given spectrum during a run; the X array itself will also not change for a given spectrum during a run. So in principle it would be possible to cache the X array (and its size) and only re-read the Y array during a scan.
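A minimal sketch of that caching idea, with illustrative PV names and an injected `get_pv` (not the real DAE layout or genie_python API):

```python
class SpectrumCache:
    """Cache the x array (and its used size) per spectrum.

    Within a run neither the array sizes nor the x array change for a
    given spectrum, so only the y array needs re-reading at each scan
    step. get_pv is injected; PV names are illustrative.
    """

    def __init__(self, get_pv):
        self._get_pv = get_pv
        self._cache = {}  # spectrum -> (used size, trimmed x array)

    def read(self, spectrum):
        if spectrum not in self._cache:
            n = self._get_pv("SPEC:{}:Y.NORD".format(spectrum))
            x = self._get_pv("SPEC:{}:X".format(spectrum))[:n]
            self._cache[spectrum] = (n, x)
        n, x = self._cache[spectrum]
        y = self._get_pv("SPEC:{}:Y".format(spectrum))[:n]  # only y re-read
        return x, y
```

This turns the steady-state cost into a single PV read per scan step instead of four, at the price of a cache that must be invalidated when a new run begins.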
Does ISISComputingGroup/genie_python#169 help?
Thought of something else that is making pause/resume slower. At the end of any DAE command I make sure the PVs for information (frames, microamps etc.) are correct for that state, in case anybody reads these PVs immediately afterwards. As I don't know what people might read, I have to update them all. SECI didn't need to do this: genie made a direct call to the isisicp if it needed a value, so all a pause had to do was pause data collection. The problem is that the isisdae IOC and isisicp are separate processes, so they need to be kept in sync. Possible solutions:
https://github.com/ISISComputingGroup/genie_python/pull/169 makes quite a difference. It looks like IBEX might actually beat SECI for longer scans?
I will profile SECI against different IBEX conditions (pre changes, post changes, no backlash) for several scans with different parameters.
Excellent news! I'll be very interested to see the new profiles. And we should make sure the scientists know of these performance gains.
Sorry, I spoke too soon. The scans in SECI I compared against earlier did not run properly, which I believe was a result of starting OpenGENIE before SECI. It is still slightly faster but we are getting closer:

Condition | SECI | IBEX pre | IBEX post | IBEX post no backlash |
---|---|---|---|---|
11 steps, 100 frames | 51.534 | 154.061 | 70.633 | 62.124 |
11 steps, 500 frames | 148.191 | 233.086 | 163.028 | 159.06 |
41 steps, 100 frames | 173.375 | 527.913 | 230.238 | 233.903 |
41 steps, 500 frames | 533.394 | 820.895 | 574.434 | 560.031 |
Average all scans | 226.6235 | 433.98875 | 259.58325 | 253.7795 |
Note that IBEX pre was run under the optimistic assumption that get_spectrum never returns zero spectra for either detector or monitor; otherwise each failed attempt adds 1 second. The reality of this is not testable without neutrons.
The main difference between the modified IBEX and SECI scans seems to be in the duration of `pause`/`resume`, which accounts for about an extra 0.5-0.6s per iteration.
If we can reduce this as per Freddie's suggestions above, I imagine it will bring the performance difference to an acceptable level (if it is not already acceptable as-is).
The other thing I still need to do is find a proper solution for caching the spectra size across scan steps (I have been testing it with a hacky bit of code). This will likely require changes in the scans library, so I will talk to AW about it when he is back. However, this now has a lower impact on the time, with the above changes making get_pv faster.
Excellent news - the time for short scans has almost halved! I think James Lord would be interested in the python fix too as he does a lot of PV setting during pre-cycle tuning.
We could investigate dynamic array sizes at some point, in theory you should never need to read the array size separately but we had issues (gateway maybe?) in the past when a size changed. I think we may understand this now and be able to fix.
Some other thoughts:
Do you use cget/cset a lot in the code?

Each gets called once per scan step. Looking back at the results, the average time per call for both varies quite drastically. I think cget/cset may be the source of the random stalls I described in an earlier comment - this seems important to fix (may be fixed by https://github.com/ISISComputingGroup/IBEX/issues/4338). I also tried modifying `get_blocks` to return a hardcoded list, which gave a slight performance gain compared to the "best case" runs.
The waitfor command (so waiting for frames) has a 0.5 second delay in its loop; wait_for_runstate has a 0.3 second delay in its loop. Changing both of these to 0.1 actually makes quite a difference (~20 seconds for 41 steps, 100 frames). I'm not sure whether this is safe to do as a general change to `waitfor` or whether it should be an option. Could running sleep in a tighter loop have negative performance implications in other areas?
Do you or the scans library call `wait_for_pv()`?

No.
Summary of investigation:

In our `Defaults` implementation (`Surf.py`):

- `waitfor_time`
- `waitfor_move`
- the `get_spectrum` part
- `sleep` duration in `waitfor` methods (https://github.com/ISISComputingGroup/genie_python/pull/170)
- `get_spectrum`

All the code I modified temporarily for these tests is on SURF, in `Instrument\Scripts\` and `Apps\Python`. All the files in question end in `_4547.py`.
Does the runstate change in #4686 make much difference to pause and resume times?
Used https://github.com/ISISComputingGroup/genie_python/pull/171/files instead of the 0.3 -> 0.1 wait in ISISComputingGroup/genie_python#170 (advised by @FreddieAkeroyd; this seems OK and was tested by disabling the state check in genie_python - the error is caught and alerted).
The above 3 PRs look good; with them applied, a rough calculation suggests that the "IBEX post no backlash" and SECI times should now be almost the same?
As a reflectometry scientist, I would like scanning to run at least as fast as in SECI. Currently IBEX seems to take much longer, because the pauses between individual data points (where it stops data collection, moves the motor and changes period) are much longer (5-10 seconds instead of 1-2).
I'm not sure what the reason is (the DAE behaving differently in IBEX? specifics of the implementation of the scans library?). In any case, fixing this is a priority according to the reflectometry scientists, as it makes the already quite tedious scanning process much slower.