Disable sequencer before syncing

aderbenev commented 8 years ago

There is an issue when performing a Commit operation on an Enabled, Loaded sequencer. Commit sometimes causes the sequencer to run without external trigger. At NSLS-II, this triggers ACMI fault to appear spontaneously when changes are made which require sequencer commit, e.g. changing the target injection bucket. ACMI fault means that certain event does not arrive at expected times. Troubleshooting has shown that this event indeed arrived not when expected.

Testing:

With EVG using FracSyn source, Mxc divider was set to obtain ~0.1 Hz frequency (the glitch occurs at 1 Hz as well, but at much lower rate). Sequencer table containing event 25 with timestamp 14 (and other events) was used and assigned to trigger from Mxc in Single Run mode. To rearm the sequencer, single shot controller logic was used which re-enabled the sequencer once its status became Disabled.

Two python scripts were used to provoke and track the glitch: first printed out the time difference between subsequent arrivals of event 25. Second one invalidated the sequencer configuration by switching and restoring the trigger source parameter and performed a commit on arrival of event 25.

Expected behavior is when event 25 arrives every 10 seconds. Observed behavior (output of the first script), tracking event 25 arrival:

9.277065869 <- Mxc ticks and event 25 arrives, the second script does a commit 0.912279932 <- script sees event 25 again even though Mxc is ~0.1 Hz 0.912262812 <- another spontaneous run, second script performs a commit every time 0.912260970 <- commits keep causing the glitch 0.912296550 0.912276747 <- last spontaneous run, next commit doesn't cause the glitch for some reason 5.627969022 <- Mxc ticks, legitimate event 25 arrival 0.912301377 <- spontaneous runs follow 0.912269223 0.912266949 0.912302002 0.912268566 5.628053982 <- Mxc ticks normally 0.912294509 <- for some reason glitch appears only once 9.277051525 <- Mxc ticks, script commits 0.912281013 <- glitch again ...

It is peculiar that the time it takes for the spontaneous trigger to appear, ~0.91 seconds, corresponds to the last timestamp in the sequencer table, 113856614 ticks * 8 ns = ~0.91 seconds. Changing the timestamp of this last event also causes the spontaneous trigger delay to change. It means that when commit is performed, it sometimes makes sequencer to run immediately after the ongoing sequence is finished. If commit is performed again during the second run, it may cause another spontaneous run and so on.

It was found in code that when in Single mode, sequencer is not disabled before Commit is applied. When proposed fix is in place, the first script output is as follows:

10.189346033 10.189346034 ...

i.e. event 25 arrives every ~10 seconds, as expected, and spontaneous sequencer runs are not observed. It is proposed to disable sequencer when needed before applying the commit.

mdavidsaver commented 8 years ago

From a quick look I think I agree with this change. I'll take a closer look tomorrow.

The CI build failure is tangential. It looks like evrMrmApp/src/drvemIocsh.cpp needs #include <errlog.h>.

mdavidsaver commented 8 years ago

... it sometimes makes sequencer to run immediately after the ongoing sequence is finished ...

I seem to remember something about the sequencer looping if no end of sequence event is encountered.

aderbenev commented 8 years ago

As far as my tests show, there is no difference in EVG debug output regardless of glitch appearing or not, e.g. evgSoftSeq::process_eos() is called every time a commit is attempted, however it may not be equal to the end of sequence event actually be present in the table. I wonder, what is the maximum timestamp for a sequence event, perhaps whatever fits in uint64? Also, there is code in evgSoftSeq::commitSoftSeq() aimed to append the end of sequence event 0x7f if it is not already present, from which I would expect EVG to never receive an incomplete table.

aderbenev commented 8 years ago

The second commit to my fork was intended to deal with the issue that TS clock can not be set higher than the event clock. It is needed in case if reference frequency exceeds EVR internal synthesizer frequency which can not be fine-tuned because of its specific granularity. This discrepancy causes more event ticks to happen than EVR expects before TS reset event arrives, which yields invalid nanoseconds value (>10^9).

The solution proposed is to allow TS clock to be higher than event clock; if set to exactly match the external reference frequency, it will produce a correct divider to obtain timestamps. A tolerance of 1% which I set is ambiguous; perhaps it better be equal to the tolerance of the event clock setting relative to external frequency, which I do not know.

mdavidsaver commented 8 years ago

From a quick look I think I agree with this change. I'll take a closer look tomorrow.

Ok, so not tomorrow, but the EVG change looks good. disable() should certainly be called after finishSync() as the later can re-enable. In fact I'm now wondering if finishSync() shouldn't call disable() as well to avoid the possibility of writing to a running sequencer.

mdavidsaver commented 8 years ago

which yields invalid nanoseconds value (>10^9).

Is this the source of the "bad" timestamps which nsls2 sees?

aderbenev commented 8 years ago

Yes. My "working" solution to get rid of the issue which I successfully used in one of production EVRs was to set its internal synthesizer frequency to exceed the RF/4 frequency, and it effectively prevented invalid timestamp nanos from appearing for that EVR. This issue can easily be seen by inspecting 1hzCnt-I timestamp.

Later I inspected the code to track down the origin of the divider and figured out that the reason my fix worked is simply because changing event clock also changes TS clock when TS source is "Internal". Since playing with device synthesizer frequency can have (and actually had) unforeseen consequences, it seems reasonable to not touch the hardware setting but instead adjust the divider itself to better match the "true" event frequency. As for now, it is impossible to set TS clock higher than the internal synthesizer clock although it looks like a legit action given that real reference exceeds the latter.

By removing the constraint, I see it as a good way to retain phase lock of external reference and internal synthesizer while giving software a good divider to convert ticks.

mdavidsaver commented 8 years ago

By removing the constraint, I see it as a good way to retain phase

I agree that it is reasonable.

epics-modules / mrfioc2

Disable sequencer before syncing #9