mbbOut on_update no longer triggers properly

AlexanderWells-diamond commented 2 years ago

As of version 4.0.0, the on_update method of an mbbOut record no longer triggers as expected - it fires once, often several seconds after a caput, and then never fires again for subsequent updates. This is true even with always_update=True. Code to recreate:

from softioc import softioc, builder, asyncio_dispatcher

dispatcher = asyncio_dispatcher.AsyncioDispatcher()

builder.SetDeviceName("ABC")

def update(new_val: int):
    print("New MBO value:", new_val)

mbo = builder.mbbOut("MBO", "A", "B", "C", "D", initial_value=0, on_update=update, always_update=True)

builder.LoadDatabase()
softioc.iocInit(dispatcher)
softioc.interactive_ioc(globals())

Then run these commands on a separate terminal:

caput("ABC:MBO", 1)
caput("ABC:MBO", 2)
caput("ABC:MBO", 3)

In my testing, using both the cothread.catools and 4p.client.cothread.Context equivalent methods, shows just a single print statement for the first caput, and none for the remaining ones. Notably, caget does return the expected value at each point, its just that the on_update method only fires once.

Note I haven't tested this for all Out record types, only for mbb records.

This was discovered while trying to update this PR to use pythonSoftIOC v4.0.0, where this issue causes a test failure: https://github.com/PandABlocks/PandABlocks-client/pull/28

thomascobb commented 2 years ago

This doesn't look right:

$ caget ABC:MBO
ABC:MBO                        [\'A\', \'B\', \'C\', \'D\']

Investigating...

thomascobb commented 2 years ago

I can't reproduce, that code seems to work for me, although as soon as I put it in a file in my project, VSCode ran an auto-test discovery and ran the IOC up when it imported. This meant that I would have 2 IOCs running. Might that have happened to you?

thomascobb commented 2 years ago

Also, it should be

mbo = builder.mbbOut("MBO", "A", "B", "C", "D",...)

to avoid stuffing an odd string into ZRST. Although that seems unrelated...

AlexanderWells-diamond commented 2 years ago

I can't say I understand why a test discovery would start the IOC. That would imply the scanner is actively running code? Either way, I don't see the same thing happening in my VSCode. I replace the "example_asyncio.ioc.py" file, which isn't scanned for tests.

I've edited the original post with the correct syntax for listing labels. It makes no difference to my tests.

AlexanderWells-diamond commented 2 years ago

So a couple more hours investigating has shown a few things:

This issue does NOT appear on my Diamond workstation, everything works as expected there
This DOES appear on Github CI runs, for example here: https://github.com/PandABlocks/PandABlocks-client/runs/4950832178?check_suite_focus=true
This issue DOES appear when I run that same failing test on my Diamond workstations AND at home

So there's some issue with this recreate that only seems to appear on my machine, but I have a test that seems to always exhibit this issue when run anywhere. Its pretty easy to verify - adding a print statement to the method and the message is only ever printed once.

Araneidae commented 2 years ago

I can reproduce the issue on my workstation, and it looks fairly fundamental. Will dig deeper.

Please delete release 4.0.0 from github and PyPi!

thomascobb commented 2 years ago

Github release and PyPI release have been deleted. I'll leave @AlexanderWells-diamond to delete the branch and tag

Araneidae commented 2 years ago

I've deleted the tag, leave the branch alone, I'll push my fix to it, and we can do a simple merge when done.

Araneidae commented 2 years ago

Sorry, have been a bit heavy handed: have done a force push setting master to the state of my sort_setting_values branch.

AlexanderWells-diamond commented 2 years ago

Unfortunately that push has deleted some commits. I'd done some changes on master since that merge. I've got them in a local checkout but don't have time to try and merge them back in right now.

I peeked at the CI but don't know why it's completely broken. I am of course suspicious that other commits have somehow been lost somewhere.

Araneidae commented 2 years ago

Well, I've looked at the diff between my new master and gitlab/master which you pushed a couple of days ago, and the only difference is some formatting changes (and reference to 4.0.0) in CHANGELOG.rst.

AlexanderWells-diamond commented 2 years ago

I've salvaged the missing commits and re-committed them to master. As you said, there were no code changes made.

Regarding tests, they run and pass fine on several machines when I run them myself, but are apparently still hanging on most Ubuntu test runners, and failing for MacOS. On MacOS it looks like there's some issue terminating the IOCs, which results in cascade failures as I re-use the same IOC names across tests and so the wrong IOCs are being queried. I haven't currently got any idea what may cause apparently infinite hangs in the Ubuntu tests.

Araneidae commented 2 years ago

The underlying problem here seems to be that when I call self._read_value(record) to get the underlying ctypes value ... I get what amounts to a reference to the underlying value ... which means that when I store it away in self._value I'm not storing a static value, I'm storing a reference to the value in underlying EPICS record. Then when the record is processed with a new value and I compare self._value with self._read_value(record) I'm actually comparing the same values!

I think this is what's going on. The simplest fix will be to simply change _read_value() to return a copy of the value. This already happens for waveforms and strings.

thomascobb commented 2 years ago

Is that this code: https://github.com/dls-controls/pythonSoftIOC/blob/91fc8e33b9b4d4bbe6a8a557d68fedc6b61b17fb/softioc/device.py#L166-L167

If so, doesn't it that comparison not fire if always_update is True?

Araneidae commented 2 years ago

Yes ... and I'm not able to reproduce the problem with always_update=True!

Araneidae commented 2 years ago

I hope this is fixed in commit 8125cbda453e66a05f2ff1e7327e88ea5c8f788b, but I've not been able to reproduce the problem with always_update set, so I won't mark this as closed.

AlexanderWells-diamond commented 2 years ago

I can no longer recreate the issue, with or without always_update=True. Tested on my home and Diamond machines.

I do have what I think is a different problem on my home machine though: caput's are incredibly slow - 10 seconds for the first CAput to be processed, 60 for the second, and the third apparently never arrives. I can only assume I have some messed up network connections on this machine, but am unsure how to test it further - the issue happens below the level of the Python code so I can't trace anything.

I would like to see at least one test written for this issue before we close it. There's currently no tests that use on_update (or always_update, which I'm less concerned about but there really should be something...)

AlexanderWells-diamond commented 2 years ago

Tests written in 0b0af28305f2668567aff69b0156a8d61cb35bd8 (and fixed in 0b0af28305f2668567aff69b0156a8d61cb35bd8).

dls-controls / pythonSoftIOC

mbbOut on_update no longer triggers properly #78