epics-modules / asyn

EPICS module for driver and device support
http://epics-modules.github.io/asyn/
Other
37 stars 75 forks source link

Destructible ports #171

Open exzombie opened 1 year ago

exzombie commented 1 year ago

This PR extends asynManager to allow a port to be destructible, solving #172. It complements epics-base/epics-base#283.

Goals

Port shutdown

There is a new function, asynManager::shutdown(), which

There is a new iocshell command that shuts down a port, useful mostly for testing.

This functionality is opt-in. A conforming driver declares that it is destructible by passing a new attribute, ASYN_DESTRUCTIBLE. In such a case, asynManager will register an epicsAtExit hook to call shutdown() on IOC exit. A conforming driver has to handle asynExceptionShutdown to perform cleanup, and may invalidate itself completely by calling delete.

asynPortDriver and inheritance

asynPortDriver uses the new functionality by

Because this functionality is opt-in, there's a question of backwards compatibility in an inheritance tree, such as those in areaDetector. This can be solved by prescribing that a driver may only receive the ASYN_DESTRUCTIBLE flag and act upon it, but may not force it. This flag should come in via the iocshell command that instantiates a driver. This way, the driver will be destructible only if the most derived class is instantiated with this flag.

Limitations

TODO

AppVeyorBot commented 1 year ago

:x: Build asyn 1.0.161 failed (commit https://github.com/epics-modules/asyn/commit/a796d33d75 by @exzombie)

exzombie commented 1 year ago

I've simplified things a bit since yesterday, and did a smoke test with areaDetector. I noticed that its plugins already have working destructors, so I just forced the NDPosPlugin to declare itself as destructible. valgrind reported no new issues. The base class destructor in asynPortDriver ensures that the port is marked as defunct, so even if a driver is deleted directly, everything should be fine.

@MarkRivers, where do I update the documentation? I noticed that you just added a whole new subdirectory.

AppVeyorBot commented 1 year ago

:x: Build asyn 1.0.176 failed (commit https://github.com/epics-modules/asyn/commit/f56e71eb33 by @exzombie)

AppVeyorBot commented 1 year ago

:white_check_mark: Build asyn 1.0.177 completed (commit https://github.com/epics-modules/asyn/commit/35add117e6 by @exzombie)

MarkRivers commented 1 year ago

where do I update the documentation? I noticed that you just added a whole new subdirectory.

The documentation is now in docs/source/.rst. I have deleted all of the old documentation/.html files that have been converted to .rst. There are still a few small files left to do.

There is a Github Actions that publishes the docs to https://epics-modules.github.io/asyn

AppVeyorBot commented 1 year ago

:white_check_mark: Build asyn 1.0.196 completed (commit https://github.com/epics-modules/asyn/commit/921b11a85d by @exzombie)

exzombie commented 1 year ago

I've rebased, wrote the documentation, and extended the example driver. While writing the docs, I noticed that asynManager::lockPort doesn't go through the request queue as I had thought. Now, lockPort also returns an error.

I also reintroduced the asynPortDriver::shutdown() function. I had removed it because it seemed to me that things would be simpler without it, and I didn't see a good use case. However, I have taken a close look at some of the scars I got in the past when dealing with a design that relied on calling overridden virtual functions from a base class destructor. Not a good idea.

AppVeyorBot commented 1 year ago

:white_check_mark: Build asyn 1.0.214 completed (commit https://github.com/epics-modules/asyn/commit/33edcb38c1 by @exzombie)

AppVeyorBot commented 1 year ago

:white_check_mark: Build asyn 1.0.215 completed (commit https://github.com/epics-modules/asyn/commit/b66b9a3621 by @exzombie)

exzombie commented 1 year ago

Rather than having every driver create an exiting_ flag and set it in the destructor, would it be nicer to provide an asynManager method pasynManager->isExiting(), and/or an asynPortDriver method isExiting()?

Ah, you're referring to the changes I made to the example driver. Well, having an exit flag and having threads act on it is a standard pattern, so it might indeed make sense to provide this flag. Maybe. As I see it, there's two aspects to consider: how such a mechanism would affect the shutdown, and how to provide it in the first place.

Let's suppose there exists an isExiting() function, and that all subclasses use it. Meaning, the threads spawned by each subclass consult this function to decide when to return. And when I say "subclasses", I mean a linear inheritance chain starting with asynPortDriver. It is possible for each subclass to create as many background threads as it wants.

First, setting an "exit" flag may not be enough on its own. Often, the destructor (or shutdown()) has to wake sleeping threads. This is the case for, say, the example driver. And then, the destructor needs to wait for the thread to finish, preferably by joining it. In this context, having isExiting() provided by asynPortDriver base class instead of the subclass is only a minor improvement, I think.

Second, the flag used by isExiting() is set once, and seen by all the threads of all subclasses in parallel. I'm not sure whether this is fine or not. I can imagine a subclass wanting to do a thing or two in shutdown() while its base class is operating normally. This is not possible if isExiting() stops all threads at once.

Which brings us to the question: who sets the exiting flag? I had written a long-ish explanation of how things could work and why most of it is not a good idea, but I scrapped the text because, well, most if it is not a good idea. So let me just say that the only way I see this working well is if shutdown is only allowed to be triggered by asynManager, and a driver may never be deleted directly. Which would mean that all the tests need to be fixed. Luckily, it could be enforced, softly, by printing an error message in asynPortDriver destructor if it executed while the port was still alive.

Is it worth having all threads halt at the same time, and not being able to directly delete drivers, just to avoid having one boolean per subclass?

MarkRivers commented 1 year ago

For this PR I think it would be good to create an Issue that contains the following:

exzombie commented 1 year ago

Sure, I can add some background. But why open an issue? This stuff belongs to the cover letter of the PR, no? Especially the goals and limitations.

prjemian commented 1 year ago

General advice on issues vs. PRs:

The Why? question is the most expensive question that we encounter. An issue provides a direct explanation why a specific software change was initiated.

prjemian commented 1 year ago

Issues describe why, PRs describe how.

exzombie commented 1 year ago

In other words, the "problems" go into the issue, and the "goals" and "limitations" go into the description above. Clear enough. The reason I asked is that, personally, I have no problem if the problem itself is described in the PR, with no corresponding issue ticket. And I was just wondering if you want a separate issue for some functional reasons, or if it was just aesthetics. In any case, I'll open the issue with the "problem", no problem :)

MarkRivers commented 1 year ago

And I was just wondering if you want a separate issue for some functional reasons, or if it was just aesthetics.

Aesthetics, along the lines of what @prjemian said.

exzombie commented 1 year ago

@ericonr Thank you very much for the review!

AppVeyorBot commented 1 year ago

:white_check_mark: Build asyn 1.0.222 completed (commit https://github.com/epics-modules/asyn/commit/39e3cc2b48 by @exzombie)

MarkRivers commented 1 year ago

@anjohnson, @mdavidsaver, and @ralphlange could you please look at this? I would like to merge into the next asyn release if you are good with it. Thanks.

ralphlange commented 1 year ago

Looks good and appropriate to me.

mdavidsaver commented 1 year ago

What is the story for concurrent access to the new defunct and shutdownNeeded flags?

I have almost convinced myself that all changes are made under the port lock, but I think some reads are not guarded.

eg. In lockPort(), the defunct flag is tested before any locking.

Maybe it should be tested after locking synchronousLock?

eg. The added doc. comment for asynPortDriver shows a sub-class destructor calling needsShutdown() which does no locking. Though it isn't clear to me if this test is actually needed vs. unconditionally calling asynPortDriver::shutdown().

exzombie commented 1 year ago

eg. In lockPort(), the defunct flag is tested before any locking. Maybe it should be tested after locking synchronousLock?

You're right, will fix.

eg. The added doc. comment for asynPortDriver shows a sub-class destructor calling needsShutdown() which does no locking. Though it isn't clear to me if this test is actually needed vs. unconditionally calling asynPortDriver::shutdown().

Excellent point. The answer is "yes, it's needed". However, the fact that it wasn't clear to you, and that I myself got a bit confused when trying to explain how shutdown() can be called from different places, means that subclass authors will get this wrong. Thanks for pointing this out. At the very least, I need to update the documentation.

I was hesitant to allow asynPortDriver to be deleted manually (like, in tests) exactly because it causes complications. I will think about this aspect again. However, having three mutexes involved gives me a headache, so it might take a while.

AppVeyorBot commented 1 year ago

:x: Build asyn 1.0.237 failed (commit https://github.com/epics-modules/asyn/commit/3d818ba8f2 by @exzombie)

ericonr commented 1 year ago

Some commit messages have typos:

Prettify the definition if iocsh usage messages Change comment style in asynManger.c for consistency

I will try and test this soon, though I don't have as much of a need for port destruction anymore.

mdavidsaver commented 1 year ago

... However, having three mutexes involved gives me a headache, so it might take a while.

Not a request, only a thought. I've taken to using comments within struct/class definitions to document member variable locking/access rules. eg. // the following guarded by 'mymutex' or // const after constructor or // only access from worker thread. This helps me to remember my intent. And encourages me to have an intent! Whether this intent gets correctly expressed in code is another question.

eg. https://github.com/mdavidsaver/pvxs/blob/8ea613cb078579076244c654a1677eff6e143c30/src/clientmon.cpp#L64-L74

AppVeyorBot commented 1 year ago

:white_check_mark: Build asyn 1.0.237 completed (commit https://github.com/epics-modules/asyn/commit/b106f3e813 by @exzombie)

exzombie commented 1 year ago

I've thought about how to make things more straightforward for users, and decided that it is simpler to just forbid deleting a destructible driver without calling shutdown() first. If the user does that, they will get an error message; the port will still be marked as defunct so most problems are avoided. I think that whoever marks a port as destructible is perfectly capable of calling shutdown() in their test code, while on the other hand, ensuring all the classes in a hierarchy call shutdown() properly when destroyed is less straightforward.

I also added a note explaining that the shutdown() function is called with the driver unlocked. This function has to do it's own locking and unlocking because that's the only way to ensure that threads can make progress and eventually terminate. The shutdownNeeded flag is now atomic, so that it doesn't matter whether the user remembers to lock the driver before calling needsShutdown().

The "template" given in the doxygen comment contained a hint that the ASYN_DESTRUCTIBLE flag should be forwarded in the constructors. I removed that, it's a transitional thing. The intent was to provide a guideline for gradually porting a class hierarchy such as the one in areaDetector, but I think it's better if we keep the asyn docs clean and find agreement on how to do that separately.

This stuff is in 366f154ee74f94bfbdc99fd84d9c2a52db45d8be.

AppVeyorBot commented 1 year ago

:x: Build asyn 1.0.239 failed (commit https://github.com/epics-modules/asyn/commit/925271f8ee by @exzombie)

ericonr commented 1 year ago

I've thought about how to make things more straightforward for users, and decided that it is simpler to just forbid deleting a destructible driver without calling shutdown() first. If the user does that, they will get an error message; the port will still be marked as defunct so most problems are avoided. I think that whoever marks a port as destructible is perfectly capable of calling shutdown() in their test code, while on the other hand, ensuring all the classes in a hierarchy call shutdown() properly when destroyed is less straightforward.

I don't know if I agree with this, because it makes it harder to define a constructor for a class which inherits from asynPortDriver and that can have an exception in the middle of it and simply allow that exception to propagate so the port is automatically destroyed during construction (basically, I'd like to have fallible constructors).

That said, the current code doesn't seem to work for this purpose yet. I've tested a few scenarios in my constructor, which calls asynPortDriver::asynPortDriver() with ASYN_DESTRUCTIBLE.

The issue seems to be that the mechanism was implemented considering first destroying ports after iocInit, I'd assume?

exzombie commented 1 year ago

Ah, you are quite correct, I had not considered this use case, my priority is IOC shutdown. That said, I can definitely see the appeal of delegating handling a dysfunctional driver to asyn, instead of having to carefully return errors from a partially-initialized driver. Thanks for pointing it out; it's in general inevitable that someone finds a use case that wasn't considered, and I'm glad you did it before the change was merged.

A user of the port needs to take several steps in order to obtain a useful handle, and I'll need to think about when it's best to return an error. For example, it may well be that connectDevice() is too early as it would neuter a number of iocsh commands. I'll take a closer look at this stuff.

As for separating shutdown() from the destructor: the alternative is to require that all derived classes call shutdown() in their destructors. The reason is that it has to be called by the most derived class. I see no way to detect missing calls. But I also don't see how this affects your case of incomplete construction. Remember that, when you throw an exception from the constructor, the destructor of the class under construction will not be called, only the destructors of the base classes. So you need to manually ensure that you have cleaned up before letting the exception propagate. Specifically, you need to decide whether the class has been constructed far enough so that it's safe to call its own shutdown(), or if it's better to call the shutdown() of the base class. Having shutdown() called from destructors wouldn't change that.

exzombie commented 1 year ago

@ericonr please test the latest code to see if it satisfies you use case. When you throw an exception from the constructor, what you're doing is pretty much equivalent to calling delete *this, so you need to call shutdown() of the base class before you throw. Or shutdown() of the class under construction, if appropriate.

The initalization use case is taken care of by denying access to findInterface() and interrupt related functions on a destroyed port. In addition, the interfaces of asynPortDriver are left around after the port is destroyed, and the read and write functions of asynPortDriver now check for a null driver pointer. This is intended to catch most of the users of the interfaces even if they obtained them before the port was shut down.

Of course, in C, you can never guarantee that someone didn't store a reference they shouldn't have and is now dangling, but I don't see how we can do better than this.

While going down this rabbit hole, I added locks for asynManagerLock in places where it was absent before (like asynManager::enable()). This mutex is supposed to protect the enabled flag (and, by extension, the defunct flag`), but it was not always locked. @MarkRivers @mdavidsaver I think this deserves another pair of eyes. IIUC, at most one of manager lock, synchronous lock, and the driver lock should taken at any given moment. This would ensure that there are no deadlocks, but I find it hard to convince myself that this is the case. I did manage to convince myself in the end, but, given that it's really hard to test this kind of thing, I think it deserves another look.

AppVeyorBot commented 1 year ago

:x: Build asyn 1.0.244 failed (commit https://github.com/epics-modules/asyn/commit/cf8e5af428 by @exzombie)

ericonr commented 5 months ago

@exzombie since you mentioned locking... I found a deadlock when throwing an exception from the constructor, which I explain and fix here: https://github.com/lnls-dig/asyn/commit/772c2993a8cbe4491f069cfe80d1ed1be34adf53 (feel free to import that into the PR)

Once that was applied, throwing from the constructor with or without calling shutdown() seemed to work fine :) (I don't overload shutdown(); either way, forcing users to call shutdown() seems reasonable, since checking dynamically if the derived class overloads a function isn't really doable)

EDIT: the issue might be further down, in epics-base threading code... epicsThread::exitWait() (and ~epicsThread(), which calls it) are supposed to be callable in the context of the running thread (though I'm not sure how exactly that works...). If the thread hasn't been joined yet, epicsThreadMustJoin(this->id); is called, which is the deadlock point. I don't know if we are missing something setting epicsThread::joined, or some other detail. I thought the fact I was using musl wasn't relevant, but it probably is...

EDIT+: I took a look at osdThread.c for POSIX, and indeed, I think I should try and fix stuff there rather than in asyn. calling exitWait() (directly or via ~epicsThread()) in the context of the running thread should work, because it will detach the thread.

AppVeyorBot commented 3 months ago

:x: Build asyn 1.0.267 failed (commit https://github.com/epics-modules/asyn/commit/32ca62eacb by @exzombie)

exzombie commented 3 months ago

I encountered drivers that access the asynStdInterfaces member variable directly instead of through getAsynStdInterfaces(), which means that changing it to a pointer breaks things. To maintain API compatibility, I added a reference member variable of the same name; the pointer needed.

I also noticed that a lot of drivers have a shutdown() function. No surprise there, but there's more of them than I expected. I thus renamed the new function in the base class to shutdownPortDriver() to avoid breaking existing drivers.

AppVeyorBot commented 3 months ago

:x: Build asyn 1.0.268 failed (commit https://github.com/epics-modules/asyn/commit/4098fe0f52 by @exzombie)

AppVeyorBot commented 3 months ago

:white_check_mark: Build asyn 1.0.269 completed (commit https://github.com/epics-modules/asyn/commit/7015fe467b by @exzombie)