bluesky / ophyd

hardware abstraction in Python with an emphasis on EPICS
https://blueskyproject.io/ophyd
BSD 3-Clause "New" or "Revised" License
50 stars 79 forks source link

Do not start threads on import. #876

Open danielballan opened 4 years ago

danielballan commented 4 years ago

This is less than ideal:

>>> threading.active_count()
1
>>> import ophyd.sim
>>> threading.active_count()
11

I guess this is due to the various simulated signals that update on a background thread. It is worth considering an API breaking change to move these out of global scope (only create them when hw() is called) and/or add a special method to explicitly start their background thread.


Edit: As @dmgav points out below, only two of these threads are actually due to ophyd.sim. It seems the rest are related to starting the control layer, with its various background threads for handling callbacks and so on. So the leakage isn't as bad as I originally feared, but it's still worth fixing.

dmgav commented 4 years ago

At the first glance I have 17 new threads started once I run import ophyd.sim: 15 of those threads are started by https://github.com/bluesky/ophyd/blob/b572a4caa2d0be29cf74f60218e684c62b0dac04/ophyd/__init__.py#L72 and the other 2 by https://github.com/bluesky/ophyd/blob/b572a4caa2d0be29cf74f60218e684c62b0dac04/ophyd/sim.py#L1434

danielballan commented 4 years ago

The two threads from ophyd.sim itself generate INFO-level log messages periodically, such that logging.basicConfig() gets very noisy. (A downstream user reported this. We tend not to use basicConfig() because it's a blunt instrument, but nonetheless I don't think it would be reasonable to say, "Just don't use logging.basicConfig().") The original motivation for this issue was to resolve that. But now I'm also curious why we see different numbers of threads.

What output do you get for this, @dmgav?

>>> import threading
>>> [t.name for t in threading.enumerate()]
['MainThread']
>>> import ophyd
^[[A[t.name for t in threading.enumerate()]
['MainThread', 'metadata', 'monitor', 'get_put', 'util0', 'util1', 'util2', 'util3', 'debug_monitor']
>>> import ophyd.sim
>>> [t.name for t in threading.enumerate()]
['MainThread', 'metadata', 'monitor', 'get_put', 'util0', 'util1', 'util2', 'util3', 'debug_monitor', 'Thread-9', 'Thread-10']
dmgav commented 4 years ago
In [1]: import threading

In [2]: [t.name for t in threading.enumerate()]
Out[2]: ['MainThread', 'IPythonHistorySavingThread']

In [3]: import ophyd

In [4]: [t.name for t in threading.enumerate()]
Out[4]: 
['MainThread',
 'IPythonHistorySavingThread',
 'selector',
 'command',
 'check_for_unresponsive_servers',
 'retry',
 'search',
 'activate_subscriptions',
 'selector',
 'metadata',
 'monitor',
 'get_put',
 'util0',
 'util1',
 'util2',
 'util3',
 'debug_monitor']

In [5]: import ophyd.sim

In [6]: [t.name for t in threading.enumerate()]
Out[6]: 
['MainThread',
 'IPythonHistorySavingThread',
 'selector',
 'command',
 'check_for_unresponsive_servers',
 'retry',
 'search',
 'activate_subscriptions',
 'selector',
 'metadata',
 'monitor',
 'get_put',
 'util0',
 'util1',
 'util2',
 'util3',
 'debug_monitor',
 'Thread-246',
 'Thread-247']
danielballan commented 4 years ago

Now I understand. In my environment, I happen to have pyepics installed, so ophyd creates a pyepics control layer. You do not have pyepics but you do have caproto, so ophyd falls back to that. A caproto context has more Python threads than a pyepics one does. (Pyepics has C threads, via libca, to manage these things.)

It’s unfortunate that import ophyd unavaoidably starts a bunch of threads. Data Broker imports ophyd to get access to the handler that it ships, and as a side effect starts 10 or 17 threads. Can we defer that somehow without making things too complicated?

Deferring the ones in ophyd.sim seems to require a small breaking change, as mentioned above: either omitting the two relevant instances from the global namespace, or requiring a new method to explicitly start their thread.

Deferring the ones in the control layer sounds harder, and might better be considered in a separate PR. (It’s possible that we should not change that.)

dmgav commented 4 years ago

The objects that are causing trouble are rand and rand2 (class SynPeriodicSignal). One of the ways to deal with this is to add a new 'start' method in SynPeriodicSignal that will start the thread (and start the value updates). But it will require updating all the code that is using rand, rand2 and SynPeriodicSignal.

The other way is to override methods of Signal class in SynPeriodicSignal (trigger, get, put, set, read, value properties to first check if the thread exists and create the thread if needed).

danielballan commented 4 years ago

Good thought! Could we do both? That is, provide a public method but automatically call that public method when one of those methods(get, put, ...) is called? I think we'll have to include subscribe as well.

It might be good to give the method a name that makes it clear to the reader that this is only necessary because it is a simulated detector, like start_simulation().

danielballan commented 4 years ago

Two of the threads were addressed by #877. The others are related to starting a context in the control layer at import time. Can we instead start these threads the first time the control layer is actually used?