Closed: prjemian closed this issue 4 years ago
requirements.txt file:
apstools
bluesky
databroker>=1.0.0b
epics-base
hklpy
ipython
ophyd
pydm
pyepics
pygobject
pymongo
pyRestTable
python>=3
spec2nexus>=2017.1
stdlogpj
Install it:
export CHANNELS="-c defaults -c conda-forge -c nsls2forge -c aps-anl-tag -c pydm-tag"
conda create -n db-test -y $CHANNELS --file=requirements.txt
Then test it:
(base) mintadmin@mint-vm:/tmp/bs-install$ conda activate db-test
(db-test) mintadmin@mint-vm:/tmp/bs-install$ ipython
Python 3.7.5 (default, Oct 25 2019, 15:51:11)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.9.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import databroker
In [2]: exit
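A slightly fuller smoke test (a sketch; it assumes the db-test environment is still active) confirms the beta actually got installed:
import databroker
print(databroker.__version__)  # expect a 1.0.0 beta here, e.g. 1.0.0b3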
Note that Jupyter notebook is not included (nor is tornado). The tornado<5 pin is incompatible with the new requirements of databroker (dask needs a more recent tornado). Since the tornado pin is only needed for Jupyter notebooks, and the notebook is not needed to test the new databroker, that pin had to go.
@mrakitin confirmed on the Nikea Slack channel:
The pin tornado<5 conflicts with the chain intake -> dask -> tornado>=5.
@danielballan : first steps:
In [19]: db.v2
Out[19]: <Intake catalog: mongodb_config>
In [20]: db2 = db.v2
In [21]: db2[-1]
Out[21]:
Run Catalog
uid='78d6eb83-078c-4255-88d7-119c4f8b4931'
exit_status='success'
2019-11-21 21:59:04.777 -- 2019-11-21 21:59:09.223
Streams:
* primary
In [22]: h = _
In [23]: h.metadata
Out[23]:
{'start': {'uid': '78d6eb83-078c-4255-88d7-119c4f8b4931',
'time': 1574395144.7773948,
'scan_id': 436,
'beamline_id': 'Linux Mint VM7',
'proposal_id': 'testing',
'pid': 27381,
'login_id': 'mintadmin@mint-vm',
'versions': {'bluesky': '1.5.5',
'ophyd': '1.3.3',
'databroker': '1.0.0b3',
'apstools': '1.1.14',
'epics': '3.4.0',
'numpy': '1.17.3',
'matplotlib': '3.1.2',
'spec2nexus': '2021.1.7',
'pyRestTable': '2020.0.2'},
'plan_type': 'generator',
'plan_name': 'rel_scan',
'detectors': ['noisy'],
'motors': ['m1'],
'num_points': 21,
'num_intervals': 20,
'plan_args': {'detectors': ["EpicsSignalRO(read_pv='sky:userCalc1', name='noisy', value=90542.21638390009, timestamp=1574395135.167838, pv_kw={}, auto_monitor=False, string=False)"],
'num': 21,
'args': ["EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])",
-0.11974493237810635,
0.11974493237810635],
'per_step': 'None'},
'hints': {'dimensions': [[['m1'], 'primary']]},
'plan_pattern': 'inner_product',
'plan_pattern_module': 'bluesky.plan_patterns',
'plan_pattern_args': {'num': 21,
'args': ["EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])",
-0.11974493237810635,
0.11974493237810635]}},
'stop': {'run_start': '78d6eb83-078c-4255-88d7-119c4f8b4931',
'time': 1574395149.223173,
'uid': '5dd8500b-c9b6-4c34-be11-e8baf4bde465',
'exit_status': 'success',
'reason': '',
'num_events': {'primary': 21}},
'catalog_dir': None}
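The metadata is an ordinary dict, so scripted checks are easy; a quick sketch against the run above:
start = h.metadata["start"]
print(start["plan_name"], start["scan_id"])     # rel_scan 436
print(start["versions"]["databroker"])          # 1.0.0b3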
Suggestions for next steps?
Great. Next steps:
h.<TAB> # shows available stream names (among other things)
list(h) # lists the stream names
h.primary # access one such stream
# Read the data from all the Events and EventDescriptors in the stream
# into an object convenient for interactive scripting, vis, and analysis.
# This may be very expensive if you have large image data referenced in the Events,
# but at APS you probably do not have that yet.
# (Read on for a cheaper alternative...)
ds = h.primary.read() # xarray.Dataset backed by normal numpy arrays
ds['SOME_COLUMN_SHOWN_IN_THE_OUTPUT_FROM_ABOVE']
ds['SOME_COLUMN_SHOWN_IN_THE_OUTPUT_FROM_ABOVE'].plot()
ds['SOME_COLUMN_SHOWN_IN_THE_OUTPUT_FROM_ABOVE'].sum()
ds['SOME_COLUMN_SHOWN_IN_THE_OUTPUT_FROM_ABOVE'].data # direct access to the underlying numpy array
# This is always pretty fast, because most of the I/O is deferred.
ds = h.primary.dask() # xarray.Dataset backed by lazy dask arrays
ds['SOME_COLUMN_SHOWN_IN_THE_OUTPUT_FROM_ABOVE']
# All the same operations above work. The difference between dask and numpy is mostly transparent.
Ok, that's progress:
db2 = db.v2
h = db2[-1]
ds = h.primary.read()
ds.keys()
ds["noisy"].plot()
as shown in the attached plot.
A problem comes up with the next command, involving dask:
In [9]: ds = h.primary.dask()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-9-a369c71f95a4> in <module>
----> 1 ds = h.primary.dask()
~/Apps/anaconda/envs/db-test/lib/python3.7/site-packages/intake/catalog/entry.py in __getattr__(self, attr)
119 return self.__dict__[attr]
120 else:
--> 121 return getattr(self._get_default_source(), attr)
122
123 def __dir__(self):
AttributeError: 'BlueskyEventStream' object has no attribute 'dask'
In [10]: ! conda list dask
# packages in environment at /home/mintadmin/Apps/anaconda/envs/db-test:
#
# Name Version Build Channel
dask 2.8.0 py_1
dask-core 2.8.0 py_0
Note this information about the software versions in place when the data was collected:
In [21]: h.metadata["start"]["versions"]
Out[21]:
{'bluesky': '1.5.5',
'ophyd': '1.3.3',
'databroker': '1.0.0b3',
'apstools': '1.1.14',
'epics': '3.4.0',
'numpy': '1.17.3',
'matplotlib': '3.1.2',
'spec2nexus': '2021.1.7',
'pyRestTable': '2020.0.2'}
@danielballan : You suggested h.primary.dask(), yet:
In [22]: hasattr(h.primary, "dask")
Out[22]: False
What have I done wrong here?
Answered my own question (typo: dask() --> to_dask()):
In [31]: ds = h.primary.to_dask()
In [32]: ds['noisy']
Out[32]:
<xarray.DataArray 'noisy' (time: 21)>
dask.array<stack, shape=(21,), dtype=float64, chunksize=(1,), chunktype=numpy.ndarray>
Coordinates:
* time (time) float64 1.574e+09 1.574e+09 ... 1.574e+09 1.574e+09
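Since the arrays are dask-backed, the I/O stays deferred until values are actually requested; a quick sketch:
ds = h.primary.to_dask()
arr = ds["noisy"]             # still lazy: a dask-backed xarray.DataArray
print(arr.sum().compute())    # .compute() triggers the actual reads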
These produce the same result:
ds["noisy"].plot()
ds['noisy'].plot()
ds.noisy.plot()
Now, a proper plot of y vs. x:
Overplotting the previous two scans makes it clear this was a series of scans centering on a peak at a previously unknown position.
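For the record, a hypothetical sketch of scripting that overplot (it assumes negative integer offsets other than -1 resolve the same way -1 does, and that the last three runs are this scan series):
import matplotlib.pyplot as plt

for i in (-3, -2, -1):        # the three scans of the series
    run = db2[i]
    ds = run.primary.read()
    plt.plot(ds["m1"], ds["noisy"],
             label=f"scan_id={run.metadata['start']['scan_id']}")
plt.legend()
plt.show()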
This was unexpected:
In [51]: db2[-10:]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-51-bab7771239f3> in <module>
----> 1 db2[-10:]
~/Apps/anaconda/envs/db-test/lib/python3.7/site-packages/intake/catalog/base.py in __getitem__(self, key)
372 cat['name1', 'name2']
373 """
--> 374 if not isinstance(key, list) and key in self._get_entries():
375 # triggers reload_on_change
376 e = self._entries[key]
~/Apps/anaconda/envs/db-test/lib/python3.7/site-packages/databroker/_drivers/mongo_normalized.py in __contains__(self, key)
125 # Avoid iterating through all entries.
126 try:
--> 127 self[key]
128 except KeyError:
129 return False
~/Apps/anaconda/envs/db-test/lib/python3.7/site-packages/databroker/_drivers/mongo_normalized.py in __getitem__(self, name)
67 collection = self.catalog._run_start_collection
68 try:
---> 69 N = int(name)
70 except ValueError:
71 query = {'$and': [self.catalog._query, {'uid': name}]}
TypeError: int() argument must be a string, a bytes-like object or a number, not 'slice'
Working around that problem (slicing can't be used to index db.v2, yet), here are the results of finding three different peaks at unknown positions:
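The workaround itself is simple enough (a sketch; again assuming negative integer offsets resolve the way -1 does):
runs = [db2[i] for i in range(-10, 0)]   # the ten most recent runs, fetched one at a time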
Here's the basic procedure which generated these scans:
# adjust the peak parameters for "noisy"
# "noisy" is a simulated noisy Lorentzian function based on m1.user_readback
# search for peak
RE(bp.scan([noisy], m1, -2, 2, 21))
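# (mov here is the bluesky IPython magic %mov; the bare name works via automagic)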
mov m1 bec.peaks["cen"]["noisy"]
fwhm = bec.peaks["fwhm"]["noisy"]
# rescan near center
RE(bp.rel_scan([noisy], m1, -fwhm, fwhm, 21))
mov m1 bec.peaks["cen"]["noisy"]
fwhm = bec.peaks["fwhm"]["noisy"]
# rescan again near center (with more-realistic FWHM range)
RE(bp.rel_scan([noisy], m1, -fwhm, fwhm, 21))
mov m1 bec.peaks["cen"]["noisy"]
Working on a lineup() plan that encapsulates this procedure for APS 8-ID-I.
The lineup() plan in development uses two passes, not three. It also stores (or should store) the PeakStats results.
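For illustration, a minimal two-pass sketch of such a plan (hypothetical, not the actual 8-ID-I implementation; it assumes the BestEffortCallback instance bec updates its peaks after each run):
from bluesky import plan_stubs as bps
from bluesky import plans as bp

def lineup(detector, motor, width=2, npts=21, *, bec):
    # pass 1: coarse scan across the search range, then move to the found center
    yield from bp.rel_scan([detector], motor, -width, width, npts)
    yield from bps.mv(motor, bec.peaks["cen"][detector.name])
    fwhm = bec.peaks["fwhm"][detector.name]
    # pass 2: rescan about the center over a FWHM-sized range and recenter
    yield from bp.rel_scan([detector], motor, -fwhm, fwhm, npts)
    yield from bps.mv(motor, bec.peaks["cen"][detector.name])

# usage: RE(lineup(noisy, m1, bec=bec))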
Follow-up (offline) from @tacaswell: We can use the new databroker with the notebook as long as we do not add bluesky. The collision (and the reason for the tornado<5 pin) is in how bluesky and tornado work with asyncio. For analysis, there is no need to load bluesky (or ophyd). To manage the installations, use separate conda environments for data acquisition and for data analysis.
There is a new beta release of databroker to be tested.