BCDA-APS / use_bluesky

Tools to help APS use the Bluesky Framework (https://blueskyproject.io/)

explore new databroker version #42

Closed: prjemian closed this issue 4 years ago

prjemian commented 4 years ago

There is a new beta release of databroker to be tested.

prjemian commented 4 years ago

requirements.txt file:

apstools
bluesky
databroker>=1.0.0b
epics-base
hklpy
ipython
ophyd
pydm
pyepics
pygobject
pymongo
pyRestTable
python>=3
spec2nexus>=2017.1
stdlogpj

prjemian commented 4 years ago

install it:

export CHANNELS="-c defaults -c conda-forge -c nsls2forge -c aps-anl-tag -c pydm-tag"
conda create -n db-test  -y $CHANNELS --file=requirements.txt

then test it:

(base) mintadmin@mint-vm:/tmp/bs-install$ conda activate db-test
(db-test) mintadmin@mint-vm:/tmp/bs-install$ ipython
Python 3.7.5 (default, Oct 25 2019, 15:51:11) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.9.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import databroker                                                                                                                   

In [2]: exit                                                                                                                                

prjemian commented 4 years ago

Note that jupyter notebook is not included (nor is tornado). The tornado<5 pin is incompatible with the new requirements of databroker (dask needs a more recent tornado). Since the tornado pin is needed only for jupyter notebooks, and the notebook is not needed to test the new databroker, it had to go.

prjemian commented 4 years ago

@mrakitin confirmed on the Nikea Slack channel:

The tornado <5 conflicts with intake->dask->tornado>=5

prjemian commented 4 years ago

@danielballan: first steps:

  1. using existing setup on development system
  2. new db-test environment
  3. ran a few scans
  4. simple db exploration

In [19]: db.v2                                                                                                                              
Out[19]: <Intake catalog: mongodb_config>

In [20]: db2 = db.v2                                                                                                                        

In [21]: db2[-1]                                                                                                                            
Out[21]: 
Run Catalog
  uid='78d6eb83-078c-4255-88d7-119c4f8b4931'
  exit_status='success'
  2019-11-21 21:59:04.777 -- 2019-11-21 21:59:09.223
  Streams:
    * primary

In [22]: h = _                                                                                                                              

In [23]: h.metadata                                                                                                                         
Out[23]: 
{'start': {'uid': '78d6eb83-078c-4255-88d7-119c4f8b4931',
  'time': 1574395144.7773948,
  'scan_id': 436,
  'beamline_id': 'Linux Mint VM7',
  'proposal_id': 'testing',
  'pid': 27381,
  'login_id': 'mintadmin@mint-vm',
  'versions': {'bluesky': '1.5.5',
   'ophyd': '1.3.3',
   'databroker': '1.0.0b3',
   'apstools': '1.1.14',
   'epics': '3.4.0',
   'numpy': '1.17.3',
   'matplotlib': '3.1.2',
   'spec2nexus': '2021.1.7',
   'pyRestTable': '2020.0.2'},
  'plan_type': 'generator',
  'plan_name': 'rel_scan',
  'detectors': ['noisy'],
  'motors': ['m1'],
  'num_points': 21,
  'num_intervals': 20,
  'plan_args': {'detectors': ["EpicsSignalRO(read_pv='sky:userCalc1', name='noisy', value=90542.21638390009, timestamp=1574395135.167838, pv_kw={}, auto_monitor=False, string=False)"],
   'num': 21,
   'args': ["EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])",
    -0.11974493237810635,
    0.11974493237810635],
   'per_step': 'None'},
  'hints': {'dimensions': [[['m1'], 'primary']]},
  'plan_pattern': 'inner_product',
  'plan_pattern_module': 'bluesky.plan_patterns',
  'plan_pattern_args': {'num': 21,
   'args': ["EpicsMotor(prefix='sky:m1', name='m1', settle_time=0.0, timeout=None, read_attrs=['user_readback', 'user_setpoint'], configuration_attrs=['user_offset', 'user_offset_dir', 'velocity', 'acceleration', 'motor_egu'])",
    -0.11974493237810635,
    0.11974493237810635]}},
 'stop': {'run_start': '78d6eb83-078c-4255-88d7-119c4f8b4931',
  'time': 1574395149.223173,
  'uid': '5dd8500b-c9b6-4c34-be11-e8baf4bde465',
  'exit_status': 'success',
  'reason': '',
  'num_events': {'primary': 21}},
 'catalog_dir': None}

Suggestions for next steps?

danielballan commented 4 years ago

Great. Next steps:


h.<TAB>  # shows available stream names (among other things)
list(h)  # lists the stream names
h.primary  # access one such stream

# Read the data from all the Events and EventDescriptors in the stream
# into an object convenient for interactive scripting, vis, and analysis.
# This may be very expensive if you have large image data referenced in the Events,
# but at APS you probably do not have that yet.
# (Read on for a cheaper alternative...)
ds = h.primary.read()  # xarray.Dataset backed by normal numpy arrays
ds['SOME_COLUMN_SHOWN_IN_THE_OUTPUT_FROM_ABOVE']
ds['SOME_COLUMN_SHOWN_IN_THE_OUTPUT_FROM_ABOVE'].plot()
ds['SOME_COLUMN_SHOWN_IN_THE_OUTPUT_FROM_ABOVE'].sum()
ds['SOME_COLUMN_SHOWN_IN_THE_OUTPUT_FROM_ABOVE'].data  # direct access to the underlying numpy array

# This is always pretty fast, because most of the I/O is deferred.
ds = h.primary.dask()  # xarray.Dataset backed by lazy dask arrays
ds['SOME_COLUMN_SHOWN_IN_THE_OUTPUT_FROM_ABOVE']
# All the same operations above work. The difference between dask and numpy is mostly transparent.

prjemian commented 4 years ago

Ok, that's progress:

db2 = db.v2
h = db2[-1]
ds = h.primary.read()
ds.keys()
ds["noisy"].plot()

as shown in the attached screenshot (Clipboard01).

prjemian commented 4 years ago

Problem comes up with the next command, involving dask:

In [9]: ds = h.primary.dask()                                                                                                               
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-a369c71f95a4> in <module>
----> 1 ds = h.primary.dask()

~/Apps/anaconda/envs/db-test/lib/python3.7/site-packages/intake/catalog/entry.py in __getattr__(self, attr)
    119             return self.__dict__[attr]
    120         else:
--> 121             return getattr(self._get_default_source(), attr)
    122 
    123     def __dir__(self):

AttributeError: 'BlueskyEventStream' object has no attribute 'dask'

In [10]: ! conda list dask                                                                                                                  
# packages in environment at /home/mintadmin/Apps/anaconda/envs/db-test:
#
# Name                    Version                   Build  Channel
dask                      2.8.0                      py_1  
dask-core                 2.8.0                      py_0  

Note this information about the software versions in place when the data was collected:

In [21]: h.metadata["start"]["versions"]                                                                                                    
Out[21]: 
{'bluesky': '1.5.5',
 'ophyd': '1.3.3',
 'databroker': '1.0.0b3',
 'apstools': '1.1.14',
 'epics': '3.4.0',
 'numpy': '1.17.3',
 'matplotlib': '3.1.2',
 'spec2nexus': '2021.1.7',
 'pyRestTable': '2020.0.2'}

prjemian commented 4 years ago

@danielballan: you suggested h.primary.dask(), yet:

In [22]: hasattr(h.primary, "dask")                                                                                                         
Out[22]: False

What have I done wrong here?
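
A generic way to find the intended method name is Python introspection (a sketch, nothing databroker-specific):

# list dask-related attributes of the stream; here this would reveal 'to_dask'
[name for name in dir(h.primary) if "dask" in name.lower()]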

prjemian commented 4 years ago

Answered my own question (typo: dask() --> to_dask()):

In [31]: ds = h.primary.to_dask()                                                                                                           

In [32]: ds['noisy']                                                                                                                        
Out[32]: 
<xarray.DataArray 'noisy' (time: 21)>
dask.array<stack, shape=(21,), dtype=float64, chunksize=(1,), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) float64 1.574e+09 1.574e+09 ... 1.574e+09 1.574e+09
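
To materialize the lazy array, a minimal sketch (assuming the same ds and the "noisy" column as above):

arr = ds["noisy"].compute()           # triggers the deferred I/O; returns a numpy-backed DataArray
total = ds["noisy"].sum().compute()   # reductions stay lazy, too, until computed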

prjemian commented 4 years ago

These (read() and to_dask()) produce the same result:
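
A quick check of that equivalence, as a sketch (assumes the "noisy" column from the scans above):

import numpy as np

eager = h.primary.read()["noisy"].data               # numpy array, read immediately
lazy = h.primary.to_dask()["noisy"].compute().data   # dask array, computed to numpy
assert np.allclose(eager, lazy)                      # same values either way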

prjemian commented 4 years ago

Now, a proper plot of y vs. x:

(screenshot: Clipboard01)

prjemian commented 4 years ago

Overplotting the previous two scans makes it clear this was a series of scans to center on a peak at a previously unknown position: (screenshot: Clipboard01)

prjemian commented 4 years ago

this was unexpected:

In [51]: db2[-10:]                                                                                                                          
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-51-bab7771239f3> in <module>
----> 1 db2[-10:]

~/Apps/anaconda/envs/db-test/lib/python3.7/site-packages/intake/catalog/base.py in __getitem__(self, key)
    372         cat['name1', 'name2']
    373         """
--> 374         if not isinstance(key, list) and key in self._get_entries():
    375             # triggers reload_on_change
    376             e = self._entries[key]

~/Apps/anaconda/envs/db-test/lib/python3.7/site-packages/databroker/_drivers/mongo_normalized.py in __contains__(self, key)
    125         # Avoid iterating through all entries.
    126         try:
--> 127             self[key]
    128         except KeyError:
    129             return False

~/Apps/anaconda/envs/db-test/lib/python3.7/site-packages/databroker/_drivers/mongo_normalized.py in __getitem__(self, name)
     67         collection = self.catalog._run_start_collection
     68         try:
---> 69             N = int(name)
     70         except ValueError:
     71             query = {'$and': [self.catalog._query, {'uid': name}]}

TypeError: int() argument must be a string, a bytes-like object or a number, not 'slice'

prjemian commented 4 years ago

Working around that problem (slicing cannot be used to index db.v2 yet), here are the results of finding three different peaks at unknown positions: (screenshot: Clipboard01)
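
One workaround, as a sketch, is to index the recent runs one at a time (assumes at least 10 runs in the catalog):

headers = [db2[-i] for i in range(1, 11)]   # the 10 most recent runs, newest first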

prjemian commented 4 years ago

Here's the basic procedure which generated these scans:

# adjust the peak parameters for "noisy"
# "noisy" is a simulated noisy Lorentzian function based on m1.user_readback

# search for the peak with a broad scan
RE(bp.scan([noisy], m1, -2, 2, 21))
mov m1 bec.peaks["cen"]["noisy"]    # %mov IPython magic: move m1 to the fitted center
fwhm = bec.peaks["fwhm"]["noisy"]

# rescan near the center
RE(bp.rel_scan([noisy], m1, -fwhm, fwhm, 21))
mov m1 bec.peaks["cen"]["noisy"]
fwhm = bec.peaks["fwhm"]["noisy"]

# rescan again near the center (with a more realistic FWHM range)
RE(bp.rel_scan([noisy], m1, -fwhm, fwhm, 21))
mov m1 bec.peaks["cen"]["noisy"]

prjemian commented 4 years ago

Working on a lineup() plan that encapsulates this procedure for APS 8-ID-I.

prjemian commented 4 years ago

The lineup() plan in development uses two passes, not three. It also stores (or should store) the PeakStats results.
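
A minimal sketch of such a two-pass plan (hypothetical code, not the actual 8-ID-I implementation; assumes the bec BestEffortCallback is in scope to supply the peak statistics):

from bluesky import plan_stubs as bps
from bluesky import plans as bp

def lineup(detectors, motor, half_width, npts=21, passes=2):
    # hypothetical sketch: broad relative scan, then re-center and narrow to the FWHM
    det = detectors[0].name
    yield from bp.rel_scan(detectors, motor, -half_width, half_width, npts)
    for _ in range(passes - 1):
        yield from bps.mv(motor, bec.peaks["cen"][det])   # move to the fitted center
        fwhm = bec.peaks["fwhm"][det]
        yield from bp.rel_scan(detectors, motor, -fwhm, fwhm, npts)
    yield from bps.mv(motor, bec.peaks["cen"][det])       # final move to the center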

prjemian commented 4 years ago

Follow-up (offline) from @tacaswell: we can use the new databroker with the notebook as long as we do not add bluesky. The collision (and the reason for the tornado<5 pin) is in how bluesky and tornado interact with asyncio. For analysis, there is no need to load bluesky (or ophyd). To manage the installations, use separate conda environments for data acquisition and data analysis.
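
For example, a sketch (environment names are arbitrary; $CHANNELS as defined earlier in this thread):

conda create -n acquire -y $CHANNELS bluesky ophyd databroker   # data acquisition: no notebook
conda create -n analyze -y $CHANNELS databroker notebook        # data analysis: no bluesky/ophyd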