ericdill closed this issue 5 years ago
Suggested fix was:
from eiger_io.fs_handler import LazyEigerHandler
from filestore.api import register_handler
register_handler("AD_EIGER", LazyEigerHandler)
@ordirules also asked about the possibility of not exporting the Eiger data into the hdf5 file. That is a great suggestion, but for now I suggested that he could just copy the export function and comment out https://github.com/NSLS-II/suitcase/blob/master/suitcase.py#L48.
Now there's a new error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-51-630ca4a4c95a> in <module>()
----> 1 export(cfndb, "/home/lhermitte/test.hd5")
<ipython-input-49-b23001810fc6> in export(headers, filename, savedata)
56 data = [e['data'][key] for e in events]
57 dataset = data_group.create_dataset(
---> 58 key, data=data, compression='gzip', fletcher32=True)
59 # Put contents of this data key (source, etc.)
60 # into an attribute on the associated data set.
/opt/conda_envs/analysis/lib/python3.4/site-packages/h5py/_hl/group.py in create_dataset(self, name, shape, dtype, data, **kwds)
101 """
102 with phil:
--> 103 dsid = dataset.make_new_dset(self, shape, dtype, data, **kwds)
104 dset = dataset.Dataset(dsid)
105 if name is not None:
/opt/conda_envs/analysis/lib/python3.4/site-packages/h5py/_hl/dataset.py in make_new_dset(parent, shape, dtype, data, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times)
85 else:
86 dtype = numpy.dtype(dtype)
---> 87 tid = h5t.py_create(dtype, logical=1)
88
89 # Legacy
h5py/h5t.pyx in h5py.h5t.py_create (/home/ilan/minonda/conda-bld/work/h5py/h5t.c:16162)()
h5py/h5t.pyx in h5py.h5t.py_create (/home/ilan/minonda/conda-bld/work/h5py/h5t.c:15993)()
h5py/h5t.pyx in h5py.h5t.py_create (/home/ilan/minonda/conda-bld/work/h5py/h5t.c:15953)()
TypeError: No conversion path for dtype: dtype('<U36')
But I'm not sure if this is a result of registering the eiger handler or copy/pasting the export code and commenting out the fill_event line. Can you show me the code that resulted in this error @ordirules?
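For reference, the dtype named in the traceback is what NumPy produces when it promotes a list of 36-character uid strings (which is what filestore datum references look like) to a fixed-width unicode array; HDF5 has no native type for that, hence "No conversion path". A minimal illustration, using the uid format from the headers in this thread:

```python
import numpy as np

# A list of 36-character uid strings becomes a fixed-width unicode
# array in NumPy, which h5py cannot map onto an HDF5 type.
uids = ["bada393a-c326-41b4-a746-82948ce624fd"]
arr = np.array(uids)
print(arr.dtype)  # prints <U36 -- the dtype in the TypeError above
```

Encoding the strings to bytes (or skipping external keys entirely, as discussed below) avoids the error.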
Here's a piece of the code below. Thanks for the efforts in getting this temporary fix to work. As an aside, if I want to share code, is there an easier/recommended way to do it? (for ex: pastebin etc)
import suitcase
from databroker import DataBroker as db, get_images, get_table, get_events, get_fields
from eiger_io.pims_reader import EigerImages
import datetime
from eiger_io.fs_handler import LazyEigerHandler
from filestore.api import register_handler
register_handler("AD_EIGER", LazyEigerHandler)
# The suitcase object (commented out data saving)
from collections import Mapping
import warnings
import h5py
import json
from metadatastore.commands import find_events
from databroker.databroker import fill_event
__version__ = "0.2.2"
def export(headers, filename, savedata=False):
    """
    Parameters
    ----------
    headers : a Header or a list of Headers
        objects returned by the Data Broker
    filename : string
        path to a new or existing HDF5 file
    """
    with h5py.File(filename) as f:
        for header in headers:
            header = dict(header)
            try:
                descriptors = header.pop('descriptors')
            except KeyError:
                warnings.warn("Header with uid {uid} contains no "
                              "data.".format(uid=header['start']['uid']),
                              UserWarning)
                continue
            top_group_name = header['start']['uid']
            group = f.create_group(top_group_name)
            _safe_attrs_assignment(group, header)
            for i, descriptor in enumerate(descriptors):
                # make sure it's a dictionary and trim any spurious keys
                descriptor = dict(descriptor)
                descriptor.pop('_name', None)
                desc_group = group.create_group(descriptor['uid'])
                data_keys = descriptor.pop('data_keys')
                _safe_attrs_assignment(desc_group, descriptor)
                events = list(find_events(descriptor=descriptor))
                event_times = [e['time'] for e in events]
                desc_group.create_dataset('time', data=event_times,
                                          compression='gzip', fletcher32=True)
                data_group = desc_group.create_group('data')
                ts_group = desc_group.create_group('timestamps')
                if savedata:
                    [fill_event(e) for e in events]
                for key, value in data_keys.items():
                    value = dict(value)
                    timestamps = [e['timestamps'][key] for e in events]
                    ts_group.create_dataset(key, data=timestamps,
                                            compression='gzip',
                                            fletcher32=True)
                    data = [e['data'][key] for e in events]
                    dataset = data_group.create_dataset(
                        key, data=data, compression='gzip', fletcher32=True)
                    # Put contents of this data key (source, etc.)
                    # into an attribute on the associated data set.
                    _safe_attrs_assignment(dataset, dict(value))

def _clean_dict(d):
    d = dict(d)
    for k, v in list(d.items()):
        # Store dictionaries as JSON strings.
        if isinstance(v, Mapping):
            d[k] = _clean_dict(d[k])
            continue
        try:
            json.dumps(v)
        except TypeError:
            d[k] = str(v)
    return d

def _safe_attrs_assignment(node, d):
    d = _clean_dict(d)
    for key, value in d.items():
        # Special-case None, which fails too late to catch below.
        if value is None:
            value = 'None'
        # Try storing natively.
        try:
            node.attrs[key] = value
        # Fallback: Save the repr, which in many cases can be used to
        # recreate the object.
        except TypeError:
            node.attrs[key] = json.dumps(value)
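As a standalone check of the fallback logic in _clean_dict, here is a sketch re-implementing the same idea (not the suitcase function itself), showing that a non-JSON-serializable leaf gets stringified:

```python
import json
from collections.abc import Mapping  # was 'from collections import Mapping' on old Pythons

def clean_dict(d):
    # Recurse into mappings; stringify any leaf that json.dumps cannot handle.
    d = dict(d)
    for k, v in list(d.items()):
        if isinstance(v, Mapping):
            d[k] = clean_dict(v)
            continue
        try:
            json.dumps(v)
        except TypeError:
            d[k] = str(v)
    return d

cleaned = clean_dict({'n': 1, 'nested': {'z': complex(1, 2)}})
print(cleaned)  # complex is not JSON-serializable, so it becomes the string '(1+2j)'
```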
cfndb = db(user="CFN", start_time="2016-03-25", stop_time="2016-03-29")
detector = 'eiger4m_single_image'
export(cfndb, "/home/lhermitte/test.hd5")
What is the type of cfndb?
It's a list of headers returned by databroker (databroker.databroker.Header)
Ok great. I didn't think that was an issue, but I did want to check.
So hdf5 doesn't like unicode. Can you add a print(key) at the beginning of the for key, value in data_keys.items(): loop, so that we can see which key is failing?
You can share code on gist.github.com too, if that's any easier. Copy/pasting into a github issue is a pretty common pattern though.
it's 'eiger4m_single_image'
Can you print the header and show me the contents. I am specifically interested in the descriptors.
print(cfndb[0].descriptors)
Here is the full header printed and raw output of descriptor further below. thanks!
======
EventDescriptor
---------------
configuration :
upper_ctrl_limit: 0.0
source : PV:XF:11IDB-ES{Det:Eig4M}cam1:DetDist_RBV
precision : 3
units : m
shape : []
dtype : number
lower_ctrl_limit: 0.0
upper_ctrl_limit: 0.0
source : PV:XF:11IDB-ES{Det:Eig4M}cam1:BeamX_RBV
precision : 3
units : pixels
shape : []
dtype : number
lower_ctrl_limit: 0.0
upper_ctrl_limit: 0.0
source : PV:XF:11IDB-ES{Det:Eig4M}cam1:Wavelength_RBV
precision : 4
units : Angstro
shape : []
dtype : number
lower_ctrl_limit: 0.0
upper_ctrl_limit: 0.0
source : PV:XF:11IDB-ES{Det:Eig4M}cam1:BeamY_RBV
precision : 3
units : pixels
shape : []
dtype : number
lower_ctrl_limit: 0.0
eiger4m_single_det_distance : 4.84
eiger4m_single_beam_center_x : 1209.0
eiger4m_single_wavelength : 1.4251056909561157
eiger4m_single_beam_center_y : 1327.0
eiger4m_single_det_distance : 1459084275.190846
eiger4m_single_beam_center_x : 1459084274.753193
eiger4m_single_wavelength : 1459084274.753201
eiger4m_single_beam_center_y : 1459084274.753195
+-----------------------------+--------+------------+----------------+-----------+-----------------+-------------------------------------------+-------+
| data keys | dtype | external | object_name | precision | shape | source | units |
+-----------------------------+--------+------------+----------------+-----------+-----------------+-------------------------------------------+-------+
| eiger4m_single_image | array | FILESTORE: | eiger4m_single | | [2070, 2167, 0] | PV:XF:11IDB-ES{Det:Eig4M} | |
| eiger4m_single_stats1_total | number | | eiger4m_single | 0 | [] | PV:XF:11IDB-ES{Det:Eig4M}Stats1:Total_RBV | |
| eiger4m_single_stats2_total | number | | eiger4m_single | 0 | [] | PV:XF:11IDB-ES{Det:Eig4M}Stats2:Total_RBV | |
| eiger4m_single_stats3_total | number | | eiger4m_single | 0 | [] | PV:XF:11IDB-ES{Det:Eig4M}Stats3:Total_RBV | |
| eiger4m_single_stats4_total | number | | eiger4m_single | 0 | [] | PV:XF:11IDB-ES{Det:Eig4M}Stats4:Total_RBV | |
| eiger4m_single_stats5_total | number | | eiger4m_single | 0 | [] | PV:XF:11IDB-ES{Det:Eig4M}Stats5:Total_RBV | |
+-----------------------------+--------+------------+----------------+-----------+-----------------+-------------------------------------------+-------+
name : primary
object_keys :
eiger4m_single : ['eiger4m_single_image', 'eiger4m_single_stats1_total', 'eiger4m_single_stats2_total', 'eiger4m_single_stats3_total', 'eiger4m_single_stats4_total', 'eiger4m_single_stats5_total']
run_start : bab70c5b-6f8a-4f2e-9394-491ef449842e
time : 1459160831.191348
uid : bada393a-c326-41b4-a746-82948ce624fd
RunStart
--------
beamline_id : CHX
config :
detectors : ['eiger4m_single']
energy_keV : 8.687
experiment : XPCS
exposure_time : 300.0
extra : Aaron Stein remade the ReferenceDot sample (10 um lines making grid on 75 um pitch, with various dot patterns within)
group : chx
holder : vacuum bar holder
measure_type : N6 Pattern 015
name : ReferenceDots03 again2 (air holder)
owner : xf11id
plan_args :
delay : 0
num : 1
detectors : [EigerSingleTrigger(prefix='XF:11IDB-ES{Det:Eig4M}', name='eiger4m_single', read_attrs=['file', 'stats1', 'stats2', 'stats3', 'stats4', 'stats5'], configuration_attrs=['beam_center_x', 'beam_center_y', 'wavelength', 'det_distance'], monitor_attrs=[])]
plan_type : Count
project :
sample :
x : 0.794786
holder : vacuum bar holder
y : -0.1821799999999998
name : ReferenceDots03 again2 (air holder)
extra : Aaron Stein remade the ReferenceDot sample (10 um lines making grid on 75 um pitch, with various dot patterns within)
scan_id : 13858
sequence_ID : 2751.0
time : 1459160529.530022
uid : bab70c5b-6f8a-4f2e-9394-491ef449842e
user : CFN
x : 0.794786
x_position : 0.794786
y : -0.1821799999999998
y_position : -0.1821799999999998
RunStop
-------
exit_status : success
reason :
run_start : bab70c5b-6f8a-4f2e-9394-491ef449842e
time : 1459160831.2157428
uid : 0abf0fa8-c558-455c-bcb3-2daa705c1e1b
And finally the output of the descriptor:
[{'uid': 'bada393a-c326-41b4-a746-82948ce624fd', 'configuration': {'eiger4m_single': {'data_keys': {'eiger4m_single_det_distance': {'upper_ctrl_limit': 0.0, 'source': 'PV:XF:11IDB-ES{Det:Eig4M}cam1:DetDist_RBV', 'precision': 3, 'units': 'm', 'shape': [], 'dtype': 'number', 'lower_ctrl_limit': 0.0}, 'eiger4m_single_beam_center_x': {'upper_ctrl_limit': 0.0, 'source': 'PV:XF:11IDB-ES{Det:Eig4M}cam1:BeamX_RBV', 'precision': 3, 'units': 'pixels', 'shape': [], 'dtype': 'number', 'lower_ctrl_limit': 0.0}, 'eiger4m_single_wavelength': {'upper_ctrl_limit': 0.0, 'source': 'PV:XF:11IDB-ES{Det:Eig4M}cam1:Wavelength_RBV', 'precision': 4, 'units': 'Angstro', 'shape': [], 'dtype': 'number', 'lower_ctrl_limit': 0.0}, 'eiger4m_single_beam_center_y': {'upper_ctrl_limit': 0.0, 'source': 'PV:XF:11IDB-ES{Det:Eig4M}cam1:BeamY_RBV', 'precision': 3, 'units': 'pixels', 'shape': [], 'dtype': 'number', 'lower_ctrl_limit': 0.0}}, 'data': {'eiger4m_single_det_distance': 4.84, 'eiger4m_single_beam_center_x': 1209.0, 'eiger4m_single_wavelength': 1.4251056909561157, 'eiger4m_single_beam_center_y': 1327.0}, 'timestamps': {'eiger4m_single_det_distance': 1459084275.190846, 'eiger4m_single_beam_center_x': 1459084274.753193, 'eiger4m_single_wavelength': 1459084274.753201, 'eiger4m_single_beam_center_y': 1459084274.753195}}}, 'data_keys': {'eiger4m_single_image': {'shape': [2070, 2167, 0], 'dtype': 'array', 'source': 'PV:XF:11IDB-ES{Det:Eig4M}', 'object_name': 'eiger4m_single', 'external': 'FILESTORE:'}, 'eiger4m_single_stats4_total': {'source': 'PV:XF:11IDB-ES{Det:Eig4M}Stats4:Total_RBV', 'precision': 0, 'object_name': 'eiger4m_single', 'shape': [], 'dtype': 'number', 'units': ''}, 'eiger4m_single_stats2_total': {'source': 'PV:XF:11IDB-ES{Det:Eig4M}Stats2:Total_RBV', 'precision': 0, 'object_name': 'eiger4m_single', 'shape': [], 'dtype': 'number', 'units': ''}, 'eiger4m_single_stats5_total': {'source': 'PV:XF:11IDB-ES{Det:Eig4M}Stats5:Total_RBV', 'precision': 0, 'object_name': 'eiger4m_single', 
'shape': [], 'dtype': 'number', 'units': ''}, 'eiger4m_single_stats3_total': {'source': 'PV:XF:11IDB-ES{Det:Eig4M}Stats3:Total_RBV', 'precision': 0, 'object_name': 'eiger4m_single', 'shape': [], 'dtype': 'number', 'units': ''}, 'eiger4m_single_stats1_total': {'source': 'PV:XF:11IDB-ES{Det:Eig4M}Stats1:Total_RBV', 'precision': 0, 'object_name': 'eiger4m_single', 'shape': [], 'dtype': 'number', 'units': ''}}, 'time': 1459160831.191348, '_name': 'EventDescriptor', 'name': 'primary', 'run_start': 'bab70c5b-6f8a-4f2e-9394-491ef449842e', 'object_keys': {'eiger4m_single': ['eiger4m_single_image', 'eiger4m_single_stats1_total', 'eiger4m_single_stats2_total', 'eiger4m_single_stats3_total', 'eiger4m_single_stats4_total', 'eiger4m_single_stats5_total']}}]
And thanks for the gist reference. I agree, it sounds simpler to paste here for now. Thanks!
Ok, so basically what I'm going to have you do is ignore any data that is external. At the top of the data_keys.items() for loop, add this:
for key, value in data_keys.items():
    if descriptor['data_keys'][key].get('external'):
        continue
That will skip adding any keys which are in filestore.
Alternatively, you can safely cast it to a string if you do want the filestore reference:
data = [e['data'][key] for e in events]
if descriptor['data_keys'][key].get('external'):
    data = [str(d) for d in data]
Does that make sense? Let me know how this works.
Or you could also do
if 'external' in descriptor['data_keys'][key]:
Thanks for explaining this in detail. As you said, I think the main bug was something to do with converting the data string to unicode. Doing this fixed the issue. Ignoring the data like you said also allowed me to not write out the data, so that is perfect. Here is what finally worked for me:
import unicodedata
data = [e['data'][key] for e in events]
if data_keys[key].get('external'):
    data = [unicodedata.normalize('NFKD', d).encode('ascii', 'ignore') for d in data]
When I look at the saved metadata (with hdfview), however, I don't see all the metadata that I know I saved with the files. Here is what I see for example: (http://imgur.com/kAa2ib3)
What I would like is for all the metadata to be saved, including custom keys that have been added. Is this possible? thanks!
edit: I should clarify there were two issues with this modification: (1) you need to add import unicodedata somewhere before the line
    data = [unicodedata.normalize('NFKD', d).encode('ascii', 'ignore') for d in data]
and (2) descriptor['data_keys'] was already popped into data_keys, so data_keys is what needs to be accessed, not descriptor['data_keys'].
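The normalize-and-encode step from that fix can be exercised on its own (a sketch; to_ascii_bytes is a hypothetical helper name, not part of suitcase):

```python
import unicodedata

def to_ascii_bytes(strings):
    # NFKD-decompose, then drop anything outside ASCII, yielding bytes
    # objects that h5py can store without needing a unicode conversion path.
    return [unicodedata.normalize('NFKD', s).encode('ascii', 'ignore')
            for s in strings]

print(to_ascii_bytes(['Ångström', 'bada393a']))  # -> [b'Angstrom', b'bada393a']
```

Note that 'ignore' silently drops non-ASCII characters that NFKD cannot decompose into ASCII, so this is lossy for arbitrary text, though fine for uid strings and filestore references.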
The metadata that you are looking at is stored as attributes. Here is what I see when I select a header. Look at the bottom of the hdfview window.
oops my fault, yes it was hidden :-| Issue resolved, this is perfect and works great, thanks! :-)
Thanks for the patience. I'll get to work on turning what we discussed in this thread into an example in the docs for this project (as time permits!)
Great thanks!
For completeness, I also encountered another error:
ValueError: Unable to create group (Name already exists)
This is because the file exists. It might be a good idea to add some checking for file existence, and possibly an overwrite flag? Anyway, I just fixed it by modifying this line (adding a "w" to the h5py file opener):
with h5py.File(filename,"w") as f:
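One caveat: mode "w" truncates an existing file unconditionally. A sketch of the existence check suggested above (the helper name and error message are hypothetical; h5py also supports a create-only mode, "w-", which raises on its own if the file already exists):

```python
import os

def checked_filename(filename, overwrite=False):
    # Refuse to clobber an existing export unless the caller opts in.
    # The returned pair can be splatted into h5py.File(*checked_filename(...)).
    if os.path.exists(filename) and not overwrite:
        raise FileExistsError(
            "{} already exists; pass overwrite=True to replace it"
            .format(filename))
    return filename, "w"
```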
Ah ok. I'll have a think on what the best way to handle this is. Thanks for the bug report and clear explanation
Ok thanks. Oh, also: the main idea for this is to have a temporary local database of the metadata. suitcase helps export it, but it might be preferable to convert from hdf5 to something that can be searched more efficiently. Do you have any recommendations? Thanks again for all the quick and efficient support!
On the readme of this repository we enumerate two things that we want suitcase to do. As you say "suitcase helps export" which maps onto (1) and "the main idea for this is to have a temporary local database of the metadata" is (2). We have implemented (1) but not yet (2). Our goal is to be able to export the headers that you care about into a local databroker-like interface. I am glad to hear that you want something like this. It is validation that we are on the right track. We will get there, but for now I do not have any recommendations regarding how to turn the output of export()
into a local database because I have not spent any time trying to solve that problem yet.
ok sounds good thanks. I'll search through the hdf5 file for now with some wrapper functions that look like database searches. That way we'll be ready for (2).
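A minimal sketch of what such a wrapper could look like, treating headers as plain dicts and matching against keys in the start document (search_headers is a hypothetical name; a real version would walk the exported HDF5 group attributes instead):

```python
def search_headers(headers, **query):
    # Return the headers whose start document matches every key=value
    # pair in the query -- a stand-in for a database-style search.
    return [h for h in headers
            if all(h.get('start', {}).get(k) == v for k, v in query.items())]

# Toy headers mimicking the shape of databroker output in this thread.
hdrs = [{'start': {'user': 'CFN', 'scan_id': 13858}},
        {'start': {'user': 'xf11id', 'scan_id': 13859}}]
print(search_headers(hdrs, user='CFN'))  # the first header only
```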
If I have time, I might also try to get a local database installed, but that's another beast (I was thinking a local mongoDB?). I mainly asked because I have a feeling a search through the hdf5 file may take a while. We'll see.
But yes, I agree, I think you're on the right track. I never thought of saving data this way before but when you get used to it, you find that it's quite convenient and in the long run, is more scalable. I hope you can all withstand the user nagging in the meantime from users like me :-P. thanks again!
Ok, one more question: which entry in this metadata is the filename? There are quite a few uids, and the ones I thought it might have been I couldn't find in my file structure. If it's not there, how would I save the filename into this database? Thanks!
filename of what, exactly?
It's the filename of the detector files saved. Currently, Yugang gave us this code to allow us to extract them:
from filestore.path_only_handlers import RawHandler
def get_filenames(hdr, detector):
    '''Get the filenames for a header for the EIGER images. If
    not an EIGER data set (no EIGER), return empty list.'''
    events = get_events(hdr, handler_overrides={detector: RawHandler})
    fns = list()
    for ev in events:
        hh = ev['data']
        if 'eiger4m_single_image' in hh:
            fns.append(hh['eiger4m_single_image'][0])
    return fns
I tried copying and pasting a subset of this into the suitcase export code and can't get it to work. Is this the right way to go about this? (If this should be another issue, let me know.) I can also paste my attempt and the error. It basically comes from the for ev in events line; it says there's a KeyError when looking for 'descriptors'.
sorry this is a little vague
Yeah I'd like to see your attempt and the error. That would help.
ok, here is the code (below the desc_group line):
desc_group = group.create_group(descriptor['uid'])
# extra code to fetch filename (if EIGER file)
events2 = list(get_events(header, handler_overrides={detector: RawHandler}))
for ev in events2:
    hh = ev['data']
    if 'eiger4m_single_image' in hh:
        filename = hh['eiger4m_single_image'][0]
        _safe_attrs_assignment(dataset, {'filename': filename})
and here is the current error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-39-f5b54e8b1574> in <module>()
----> 1 export(cfndb, "/home/lhermitte/notebooks/notebook-B004-mar262016/database-B004.hd5")
<ipython-input-38-87a44f4fd813> in export(headers, filename, savedata)
42 desc_group = group.create_group(descriptor['uid'])
43 # extra code to fetch filename (if EIGER file)
---> 44 events2 = list(get_events(header, handler_overrides={detector: RawHandler}))
45
46 for ev in events2:
/opt/conda_envs/analysis/lib/python3.4/site-packages/databroker/databroker.py in get_events(headers, fields, fill, handler_registry, handler_overrides)
364 fields = []
365 fields = set(fields)
--> 366 _check_fields_exist(fields, headers)
367
368 for header in headers:
/opt/conda_envs/analysis/lib/python3.4/site-packages/databroker/databroker.py in _check_fields_exist(fields, headers)
634 if stop is not None:
635 all_fields.update(header['stop'])
--> 636 for descriptor in header['descriptors']:
637 all_fields.update(descriptor['data_keys'])
638 objs_conf = descriptor.get('configuration', {})
KeyError: 'descriptors'
Are you in a notebook? Can you drop into pdb, print the contents of header, and report it here? Or just wrap line 44 in a try/except KeyError and print the header that is erroring.
I'm in a notebook. I tried moving to ipython but received some other error (let me know if you prefer that).
All headers are erroring; here is one (I've replaced some names with 'x's):
{'start': {'measure_type': 'xxxxxxxxxxxx', 'sample': {'name': 'xxxxxxxxxxxxxx', 'y': -0.1821799999999998, 'holder': 'xxxxxxx', 'x': 0.794786, 'extra': 'xxxxxxxxxxxxxxxxxxxxx'}, 'sequence_ID': 2751.0, 'config': {}, 'scan_id': 13858, 'project': '', 'experiment': 'xxxxxxx', 'uid': 'bab70c5b-6f8a-4f2e-9394-491ef449842e', 'x': 0.794786, 'x_position': 0.794786, 'y_position': -0.1821799999999998, 'holder': 'xxxxxxxxxxx', 'owner': 'xf11id', 'time': 1459160529.530022, 'plan_args': {'detectors': "[EigerSingleTrigger(prefix='XF:11IDB-ES{Det:Eig4M}', name='eiger4m_single', read_attrs=['file', 'stats1', 'stats2', 'stats3', 'stats4', 'stats5'], configuration_attrs=['beam_center_x', 'beam_center_y', 'wavelength', 'det_distance'], monitor_attrs=[])]", 'num': '1', 'delay': '0'}, 'energy_keV': 8.687, 'y': -0.1821799999999998, 'detectors': ['eiger4m_single'], 'beamline_id': 'CHX', 'plan_type': 'Count', 'user': 'CFN', 'name': 'xxxxxxxxx', 'group': 'chx', '_name': 'RunStart', 'exposure_time': 300.0, 'extra': 'xxxxxxxxxxxxxxxxxxxxxxx'}, '_name': 'header', 'stop': {'time': 1459160831.2157428, '_name': 'RunStop', 'uid': '0abf0fa8-c558-455c-bcb3-2daa705c1e1b', 'reason': '', 'exit_status': 'success', 'run_start': 'bab70c5b-6f8a-4f2e-9394-491ef449842e'}}
Can you pprint the header?
import pprint
pprint.pprint(header)
I know I'm being sort of fussy here, but it is so much easier to read, sorry :cry:
No, you're not, and thanks for the lib suggestion, I find it useful! (Sorry for my delay; the notebook crashed from all the output. I probably should have added a return statement after the first print :-P)
here is the output:
{'_name': 'header',
'start': {'beamline_id': 'CHX',
'config': {},
'detectors': ['eiger4m_single'],
'energy_keV': 8.687,
'experiment': 'XPCS',
'exposure_time': 300.0,
'extra': 'xxxxxx',
'group': 'chx',
'holder': 'xxxxx',
'measure_type': 'xxxxx',
'name': 'xxxxxxxxx',
'owner': 'xf11id',
'plan_args': {'delay': '0',
'detectors': "[EigerSingleTrigger(prefix='XF:11IDB-ES{Det:Eig4M}', "
"name='eiger4m_single', "
"read_attrs=['file', 'stats1', "
"'stats2', 'stats3', 'stats4', "
"'stats5'], "
"configuration_attrs=['beam_center_x', "
"'beam_center_y', 'wavelength', "
"'det_distance'], monitor_attrs=[])]",
'num': '1'},
'plan_type': 'Count',
'project': '',
'sample': {'extra': 'xxxxxxx',
'holder': 'xxxxx',
'name': 'xxxxxx',
'x': 0.794786,
'y': -0.1821799999999998},
'scan_id': 13858,
'sequence_ID': 2751.0,
'time': 1459160529.530022,
'uid': 'bab70c5b-6f8a-4f2e-9394-491ef449842e',
'user': 'CFN',
'x': 0.794786,
'x_position': 0.794786,
'y': -0.1821799999999998,
'y_position': -0.1821799999999998},
'stop': {'exit_status': 'success',
'reason': '',
'run_start': 'bab70c5b-6f8a-4f2e-9394-491ef449842e',
'time': 1459160831.2157428,
'uid': '0abf0fa8-c558-455c-bcb3-2daa705c1e1b'}}
Haha yeah, one print would be good :-D
I use the following pattern:
try:
    something_that_raises()
except SomeException as e:
    pprint(helpful_information)
    raise
What I find confusing is that there are no descriptors in that header. What the heck? Do all of the headers lack a descriptor?
Can you share the full code that caused this?
I'm not sure, it could be I've done something wrong. Here it is thanks!
(https://gist.github.com/ordirules/a0f99e8f5030b8d3f7f45b67b2dd9689)
Got the code. thanks. I'll try to run this on the CHX kernel and see if I can figure something out. I'll get back to you within about an hour
Had to go to the post office. Will start looking at this in about 15 min
Thanks, I just noticed something: descriptors is popped. I just replaced that line with
descriptors = header['descriptors']
and it seems to work so far. I'll keep playing and let you know if I have any other issues. Sorry to take your time on this with my one special case :-(
Ah, are you talking about this line? That would make sense why the code is barfing then. Good catch :-D
yep, that's what I meant. I just checked and tried and was able to read just fine. :-) For my purposes, this will work. thanks again for your time :-)
Anyway, I do have one request/comment. Often when acquiring data, a 2D image (or some other supplemental data) will be saved for us in a separate file (as it is for almost every other group). Currently, using Bluesky, it is quite difficult to actually find what that file name is supposed to be so that the file containing this data can be found and read (none of the uids match the file names).
I think that whenever supplemental data is stored outside the json key, a string referring to its filename should be stored in that key, somewhere. Currently, I had to use a complicated workaround that @yugangzhang wrote to help us (thanks Yugang).
I know it's discouraged, but I think it makes sense. At least for something like suitcase. When packaging the user's data, I think it makes more sense to leave the large supplemental files as is in some file structure, and simply give the relative paths (+ filenames) into that structure. That way, for example, when a user wants to update their local db with more data, they have the option of just downloading the metadata and merging the files into their file structure separately etc.
I think it will very often be the case that an experimentalist will want to know the filename of their supplemental data so they can open it with other software, share it, etc.
What do you think? You know more about the plans of the databroker, perhaps there is a better way.
Thanks again!
tl;dr, the things that you are requesting are on our todo list.
There is not a guaranteed 1:1 mapping of an image to a filename in the databroker stack. Especially when we start dumping all data into hdf5 files (for storage reasons). That is one reason why giving a filename back is not guaranteed to be helpful.
I think it makes more sense to leave the large supplemental files as is in some file structure, and simply give the relative paths (+ filenames) into that structure
I am not sure how much that would actually help you here. You would still need to physically move the data from the CHX server to your local drive so that you could access it. It sounds like what you really want is to be able to update the filestore database with a new file location.
I know it's discouraged, but I think it makes sense.
A large part of the reason why it is discouraged is because we cannot reliably support moving data yet. @tacaswell is currently working on adding a move() command to filestore so that we can better support exporting data.
Currently, I had to use a complicated workaround
Part of the reason why it is complicated is that we do not yet support the notion of "moving" files that are already in filestore. Once we have sorted that out (@tacaswell is currently working on it), this should become much simpler.
Ok great. I just wanted to give feedback to try to help with meeting users' needs, but it sounds like you guys have already considered all this and are working on it. :-)
About the file downloading. I can explain. Currently, what I'm doing is extracting the metadata and then rsync'ing the folders with the data on CHX we've taken onto our local servers. When I read the files, I locally set a parent directory in my routines and extract the relative paths from the filenames saved. However, when we're taking data in real time, what I will sometimes do is create a mount point with sshfs directly into CHX's file structure. Running our code from my laptop/workstation/athome or wherever is then a matter of changing the parent directory. What we're doing requires a bit of tweaking and playing around sometimes so simply using existing notebooks on the nsls2 server might not be enough.
My case might be a little extreme and maybe it's not as common. However, I think it'd be nice for databroker to support it by allowing users to select whether or not to download the large files (like detector files) with the metadata or not. If they opt out, then they're responsible for retrieving these larger files from the beamline.
There is just one issue I'm worried about from my point of view. I'm a little bit worried about the idea of dumping all data into the same hdf5 file, even if it ends up saving space. The way I see it is that as a user coming to a beamline, I expect to extract some data that I need. It could be a processed result, or raw files. With this data, there's also metadata (time, location, sample name etc). I think both these quantities should be kept separate and not be abstracted into the same grab bag of data.
Anyway, that's my point of view, but I'm honestly flexible. You've heard my comment and I trust your actions. I definitely like the overall structure of storing metadata in some general database. It helps reduce confusion. :)
thanks for the info :-)
@ordirules reported this export error to me via email: