NSLS-II / metadatastore

DEPRECATED: Incorporated into https://github.com/NSLS-II/databroker

Mds corrections #150

Closed: ericdill closed this 8 years ago

ericdill commented 9 years ago

When complete, this PR will introduce the ability to update metadatastore documents via a new Corrections collection (sure is a tongue twister!) and an update function in metadatastore. A Corrections document is a DynamicDocument with three fields:

  1. uid: The unique identifier for one of the other documents (RunStart, RunStop, EventDescriptor, BeamlineConfig)
  2. corrections_uid: A unique string identifier for one Corrections document. Defaults to uuid4 if none is provided
  3. time: The unix epoch time that the document was saved

Note that Event documents cannot be updated: update(document) will raise if isinstance(document, Event) is True.
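A minimal sketch of what update might do, assuming documents are plain dicts held in an in-memory list rather than the mongoengine DynamicDocument the PR uses (a DynamicDocument accepts arbitrary extra fields, which a dict models loosely). The names CORRECTIONS and update, and the '_type' key used to detect Events, are hypothetical stand-ins, not metadatastore's API. The exception type is also a guess, since the PR only says update will raise.

```python
import time
import uuid

# Hypothetical in-memory stand-in for the Correction collection.
CORRECTIONS = []


def update(document, corrections_uid=None):
    """Store an edited copy of `document` as a new Correction record.

    `document` is assumed to be a dict carrying the original 'uid' and a
    '_type' key naming its document kind (both assumptions of this sketch).
    """
    if document.get('_type') == 'Event':
        # Events are immutable; ValueError is an assumed exception type.
        raise ValueError('Event documents cannot be updated')
    correction = dict(document)  # dynamic fields: the corrected contents
    correction['uid'] = document['uid']  # uid of the original document
    # Default to a uuid4 string when no corrections_uid is supplied.
    correction['corrections_uid'] = corrections_uid or str(uuid.uuid4())
    correction['time'] = time.time()  # unix epoch time of the save
    CORRECTIONS.append(correction)
    return correction
```

The original document is never modified; every call appends a new record, so the full edit history stays available.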

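This PR introduces a new kwarg into the find_* functions, initially called use_newest_correction (default True). When it is True, a find_* call returns the newest correction of a matching document in place of the original; when it is False, the call returns the original document as it was first inserted.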

This PR introduces a new function find_corrections that will retrieve all corrections for a document based on either the original document's uid or the correction_uid of a specific correction.
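The lookup side can be sketched the same way. This assumes in-memory lists of dicts and that "newest" means the correction with the largest time value; RUN_STARTS, CORRECTIONS, and both function bodies are hypothetical illustrations, not the PR's implementation. The keyword correction_uid and the stored field corrections_uid follow the spellings used in the description above.

```python
# Hypothetical in-memory stand-ins for the RunStart and Correction collections.
RUN_STARTS = []
CORRECTIONS = []


def find_corrections(uid=None, correction_uid=None):
    """Return corrections matching an original uid or one correction's id.

    Results are sorted oldest first by their 'time' field.
    """
    hits = [c for c in CORRECTIONS
            if (uid is not None and c['uid'] == uid)
            or (correction_uid is not None
                and c['corrections_uid'] == correction_uid)]
    return sorted(hits, key=lambda c: c['time'])


def find_run_starts(uid, use_newest_correction=True):
    """Yield RunStarts with `uid`, swapping in the newest correction if asked."""
    for doc in RUN_STARTS:
        if doc['uid'] != uid:
            continue
        corrections = find_corrections(uid=uid) if use_newest_correction else []
        # The last element is the most recent correction, if any exist.
        yield corrections[-1] if corrections else doc
```

Keeping the find_* functions as generators matches how the rest of the thread treats query results (lazy until consumed).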

This PR requires test coverage before it can be merged.

Todo

http://nbviewer.ipython.org/ff3540dbe06073a673ec

Known use cases

  1. RunStart points to the wrong beamline config (https://github.com/ericdill/metadataStore/commit/6b9792e0e2d1af2c41470a7e06b8ad6c30fb9e12)
    1. Insert a new beamline configuration with insert_beamline_configuration. This adds to the BeamlineConfig collection
    2. Update RunStart to point to new beamline configuration
    3. Call update on the edited RunStart document. This adds to the Correction collection.
  2. Annotate RunStart, EventDescriptor, and RunStop documents with new tags (https://github.com/ericdill/metadataStore/commit/6b9792e0e2d1af2c41470a7e06b8ad6c30fb9e12)
    1. Edit run_start
    2. Call update(run_start). This adds to the Correction collection. Future calls to find_run_starts(uid=run_start.uid) will return the tagged run_start document. If you want the original document, find_run_starts(uid=run_start.uid, use_newest_correction=False) will return it.
  3. EventDescriptor has wrong shape for data key
    1. Edit event_descriptor
    2. Call update(event_descriptor). This adds to the Correction collection. Future calls to find_event_descriptors(uid=event_descriptor.uid) will return the corrected event descriptor. If you want the original document, find_event_descriptors(uid=event_descriptor.uid, use_newest_correction=False) will return it.
  4. BeamlineConfig contains inaccurate information
    1. Similar to the first use case: update the beamline config and save it, then update every RunStart that was using that beamline config.
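The BeamlineConfig fix-up is essentially a loop: save a new config, then write one correction per RunStart that referenced the old one. The sketch below assumes in-memory dict collections and that each RunStart stores its config's uid under a 'beamline_config' key; new_beamline_config and repoint_run_starts are hypothetical helpers, not metadatastore functions.

```python
import time
import uuid

# Hypothetical in-memory collections. The 'beamline_config' field on a
# RunStart holding the config's uid is an assumption for this sketch.
BEAMLINE_CONFIGS = []
RUN_STARTS = []
CORRECTIONS = []


def new_beamline_config(config_params):
    """Hypothetical helper: save a BeamlineConfig and return its uid."""
    uid = str(uuid.uuid4())
    BEAMLINE_CONFIGS.append({'uid': uid, 'config_params': config_params,
                             'time': time.time()})
    return uid


def repoint_run_starts(old_config_uid, new_config_uid):
    """Write one correction per RunStart that used the old config.

    Originals are left untouched; each correction keeps the original 'uid'.
    """
    written = []
    for run_start in RUN_STARTS:
        if run_start.get('beamline_config') != old_config_uid:
            continue
        corrected = dict(run_start, beamline_config=new_config_uid)
        corrected['corrections_uid'] = str(uuid.uuid4())
        corrected['time'] = time.time()
        CORRECTIONS.append(corrected)
        written.append(corrected)
    return written
```

Because the old config is kept rather than overwritten, the uncorrected RunStarts still resolve to the configuration they were originally recorded with.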
tacaswell commented 9 years ago

can you nuke and ban the ipython notebook checkpoints?

tacaswell commented 9 years ago

and would it be possible to get that object graph as any kind of vector graphic?

tacaswell commented 9 years ago

have you tried running this on the HXN case that was slow?

ericdill commented 9 years ago

I have not tried the HXN case yet. Do you remember what that scan id was?

tacaswell commented 9 years ago

https://github.com/NSLS-II/dataportal/issues/169

scan 512

ericdill commented 9 years ago

re scan 512 on HXN: When I tried to run that last time, I discovered that the channelarchiver had filled up all 11 TB at HXN and 20-something TB at CSX. I'll try it again tomorrow.

ericdill commented 9 years ago

In [6]: def time_events(header):
    t0 = ttime.time()
    events = db.fetch_events(header)
    t1 = ttime.time()
    events = list(events)
    t2 = ttime.time()
    print('time for db.fetch_events call = %s s' % (t1-t0))
    print('time to listify the generator = %s s' % (t2-t1))
    return events
   ...: 

In [7]: events = time_events(hdr512)
time for db.fetch_events call = 4.05311584473e-06 s
time to listify the generator = 42.5619840622 s

...crap
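The two timings above (about 4 µs for the call, about 42.6 s to listify) suggest db.fetch_events returns a lazy generator: calling it does no real work, and the cost appears only when the events are consumed. A self-contained sketch of the same measurement follows; slow_events is a hypothetical stand-in for fetch_events that simulates per-event work with a sleep, and time.perf_counter replaces time.time for finer resolution.

```python
import time


def time_lazy_call(make_iterable):
    """Time creating an iterable separately from consuming it.

    Returns (call_seconds, consume_seconds, items). For a generator, the
    call itself does almost no work; the cost shows up when it is consumed.
    """
    t0 = time.perf_counter()
    iterable = make_iterable()
    t1 = time.perf_counter()
    items = list(iterable)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, items


def slow_events(n, delay=0.01):
    """Hypothetical stand-in for db.fetch_events: yields n fake events slowly."""
    for i in range(n):
        time.sleep(delay)  # simulate per-event database work
        yield {'seq_num': i + 1}
```

For example, time_lazy_call(lambda: slow_events(20)) reports a near-zero call time and a consume time of at least 0.2 s, mirroring the pattern seen with hdr512.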

ericdill commented 9 years ago

Here's some profiling output sorted by the total time for each function. I also included the cProfile output saved to disk. Would you be able to help me sort out why this is slow, @tacaswell?

https://gist.github.com/ericdill/95aa082c489153c0f0a0
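One way to produce a report like the one in the gist, using only the standard library's cProfile and pstats modules. This assumes "total time" means cProfile's tottime column (time spent inside a function itself, excluding callees); profile_to_disk is a hypothetical helper name, and the exact options used for the gist are not known.

```python
import cProfile
import io
import pstats


def profile_to_disk(func, path, top=20):
    """Profile func(), save raw stats to `path`, return (result, report).

    The report is sorted by 'tottime': time spent inside each function
    itself, excluding time spent in the functions it calls.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func()
    finally:
        profiler.disable()
    profiler.dump_stats(path)  # reload later with pstats.Stats(path)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats('tottime').print_stats(top)
    return result, buf.getvalue()
```

Applied to this thread's case, func would be something like lambda: list(db.fetch_events(hdr512)), so that the generator is actually consumed inside the profiled region.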

help me tacaswell

ericdill commented 8 years ago

I suppose this is going to die on the vine. oh well!