NSLS-II-SIX / sixtools

Software for performing resonant inelastic X-ray scattering analysis at NSLS-II
https://pypi.org/project/sixtools

Saving data #19

Open mpmdean opened 5 years ago

mpmdean commented 5 years ago

We should return to the question of how we save data.

If the preference is that we use another database for derived data, it would be useful to set this up so we can start thinking about how everything can fit together.

danielballan commented 5 years ago

Broadly, the DAMA vision is derived from what the climate science community already does and has been doing for years: register "levels" of data starting from raw. There will always be exploratory "scratch" data, and users should manage that however they like. But any derived data that is reasonably routine, starting with basic "corrections" and common reductions, should be re-captured in a database along with the metadata that describes how it was created. For example, a Header from the "raw" databroker (the one SIX has now) might flow into a pipeline that creates "corrected" images and inserts them, with a new Header, into a separate databroker instance for corrected data. This new Header would include a pointer to the raw Header and all metadata necessary for recreating how we got from raw to corrected ("Python function X from version Y of library Z was applied with parameters {...}").
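To make the "new Header with a pointer to the raw Header" idea concrete, here is a minimal sketch of what the metadata for a corrected run might look like. The key names (`data_level`, `raw_uid`, `provenance`), the version string, and the parameter are all assumptions made up for illustration; nothing in this thread fixes a schema.

```python
import time
import uuid


def make_derived_start(raw_start, function, library, version, parameters):
    """Return a 'start' document for a derived run that points at its raw parent."""
    return {
        "uid": str(uuid.uuid4()),
        "time": time.time(),
        # Hypothetical keys: no schema has been agreed on for derived data.
        "data_level": "corrected",
        "raw_uid": raw_start["uid"],
        # "Python function X from version Y of library Z was applied
        # with parameters {...}"
        "provenance": {
            "function": function,
            "library": library,
            "version": version,
            "parameters": dict(parameters),
        },
    }


# Stand-in for the start document of a Header from the raw databroker.
raw_start = {"uid": "raw-uid-placeholder", "scan_id": 1}

derived_start = make_derived_start(
    raw_start,
    function="image_to_spectrum",
    library="sixtools",
    version="0.0.0",  # placeholder, not a real release number
    parameters={"example_parameter": 1.0},  # made-up parameter
)
```

Recording the function, library version, and parameters next to the pointer is what would let someone re-run the correction later and get from the same raw Header to the same corrected data.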

This has been tested at one or two beamlines but not widely deployed. Perhaps we could start by creating a "corrected data" databroker at SIX.

mpmdean commented 5 years ago

Sounds like a very nice and powerful implementation. Which beamlines are using this model at the moment?

Is there someone who can help/advise getting this off the ground?

danielballan commented 5 years ago

I believe it has been tried at CHX and maybe also at LIX. But Julien, who suddenly departed to new pastures, was running point on this, and we're still catching up on the details of what has been done. In any case, I can take point on this to start.

As may have been clear from my example above, the best kind of derived data to capture is derived data that is obtained via a well-defined, semi-automated process. Maybe humans are tweaking some parameters here and there, but the overall process should be well-defined and reasonably stable. Is there a good candidate to start with?

mpmdean commented 5 years ago

Roughly speaking, there will be two steps:

  1. each image will go into sixtools.rixswrapper.image_to_spectrum, which will convert it to a spectrum (i.e. two-column pixel/energy versus intensity). This is a reasonably well-defined process, although we need to be able to re-do this if needed.

  2. After that, individual spectra will need to be combined. This will often require more manual intervention, e.g. plotting the spectra, choosing which ones to keep, picking spectra from different scan_ids, etc.
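The two steps above can be sketched in a few lines. This is a toy stand-in and not the `sixtools.rixswrapper.image_to_spectrum` implementation: the function names are hypothetical, the image is a plain list of rows, and it assumes each row corresponds to one pixel along the energy axis and that combining means summing intensities at matching pixels.

```python
def toy_image_to_spectrum(image):
    """Collapse a 2D image (list of rows) into (pixel, intensity) pairs.

    Assumes each row is one pixel along the energy axis, so the
    intensity for that pixel is the sum over the row.
    """
    return [(pixel, sum(row)) for pixel, row in enumerate(image)]


def toy_combine_spectra(spectra):
    """Sum several spectra pixel by pixel into one spectrum."""
    combined = {}
    for spectrum in spectra:
        for pixel, intensity in spectrum:
            combined[pixel] = combined.get(pixel, 0) + intensity
    return sorted(combined.items())


# Step 1: one image becomes one two-column spectrum.
image = [[1, 2], [3, 4]]
spectrum = toy_image_to_spectrum(image)

# Step 2: spectra chosen by hand (here the same one twice) are combined.
combined = toy_combine_spectra([spectrum, spectrum])
```

The list passed to `toy_combine_spectra` is where the manual intervention in step 2 would enter: a person decides which spectra, possibly from different scan_ids, go into it.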

danielballan commented 5 years ago

Let's start with (1). Can you provide a simple-as-possible script that looks up a header and processes the data? Then we will add code that packs the results back into our "document model" and inserts it into a second databroker.
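As a rough sketch of what "packing the results back into the document model" could mean, the snippet below turns one spectrum into a sequence of `(name, document)` pairs (`start`, `descriptor`, `event`, `stop`) and hands each one to an `insert` callable. The documents are simplified, the `raw_uid` key and the one-event-per-point layout are assumptions, and the in-memory `processed_store` list only stands in for the second databroker.

```python
import time
import uuid


def spectrum_to_documents(raw_uid, spectrum):
    """Yield (name, doc) pairs describing one derived spectrum.

    `spectrum` is a list of (pixel, intensity) pairs.
    """
    now = time.time()
    # `raw_uid` is a hypothetical key pointing back at the raw Header.
    start = {"uid": str(uuid.uuid4()), "time": now, "raw_uid": raw_uid}
    yield "start", start

    descriptor = {
        "uid": str(uuid.uuid4()),
        "run_start": start["uid"],
        "time": now,
        "data_keys": {
            "pixel": {"source": "derived", "dtype": "number", "shape": []},
            "intensity": {"source": "derived", "dtype": "number", "shape": []},
        },
    }
    yield "descriptor", descriptor

    # One event per spectrum point; storing the whole spectrum as a
    # single array-valued event would be an equally valid choice.
    for seq_num, (pixel, intensity) in enumerate(spectrum, start=1):
        yield "event", {
            "uid": str(uuid.uuid4()),
            "descriptor": descriptor["uid"],
            "seq_num": seq_num,
            "time": now,
            "data": {"pixel": pixel, "intensity": intensity},
            "timestamps": {"pixel": now, "intensity": now},
        }

    yield "stop", {
        "uid": str(uuid.uuid4()),
        "run_start": start["uid"],
        "time": time.time(),
        "exit_status": "success",
    }


# In-memory stand-in for the processed-data databroker.
processed_store = []


def insert(name, doc):
    """Hypothetical insert hook; a real deployment would write to a database."""
    processed_store.append((name, doc))


for name, doc in spectrum_to_documents("raw-uid-placeholder", [(0, 3.0), (1, 7.0)]):
    insert(name, doc)
```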

mpmdean commented 5 years ago

This is the simplest meaningful operation possible: https://nbviewer.jupyter.org/gist/BNL-XRG/1476e8e3f3a0ee44be24c8ab8533aa79

Here is one that is a bit more representative. At SIX we make two spectra per frame (which exist in different ROIs in the frame). https://nbviewer.jupyter.org/gist/BNL-XRG/6462fec65e3492c0b2b4643a1175c5b8

mpmdean commented 5 years ago

Hi @danielballan Was what I provided what you needed?

danielballan commented 5 years ago

I think so, yes. I will dig in next week; this week I am occupied by the "hackathon" across the street.

danielballan commented 5 years ago

OK, I think this should be our plan:

  1. I wrap this code in from-and-to document model code.
  2. During the upcoming downtime, DAMA deploys a second databroker instance for processed data at SIX.

Will try to get to (1) next week. This is my only normal workday between last week and this one. Lots of conferences.

mpmdean commented 5 years ago

Sounds good.

danielballan commented 5 years ago

Update: (2) is done. Still haven't gotten to (1).

mpmdean commented 5 years ago

Thanks

stuwilkins commented 5 years ago

@danielballan @awalter-bnl I am looking into this myself now .... Did we manage to get an analysis databroker up and running?