Simon thoughts on xpdAcquireFuncs 2.0

@chiahaoliu, @pavoljuhas I would like to discuss with the group a more or less complete refactoring of the xpdAcquireFuncs code based on what we learned from the first go-around. Here are some things that I think we need to address:

- Can we work with the areaDetector developer team, with enough lead time to make changes to areaDetector, so that at least the data collection works the way we want? This would save a lot of headaches.
- I would like to make the metadata saving less error-prone, and more automatic on recovery from a crash.
  1. Make some kind of hierarchical metadata structure (possibly a class structure) so that pieces of metadata are saved as class attributes that are inherited by things lower down in the structure. It could also be a hierarchical dictionary, but a class structure is easier for me to understand. I will describe what I mean in more detail below.
  2. Save the state of the class instances as YAML files on the hard drive. If icollection is run as a crash recovery, restore the state from the YAML file. If it is a new session, start from scratch.
  3. Maybe it is actually time to write a very thin GUI so that metadata can be entered there and, importantly, the state of the metadata can be seen very quickly in a permanent view window. Whatever is in that window will be saved with the data that is being collected. This will help the experimenters take control of the metadata situation.
  4. Keep RE.gs.md empty, but dump the whole metadata stack into the run when it is executed. That is, write a run script that dumps everything from the hierarchical structure into RE.gs.md at the run-engine level when the scan is executed, then resets the global state to empty afterwards (but keeps the relevant metadata in the different class objects).
- We need to have a thorough discussion about what metadata we want to save, based on our earlier experience, so that this new structure is not too dynamic. Some metadata should be obtained automatically, like time stamps and so on.

OK, there may be more things to discuss, but this is it for now.
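
A minimal sketch of the "keep RE.gs.md empty, fill it only for the run" idea could look like the following. It is an illustration under stated assumptions, not the xpdAcquireFuncs implementation: `global_md` is a plain dict standing in for `RE.gs.md`, and `collect_metadata`, `metadata_for_run` and `fake_scan` are hypothetical names.

```python
from contextlib import contextmanager

# Stand-in for the run engine's global metadata (RE.gs.md in the thread);
# a plain dict is used so the sketch runs without the beamline software.
global_md = {}


def collect_metadata(levels):
    """Merge metadata from the top of the hierarchy down.

    `levels` is an ordered list of (level_name, dict) pairs. Keys are
    prefixed with the level name so that 'name' at one level cannot
    overwrite 'name' at another.
    """
    merged = {}
    for level_name, fields in levels:
        for key, value in fields.items():
            merged[f"{level_name}_{key}"] = value
    return merged


@contextmanager
def metadata_for_run(levels, md=global_md):
    """Fill `md` for the duration of one run, then empty it again."""
    md.update(collect_metadata(levels))
    try:
        yield md
    finally:
        md.clear()  # global state is emptied even if the scan raises


def fake_scan(md):
    """Hypothetical scan: returns a copy of the metadata it was run with."""
    return dict(md)


levels = [
    ("beamtime", {"name": "test beamtime", "safN": 1}),
    ("experiment", {"name": "test experiment", "wavelength": 0.19}),
    ("sample", {"name": "test sample", "composition": "Al2O3"}),
]

with metadata_for_run(levels) as md:
    saved = fake_scan(md)

print(saved["sample_composition"])  # Al2O3
print(global_md)                    # {}
```

The metadata lives in the per-level dictionaries (or class instances) between runs, and the global dict is only populated while a scan is executing.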

Below I will paste some code that captures what I mean by a hierarchical structure for the metadata. I think the hierarchy might look something like this:

- Beamtime
  - Experiment
    - Sample
      - Scan
        - Exposure
          - Frame

A beamtime is made of a series of experiments. An experiment may involve one sample or a series of samples. Each sample may have multiple scans (for temperature and so on), each scan is made of multiple exposures (this is the level where the run engine is called), and each exposure may consist of multiple frames. Here I use "frame" to mean a single capture event of the detector; an exposure could in general be a single frame or multiple frames. If we go to continuous operation of the detector then there will be no explicit frame in the hierarchy.

We can think about what the attributes of each object are. Attributes of experiments are proposal #, SAF #, experimenters, institutions, dates of the beamtime and so on (we can think of more). Attributes of a sample are name, composition, shape, color, and so on.

The idea is that we minimize the amount of typing the users do, to make this as easy and robust as possible. I don't want to look up the SAF number for every scan, or re-enter the sample info. We can make scripts that allow much of this info to be entered ahead of time and saved in YAML files, which can be read at the experiment, so we can select a previously instantiated "sample" when we want it. In the long run this could be connected to the SAF database etc., but let's leave that till later.

Here is some sample code that may capture what I am on about:

```python
class Beamtime:
    """Top of the hierarchy: a beamtime holds a series of experiments."""

    def __init__(self):
        self.name = 'test beamtime'
        self.safN = 1
        self.proposalN = 1


class Experiment(Beamtime):
    """An experiment; keeps a reference to the beamtime it belongs to."""

    def __init__(self, beamtimeInstance):
        # Beamtime.__init__ is not called here, so beamtime fields are
        # reached through self.mybt rather than through inheritance.
        self.name = 'test experiment'
        self.mybt = beamtimeInstance
        self.wavelength = 0.19


class Sample(Experiment):
    """A sample; keeps a reference to the experiment it belongs to."""

    def __init__(self, exptInst, comp):
        self.name = 'test sample'
        self.myexp = exptInst
        self.composition = comp


if __name__ == '__main__':
    mybt = Beamtime()
    # print(mybt.name)
    myexp = Experiment(mybt)
    # print(myexp.mybt.proposalN)

    # Several samples share the same experiment (and hence beamtime).
    mysam = Sample(myexp, 'Al2O3')
    print('mysam')
    print(mysam.name)
    print(mysam.myexp.mybt.name)
    print(mysam.myexp.wavelength)
    print(mysam.composition)

    mysam2 = Sample(myexp, 'Ni')
    print('mysam2')
    print(mysam2.name)
    print(mysam2.myexp.mybt.name)
    print(mysam2.myexp.wavelength)
    print(mysam2.composition)

    mysam3 = Sample(myexp, 'spaghetti')
    print('mysam3')
    print(mysam3.composition)
```
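
The save-and-restore step for crash recovery might be sketched as below. This is only an illustration: the thread proposes YAML files, but the sketch uses `json` from the standard library as a stand-in so that it runs without third-party packages. The `Sample` class here and the helpers `save_state` and `load_or_create` are hypothetical.

```python
import json
import os
import tempfile


class Sample:
    """Hypothetical metadata holder for one sample."""

    def __init__(self, name, composition):
        self.name = name
        self.composition = composition


def save_state(obj, path):
    """Write the instance attributes of `obj` to `path`."""
    with open(path, "w") as f:
        json.dump(vars(obj), f, indent=2)


def load_or_create(path, **defaults):
    """Restore a Sample from `path` if the file exists (crash recovery);
    otherwise start from scratch with `defaults` and save that state."""
    if os.path.exists(path):
        with open(path) as f:
            return Sample(**json.load(f))
    sample = Sample(**defaults)
    save_state(sample, path)
    return sample


state_file = os.path.join(tempfile.mkdtemp(), "sample_state.json")

# New session: nothing on disk yet, so the defaults are used and saved.
first = load_or_create(state_file, name="test sample", composition="Al2O3")
first.composition = "Ni"
save_state(first, state_file)

# Simulated restart: the defaults are ignored and the saved state wins.
recovered = load_or_create(state_file, name="test sample", composition="Al2O3")
print(recovered.composition)  # Ni
```

Swapping in YAML would only change the two serialization calls; the recover-or-start-fresh logic stays the same.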

I will make a new branch to implement the new logic we want. I can first clean up the code so that xpdAcquireFuncs can at least take in the metadata logic mentioned above: keep the global-state metadata clean and dump the necessary metadata every time we execute a scan.

I worked a little on metadata search before and finished the following functionality:

- Recursively find keys that start with given characters (fuzzy search).
- Find the list of keys that leads to a target key in a nested dictionary. For example, with my_dict = {'a': {'b': {'c': {'d': 'target'}}}}, feeding in 'd' returns ['a', 'b', 'c']. (This only works properly for dictionaries with no duplicate keys.)
- Set a field in a nested dictionary. For example, with my_dict = {'a': {'b': {'c': {'d': 'target'}}}}, feeding in 'd' = 'changed_target' returns my_dict = {'a': {'b': {'c': {'d': 'changed_target'}}}}.

I don't know if any of this functionality could be helpful this time.

I have been trying to come up with a robust method of creating nested dictionaries, but I have always failed. Maybe @pavoljuhas can give valuable comments on this aspect. Thanks!
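
For readers following along, the three nested-dictionary operations described in this comment could be sketched roughly as follows. These are not the functions from the repository; the names `find_keys_with_prefix`, `find_path` and `set_nested` are hypothetical, and the sketch resolves duplicate keys by taking the first match in depth-first order.

```python
def find_keys_with_prefix(d, prefix):
    """Recursively collect every key in nested dict `d` that starts with `prefix`."""
    found = []
    for key, value in d.items():
        if isinstance(key, str) and key.startswith(prefix):
            found.append(key)
        if isinstance(value, dict):
            found.extend(find_keys_with_prefix(value, prefix))
    return found


def find_path(d, target):
    """Return the list of parent keys leading to the first `target` key
    found, or None if the key is absent."""
    for key, value in d.items():
        if key == target:
            return []
        if isinstance(value, dict):
            sub = find_path(value, target)
            if sub is not None:
                return [key] + sub
    return None


def set_nested(d, target, new_value):
    """Set the first `target` key found in `d` to `new_value`, in place.
    Returns True if a key was changed, False if `target` was not found."""
    path = find_path(d, target)
    if path is None:
        return False
    node = d
    for key in path:
        node = node[key]
    node[target] = new_value
    return True


my_dict = {'a': {'b': {'c': {'d': 'target'}}}}
print(find_keys_with_prefix(my_dict, 'd'))  # ['d']
print(find_path(my_dict, 'd'))              # ['a', 'b', 'c']
set_nested(my_dict, 'd', 'changed_target')
print(my_dict)  # {'a': {'b': {'c': {'d': 'changed_target'}}}}
```

With duplicate keys at different depths, only the first one encountered is reported or changed, which is the limitation noted in the comment.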

Great. Tim, before you get too far we should have an in-depth discussion about a good design. I am not convinced that highly nested dictionaries are the best way to go, though they may be. Pavol often has really good insights that he could share, and I would definitely like to hear from him. Your input is also important, because the design goals are 1) robust saving of metadata and 2) flexible recovery of scans by searching on metadata, and you are the only person who has spent much time playing with the data-search part. I am very anxious to see a demo of what you have done and learned so far on that.

Pavol, will you be at BNL on Monday? Tim, is there a possibility that you could come on Monday? It could be very useful to have a bit of a hackathon on this and nail down some design issues that you can work on in the coming weeks. Sorry for the short notice.

On Sat, Dec 19, 2015 at 11:00 PM, Timothy Liu notifications@github.com wrote:

— Reply to this email directly or view it on GitHub https://github.com/chiahaoliu/xpdAcquireFuncs/issues/28#issuecomment-166060753 .

Prof. Simon Billinge Applied Physics & Applied Mathematics Columbia University 500 West 120th Street Room 200 Mudd, MC 4701 New York, NY 10027 Tel: (212)-854-2918 (o) 851-7428 (lab)

Condensed Matter Physics and Materials Science Dept. Brookhaven National Laboratory P.O. Box 5000 Upton, NY 11973-5000 (631)-344-5661

email: sb2896 at columbia dot edu home: http://nirt.pa.msu.edu/bgsite.apam.columbia.edu/

I am only making minor changes to xpdAcquireFuncs: removing the try/except blocks around metadata when executing a scan. I completely agree that a good design is far more important.

From my talks with the software guys, I have a feeling that they save the entire scan information as a dictionary in the filestore, so it might be more convenient to make our metadata the same object type; then no extra effort is needed in either databroker or our layer. On the other hand, a class lets the user tab-complete and see the attributes, which is very helpful as well. I will discuss this with Pavol in more detail.

I can be at BNL tomorrow, but I think Pavol is on vacation now. I am very happy to Google chat or Skype with him at his convenience.

@sbillinge

I finished two draft versions of the xpd metadata class. Here is a demonstration of version 2 (using inheritance). In this version, the user can directly instantiate the top-level class and then view and modify everything inherited from its parent classes. The main advantage is that the user can easily view and manage data fields at the top level. This could also become overwhelming if there are many attributes in the entire metadata class, but we seem to be some way from that situation yet. A detailed demonstration is given in the following picture.
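
An inheritance-based layout of this kind might look roughly like the sketch below. It is not the draft from the thread; the class and field names are borrowed from the earlier sample code, and the constructor signatures are assumptions made for illustration.

```python
class Beamtime:
    """Hypothetical top of the metadata hierarchy."""

    def __init__(self, safN, proposalN):
        self.safN = safN
        self.proposalN = proposalN


class Experiment(Beamtime):
    """Adds experiment fields and passes the rest up to Beamtime."""

    def __init__(self, wavelength, **beamtime_fields):
        super().__init__(**beamtime_fields)
        self.wavelength = wavelength


class Sample(Experiment):
    """Adds sample fields and passes the rest up to Experiment."""

    def __init__(self, composition, **experiment_fields):
        super().__init__(**experiment_fields)
        self.composition = composition


sam = Sample(composition='Al2O3', wavelength=0.19, safN=1, proposalN=1)

# Every field from every level is directly visible on the one instance.
print(sam.safN)           # 1
print(sorted(vars(sam)))  # ['composition', 'proposalN', 'safN', 'wavelength']
```

All fields end up in one flat attribute list, which is convenient for tab completion but grows with every level added.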

@sbillinge

Here is a demonstration of the other version, which uses composition. In this version, metadata is strictly passed down between layers, and the user needs to follow the hierarchical structure explicitly to get attributes. The advantage of this version is a clean attribute list: only attributes that belong directly to the current class appear at the first level. But the strict hierarchical structure could be a headache for the user.
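
A composition-based layout could be sketched as follows. Again this is an illustration rather than the draft from the thread; the attribute names `beamtime` and `experiment` and the constructor signatures are assumptions.

```python
class Beamtime:
    """Hypothetical top of the metadata hierarchy."""

    def __init__(self, safN, proposalN):
        self.safN = safN
        self.proposalN = proposalN


class Experiment:
    """Holds a reference to its beamtime instead of inheriting from it."""

    def __init__(self, beamtime, wavelength):
        self.beamtime = beamtime
        self.wavelength = wavelength


class Sample:
    """Holds a reference to its experiment instead of inheriting from it."""

    def __init__(self, experiment, composition):
        self.experiment = experiment
        self.composition = composition


bt = Beamtime(safN=1, proposalN=1)
expt = Experiment(bt, wavelength=0.19)
sam = Sample(expt, composition='Al2O3')

# Only the sample's own fields appear at the first level ...
print(sorted(vars(sam)))             # ['composition', 'experiment']
# ... and higher-level fields are reached by walking the hierarchy.
print(sam.experiment.beamtime.safN)  # 1
print(hasattr(sam, 'safN'))          # False
```

Several samples can share one `Experiment` instance, so a change made to the experiment is seen by all of them.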
