DrSAR / SARlabpy

git clone git@pfeifer.phas.ubc.ca:SARlabpy (do not push to github, please)
http://code.SARlab.ca

Reinvent masterlist treatment #330

Open DrSAR opened 9 years ago

DrSAR commented 9 years ago

Our current masterlist for describing experiments is woefully inadequate. The new attempt merges the masterlist and the config file formerly used to generate it into a single, human-readable file. That file again uses the config file format. With configobj we can use nested sections, which lets us define attributes at the Experiment, Patient, Study, and Scan level.

One example config file is this:

[General]
    narrative = HT29 tumours implanted with fiducials transported to UBC
    objective = try and DCE-image HPG in our leakiest model available to finally determine whether the poor success of visualizing HPG in other models is model-specific or HPG-batch-specific.
    method = Same procedure used as for HPGP3; 48cm of tubing = 125ul of HPG + flush at either end = 190ul of injected volume

random='''
    Here here 

    Ignore whatever happens in this code block

    '''

[HPGP4Ts01]
    comment = FAILED INJECTION; mouse connected at 10:15am; ; no change in Resp rate or temperature, confirmed no successful injection; failure at junction of tube to catheter
    injection_time = 10:50

[HPGP4Ts03]

    mass = 27g;
    injection_time = 14:15
    general_info = 1
    histo = frozen along fiducial.

    [[study vw1]]
        mass = 27g
        agent = HPG
        comment = confirmed to be good.
        treatment = 135ul Cetux injected ip at 15:15; DiOC7 injected at 16:10;
        number_of_ears = 3
        injection_time = 17:35

        [[[scanlabels]]]
            DCE+24h = 3 
            T1 = 4
            Tripilot = 1
            02_RARE_T2w_coronal = 2

        [[[DCE+24h]]]
            injection_time = 12:30
        [[[02_RARE_T2w_coronal]]]
            injection_time = 12:35
        [[[T1]]]
            mouse_temp = 36.5

This file is stored in ~/sdata/masterlists.

Now, Scans know about the masterlist related to them:

In [2]: scn = sarpy.Scan('HPGP4Ts03.vw1/3')

In [3]: scn.masterlist_attr('injection_time')
Out[3]: '12:30'

In [4]: scn.masterlist_attr('injection_time', level='Study')
Out[4]: '17:35'

In [5]: scn.masterlist_attr('injection_time', level='Patient')
Out[5]: '14:15'

In [6]: print(scn.masterlist_attr('injection_time', level='Experiment'))
None
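
The lookups above behave like a walk down the nested sections of the config file. A minimal sketch of that idea with plain dictionaries follows. The function `lookup_attr` and its signature are made up for illustration and are not the sarpy implementation; the data are copied from the example config.

```python
# Hypothetical sketch, not the sarpy code: the masterlist as nested dicts,
# mirroring the sections of the example config above.
masterlist = {
    'General': {'narrative': 'HT29 tumours implanted with fiducials'},
    'HPGP4Ts03': {
        'injection_time': '14:15',
        'study vw1': {
            'injection_time': '17:35',
            'scanlabels': {'DCE+24h': '3', 'T1': '4'},
            'DCE+24h': {'injection_time': '12:30'},
            'T1': {'mouse_temp': '36.5'},
        },
    },
}


def lookup_attr(ml, patient, study, scan_nr, attr, level='Scan'):
    """Return attr at the requested level, or None if it is not set there."""
    study_sec = ml[patient]['study ' + study]
    if level == 'Experiment':
        section = ml.get('General', {})
    elif level == 'Patient':
        section = ml[patient]
    elif level == 'Study':
        section = study_sec
    else:
        # map the scan number back to its label, then open that subsection
        labels = [lbl for lbl, nr in study_sec['scanlabels'].items()
                  if nr == str(scan_nr)]
        section = study_sec.get(labels[0], {}) if labels else {}
    value = section.get(attr)
    # subsections are dicts; only scalar entries count as attributes
    return None if isinstance(value, dict) else value


print(lookup_attr(masterlist, 'HPGP4Ts03', 'vw1', 3, 'injection_time'))
print(lookup_attr(masterlist, 'HPGP4Ts03', 'vw1', 3, 'injection_time', level='Study'))
```

Each level only looks in its own section, which is why the Experiment-level query returns None when [General] does not define the attribute.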

A number of related issues remain unaddressed by the current working example in 4f9dd80 (and before it eb4e692):

I note that the amount of code for this masterlist is a lot smaller than for the previous masterlist implementation. There are, however, features that the current system does not address. It would be helpful if @firasm could have a look at what the impact might be. The code to look at is in branch

DrSAR commented 9 years ago

One clear further feature required is the creation of template files for writing both the general section as well as the Study-specific information. The latter might be best done with a macro on the scanner.
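
A rough sketch of what generating such a template could look like. The function name `study_template` is hypothetical, and the attribute names are simply copied from the HPGP4 example; which ones a real template should contain is still to be decided.

```python
def study_template(patient, study, scan_labels):
    """Build a blank Study section (as text) to be filled in by hand.

    Hypothetical helper: the attribute names are copied from the HPGP4
    example, and scan numbers are assumed to follow the label order.
    """
    lines = ['[%s]' % patient,
             '    [[study %s]]' % study,
             '        mass = ',
             '        agent = ',
             '        comment = ',
             '        injection_time = ',
             '',
             '        [[[scanlabels]]]']
    for nr, label in enumerate(scan_labels, start=1):
        lines.append('            %s = %d' % (label, nr))
    return '\n'.join(lines) + '\n'


# placeholder names, to be replaced by the real patient and study
print(study_template('PATIENT', 'xx1', ['Tripilot', 'T1']))
```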

firasm commented 9 years ago

latest version here: 056b33f5c51f6c91aa0671ab6ce038197ec7e9e6

to deal with assertions and fringe cases.

DrSAR commented 9 years ago

Point 1. above is taken care of through the assertion that an Experiment (Study) has one and only one masterlist. Interrogation can now happen at either the Study or the Experiment level, and the user may trust that there is only one masterlist that covers all the scans (Point 2. above). The assertion of only one masterlist might turn out to be too strict, in which case we need to develop more logic to merge masterlists from Scans to Studies and Experiments, but we can do that when the need arises. Points 3 and 4 remain. In particular, we envisage routines that create pandas dataframes from the masterlist for easy interrogation.
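
One possible shape for such a routine, sketched under assumptions: the masterlist is taken to be a nested dict with 'study ...' sections as in the HPGP4 example, and `masterlist_rows` is a made-up name. It flattens the masterlist into one flat dict per scan; a list of dicts like this is something `pandas.DataFrame(rows)` accepts directly.

```python
def masterlist_rows(ml):
    """Flatten a nested masterlist dict into one flat dict per scan."""
    rows = []
    for patient, pat_sec in ml.items():
        if patient == 'General':
            continue
        for key, study_sec in pat_sec.items():
            if not (isinstance(study_sec, dict) and key.startswith('study ')):
                continue  # scalar Patient attribute, not a Study section
            for label, scan_nr in study_sec.get('scanlabels', {}).items():
                row = {'patient': patient, 'study': key[len('study '):],
                       'scanlabel': label, 'scan_nr': scan_nr}
                # Scan-level attributes, if the scan has its own subsection
                row.update(study_sec.get(label, {}))
                rows.append(row)
    return rows


# data copied from the HPGP4 example config
example = {
    'General': {'objective': 'DCE-image HPG'},
    'HPGP4Ts03': {
        'mass': '27g',
        'study vw1': {
            'agent': 'HPG',
            'scanlabels': {'DCE+24h': '3', 'T1': '4'},
            'DCE+24h': {'injection_time': '12:30'},
            'T1': {'mouse_temp': '36.5'},
        },
    },
}
rows = masterlist_rows(example)
for row in rows:
    print(row)
```

Patient- and Study-level attributes are left out of the rows here to keep the sketch short; they could be merged in the same way as the Scan-level ones.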

DrSAR commented 9 years ago

This might be a good time to introduce pandas.Panel. Panels essentially add a third dimension: a Series is a 1D collection of 'scalars' (in a loose sense of the word dimension), a DataFrame is a 2D collection of Series, and a Panel is a collection of DataFrames. I suspect one might be able to fit the Studies of a Patient into this dimensional thinking. What might help is the concept of a study label (akin to a scan label) under which they could be stored. At a minimum this could be Study 1, Study 2, etc. in chronological order.
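
The chronological study label could be derived along these lines. This is only a sketch: `label_studies` and its input format (study name mapped to a date) are assumptions, and the dates are illustrative. Note that pandas.Panel was later deprecated and removed from pandas; (patient, study label) tuples used as keys map naturally onto a MultiIndex instead.

```python
import datetime


def label_studies(studies):
    """Map each study name to 'Study 1', 'Study 2', ... in date order.

    studies: dict of study name -> datetime.date (made-up input format).
    """
    ordered = sorted(studies, key=studies.get)
    return {name: 'Study %d' % i for i, name in enumerate(ordered, start=1)}


# illustrative dates only, not taken from any real experiment
studies = {'vw2': datetime.date(2015, 5, 14), 'vw1': datetime.date(2015, 5, 13)}
labels = label_studies(studies)
print(labels)
```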

DrSAR commented 9 years ago

A little script translated all the old masterlists (json files) into new config masterlists. They are now stored alongside the handwritten HPGP4.config. The only issue was that three files (NecP1, HPGS4 and HerS11) had duplicates: straight copies of the files with presumably different bbox values. Hence I deleted both copies of each. If need be, we will have to recreate the ones of interest from one of the two options.

Other than that, I expect that every Scan that had been mentioned in one of our previous masterlists will now have those attributes in the Scan class. There might be issues with Experiments which we will have to test.

firasm commented 9 years ago

thanks!! This will prove very useful I hope

Going through the masterlists now and there appear to be only formatting inconsistencies. Can you share the script you used to make these so I can adjust it?

For example, see here the spacing differences of the template (HPGP4) vs. the new ones.

[screenshot, 2015-05-19 12:23 pm: spacing of the HPGP4 template vs. the new masterlists]

will have a look at NecP1, HPGS4, and HerS11

DrSAR commented 9 years ago

Here is the code to turn the old json masterlist into a dictionary. I use an OrderedDict so that we can control when 'General' and 'scanlabels' sections appear in the config file.

import os
import json
import collections

def json_masterlist_to_dict_masterlist(jsonfname):
    '''Turn an old json masterlist into a nested dict for the new config format'''
    with open(jsonfname, 'r') as f:
        old_masterlist = json.load(f)
    # OrderedDict so that 'General' comes first in the config file
    newdict = collections.OrderedDict({'General': {'orig_fname': jsonfname}})
    # .items() works in both python 2 and 3 (iteritems is python 2 only)
    for patname, v in old_masterlist.items():
        # patname is a patient name, I think
        for scnlbl, scn_desc in v.items():
            if newdict.get(patname) is None:
                newdict[patname] = {}
            if scn_desc[0]:
                # scn_desc[0] presumably looks like 'HPGP4Ts03.vw1/3',
                # i.e. <patient>.<session>/<scan nr>; scn_desc[1] is the bbox
                session_id = scn_desc[0].split('/')[0].split('.')[1]
                scn_nr = scn_desc[0].split('/')[1]
                bbox = scn_desc[1]
                # create the nested sections on first use
                session = newdict[patname].setdefault('session ' + session_id, {})
                session.setdefault('scanlabels', {})[scnlbl] = scn_nr
                session.setdefault(scnlbl, {})['bbox'] = bbox
    return newdict
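
The string handling inside the loop can be checked in isolation. A small sketch, assuming scan identifiers always have the form patient.session/scan (as in 'HPGP4Ts03.vw1/3'); the helper `split_scan_id` is hypothetical and stricter than the code above, since it fails on identifiers with extra dots or slashes.

```python
def split_scan_id(scan_id):
    """Split e.g. 'HPGP4Ts03.vw1/3' into (patient, session_id, scan_nr).

    Assumes exactly one '/' and one '.'; anything else raises ValueError.
    """
    study_name, scn_nr = scan_id.split('/')
    patient, session_id = study_name.split('.')
    return patient, session_id, scn_nr


print(split_scan_id('HPGP4Ts03.vw1/3'))  # ('HPGP4Ts03', 'vw1', '3')
```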

Using this function is a matter of

import configobj

rootdir = '/home/stefan/sdata/masterlists/old_masterlists/'
newroot = '/home/stefan/sdata/masterlists/'
for f in os.listdir(rootdir):
    newdict = json_masterlist_to_dict_masterlist(os.path.join(rootdir, f))
    newconf = configobj.ConfigObj(indent_type=' ')
    for k, v in newdict.items():
        newconf[k] = v
    newf = f.split('.')[0] + '.config'
    # set the filename and let ConfigObj open and close the file itself
    newconf.filename = os.path.join(newroot, newf)
    newconf.write()

Note how the creation of the ConfigObj instance determines the indentation (indent_type).

DrSAR commented 9 years ago

OK: temporary fix.

There are some outstanding issues: The above suggestions appear not to work for remote nfs mounting. dunno why, investigations continue...