Closed dneise closed 7 years ago
But still, this ipynb might be interesting https://github.com/fact-project/read_mars/blob/4d79a3667ba51e34a7ba2f978ee35191cf9e555d/read_mars/tests/resources/Untitled.ipynb
I am not quite happy with the keys to the contents member.
Have a look at this list of keys. These keys are tuples of strings of this form:
('class_name', 'object_name', 'maybe_pad_name', 'canvas_name')
In my opinion the most important designators by which I think about the objects are:
object_name (which unfortunately can sometimes be empty)
class_name
So I'd like to say something like: I want the TH1F with the name "Gain". In case there are multiple things named Gain, I'd like to specify the kind of thing, in this case "TH1F". However, if there is only one thing called "Gain", why should I bother to remember its type?
Anyway .. let's have a look at the complete list:
[('TH1F', 'Sum1', 'SumHist'),
('TF1', 'spektrum', 'SumHist'),
('TH1F', 'SumC1', 'SumHist'),
('TH1D', 'Pix0', 'Pix0'),
('TF1', 'spektrum', 'Pix0'),
('TH1D', 'Pix5', 'Pix5'),
('TF1', 'spektrum', 'Pix5'),
('MHCamera', 'Rate', 'Cams1_1', 'Cams1'),
('MHCamera', 'Gain', 'Cams1_2', 'Cams1'),
('MHCamera', 'Baseline', 'Cams1_3', 'Cams1'),
('MHCamera', 'RelSigma', 'Cams1_4', 'Cams1'),
('MHCamera', 'Crosstalk', 'Cams1_5', 'Cams1'),
('MHCamera', 'Noise', 'Cams1_6', 'Cams1'),
('MHCamera', 'FitProb', 'Cams2_1', 'Cams2'),
('MHCamera', 'Chi2', 'Cams2_2', 'Cams2'),
('MHCamera', 'CoeffR', 'Cams2_4', 'Cams2'),
('MHCamera', 'Pxtalk', 'Cams2_5', 'Cams2'),
('TH1F', 'Rate1', 'Hists1_1', 'Hists1'),
('TH1F', 'Rate2', 'Hists1_1', 'Hists1'),
('TH1F', 'Gain1', 'Hists1_2', 'Hists1'),
('TH1F', 'Gain2', 'Hists1_2', 'Hists1'),
('TH1F', 'Baseline1', 'Hists1_3', 'Hists1'),
('TH1F', 'Baseline2', 'Hists1_3', 'Hists1'),
('TH1F', 'RelSigma1', 'Hists1_4', 'Hists1'),
('TH1F', 'RelSigma2', 'Hists1_4', 'Hists1'),
('TH1F', 'Crosstalk1', 'Hists1_5', 'Hists1'),
('TH1F', 'Crosstalk2', 'Hists1_5', 'Hists1'),
('TH1F', 'Noise1', 'Hists1_6', 'Hists1'),
('TH1F', 'Noise2', 'Hists1_6', 'Hists1'),
('TH1F', 'FitProb1', 'Hists2_1', 'Hists2'),
('TH1F', 'FitProb2', 'Hists2_1', 'Hists2'),
('TH1F', 'ChiSq1', 'Hists2_2', 'Hists2'),
('TH1F', 'ChiSq2', 'Hists2_2', 'Hists2'),
('TH1F', 'CoeffR1', 'Hists2_4', 'Hists2'),
('TH1F', 'CoeffR2', 'Hists2_4', 'Hists2'),
('TH1F', 'Pxtalk1', 'Hists2_5', 'Hists2'),
('TH1F', 'Pxtalk2', 'Hists2_5', 'Hists2'),
('MHCamera', 'NormGain', 'NormGain_1', 'NormGain'),
('TH1F', 'NormGain1', 'NormGain_2', 'NormGain'),
('TH1F', 'NormGain2', 'NormGain_2', 'NormGain'),
('TH1F', 'SumC1', 'CleanHist1'),
('TF1', 'spektrum', 'CleanHist1'),
('TH1F', 'SumC2', 'CleanHist2'),
('TF1', 'spektrum', 'CleanHist2'),
('TH1F', 'SumScale1', 'GainHist1'),
('TF1', 'spektrum', 'GainHist1'),
('TH1F', 'SumScale2', 'GainHist2'),
('TF1', 'spektrum', 'GainHist2'),
('TH1F', 'Time', 'ArrTime'),
('TH1F', 'Puls', 'Pulse')]
And for another kind of file:
[('TH2F', 'Baseline', 'MHBaseline'),
('MHCamEvent', 'MHCamEvent', 'Baseline'),
('MHCamera', '', 'Baseline_1', 'Baseline'),
('TH1D', 'proj', 'Baseline_2', 'Baseline'),
('MHCamera', '', 'Baseline_3', 'Baseline'),
('MHCamera', 'err', 'Baseline_4', 'Baseline'),
('TProfile', 'rad', 'Baseline_5', 'Baseline'),
('TProfile', 'az', 'Baseline_6', 'Baseline'),
('TH2F', 'Signal', 'MHSingles_1', 'MHSingles'),
('TH2F', 'Time', 'MHSingles_2', 'MHSingles'),
('TProfile2D', 'Pulse', 'MHSingles_3', 'MHSingles'),
('TH1D', 'Time_py', 'Time'),
('TH1D', 'Pulse_py', 'Pulse')]
Now .. don't get me wrong. I find it great that I can look at this list and get a complete overview of the contents of this file, without the need for an X-connection to where the file is.
But it's cumbersome to type these tuples. Maybe some kind of select statement, like:
file.select(type='TH1', name='signal')
which returns a list of the contents that comply with my selection, and if I'm lucky, all I need to say is: f.select(name='Gain')
but ...
Is that a useful interface?
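To make the proposal a bit more concrete, here is a minimal sketch of such a select over the tuple keys listed above. This is an assumption, not existing read_mars code: the function name, the prefix matching on the class name (so that 'TH1' finds both TH1F and TH1D) and the exact matching on the object name are all just one possible choice.

```python
# Hypothetical sketch: `select` is NOT an existing read_mars function.
# Keys are tuples of the form shown above:
# (class_name, object_name, [pad_name,] canvas_name)

def select(keys, type=None, name=None):
    """Return the keys matching the given criteria.

    `type` is matched as a prefix of the class name (assumption),
    `name` must equal the object name; None matches anything.
    """
    selected = []
    for key in keys:
        class_name, object_name = key[0], key[1]
        if type is not None and not class_name.startswith(type):
            continue
        if name is not None and object_name != name:
            continue
        selected.append(key)
    return selected


keys = [
    ('MHCamera', 'Gain', 'Cams1_2', 'Cams1'),
    ('TH1F', 'Gain1', 'Hists1_2', 'Hists1'),
    ('TH1F', 'Gain2', 'Hists1_2', 'Hists1'),
    ('TH1F', 'Time', 'ArrTime'),
]

only_gain = select(keys, name='Gain')   # a single match, no type needed
th1_keys = select(keys, type='TH1')     # all three TH1F entries
```

With exact name matching, select(name='Gain') returns only the MHCamera entry; Gain1 and Gain2 would need either their full names or a looser matching rule.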
My 50 cents: So far we were trying to extract certain plots from a MarsStatusDisplay because we knew they were there and we knew which information we could gain from them. With this, let's say, new approach, we are able to extract whatever there may be in a root file with a statusDisplay, which I really like. However, by doing so we might lose a bit of context, since we do not visually see which plots belong together and what they could mean.
I think the information from which tab a certain plot came delivers some information. Usually you could assume that plots in the same tab share some sort of context. Thus, I think keeping the tab information might sometimes (not always) help.
The TPad information, in my opinion, seems pretty useless, because the only thing I learn e.g. from ('MHCamera', 'Baseline', 'Cams1_3', 'Cams1') is that Baseline is the third plot in the Cams1 canvas.
But when looking at e.g.
('TH1F', 'Rate1', 'Hists1_1', 'Hists1'),
('TH1F', 'Rate2', 'Hists1_1', 'Hists1'),
I learn there are two TH1F plots with almost the same name in the same pad of the same tab. Damn it, the pad names are not useless! However, I had to open Mars to figure out that Rate2 is the model fit to Rate1, and I'm not sure how to tackle this.
Bottom line:
Keeping the canvas name makes sense to me. Pad name can possibly be reduced to pad number, because the canvas name part of the pad name is redundant.
The select statement approach (with possible keys type, name, canvas, tab_nr) seems to me a good way. However, it does not solve the two-plots-with-almost-the-same-name issue. But in this case you would see that they are from the same tab and could simply plot them to get insight.
Now it looks like this:
[StatusDisplayKey(name='Baseline', class_name='TH2F', canvas_name='MHBaseline', pad_number=None),
StatusDisplayKey(name='MHCamEvent', class_name='MHCamEvent', canvas_name='Baseline', pad_number=None),
StatusDisplayKey(name='', class_name='MHCamera', canvas_name='Baseline', pad_number=1),
StatusDisplayKey(name='proj', class_name='TH1D', canvas_name='Baseline', pad_number=2),
StatusDisplayKey(name='', class_name='MHCamera', canvas_name='Baseline', pad_number=3),
StatusDisplayKey(name='err', class_name='MHCamera', canvas_name='Baseline', pad_number=4),
StatusDisplayKey(name='rad', class_name='TProfile', canvas_name='Baseline', pad_number=5),
StatusDisplayKey(name='az', class_name='TProfile', canvas_name='Baseline', pad_number=6),
StatusDisplayKey(name='Signal', class_name='TH2F', canvas_name='MHSingles', pad_number=1),
StatusDisplayKey(name='Time', class_name='TH2F', canvas_name='MHSingles', pad_number=2),
StatusDisplayKey(name='Pulse', class_name='TProfile2D', canvas_name='MHSingles', pad_number=3),
StatusDisplayKey(name='Time_py', class_name='TH1D', canvas_name='Time', pad_number=None),
StatusDisplayKey(name='Pulse_py', class_name='TH1D', canvas_name='Pulse', pad_number=None),
]
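For reference, this is roughly how the old tuple keys map onto the new ones. It is only a sketch: the field names are copied from the printed keys above, but the to_key helper is hypothetical and assumes that every pad name has the form '<canvas_name>_<n>'.

```python
from collections import namedtuple

# Field names taken from the printed keys above.
StatusDisplayKey = namedtuple(
    'StatusDisplayKey', ['name', 'class_name', 'canvas_name', 'pad_number'])


def to_key(old_key):
    """Convert an old tuple key into a StatusDisplayKey (hypothetical helper)."""
    class_name, name = old_key[0], old_key[1]
    canvas_name = old_key[-1]
    pad_number = None
    if len(old_key) == 4:
        # Assumption: pad names look like '<canvas_name>_<n>'; keep only n.
        pad_number = int(old_key[2].rsplit('_', 1)[1])
    return StatusDisplayKey(name, class_name, canvas_name, pad_number)


with_pad = to_key(('MHCamera', 'err', 'Baseline_4', 'Baseline'))
without_pad = to_key(('TH1D', 'Time_py', 'Time'))
```

Objects drawn directly on a canvas, without a pad, keep pad_number=None, as in the list above.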
Okay, so here is another interface I was playing around with.
I would call it the "DataFrame" interface, since I am internally (mis-)using a pandas.DataFrame to generate this.
Have a look at the pad_number ... it's converted to float, and missing pad_numbers turn up as NaN ...
not sure we want that.
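The float conversion is standard pandas behaviour: a column mixing integers and None gets the default float64 dtype, with NaN for the missing values. A small sketch with made-up rows (not the real file contents), which also shows pandas' nullable 'Int64' dtype as one possible way around it:

```python
import pandas as pd

# Two example rows in the shape of the keys above; values are illustrative.
df = pd.DataFrame({
    'name': ['Baseline', 'proj'],
    'class_name': ['TH2F', 'TH1D'],
    'canvas_name': ['MHBaseline', 'Baseline'],
    'pad_number': [None, 2],
})

# Integers mixed with None are stored as float64, None becomes NaN.
float_dtype = str(df['pad_number'].dtype)

# The nullable integer dtype keeps integers and marks gaps as <NA>.
df['pad_number'] = df['pad_number'].astype('Int64')
int_dtype = str(df['pad_number'].dtype)
```

Whether the nullable dtype is worth it depends on the pandas version we want to support, so take this as an option, not a recommendation.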
I think I am going too fast for myself here... this is somehow faking a Dataframe but not really ... and one does not know what to expect from this ... so I guess it's shit.
Let's say this is maybe an outlook, but at the moment I recommend using the get() method, where one needs to explicitly provide all 4 parameters to get a certain object out of the file. In order to learn what the parameters are, one can print the keys() and simply copy+paste the parameters from there into the get() call ...
that is still fairly good.
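The copy+paste workflow would look something like this. Note that ToyStatusDisplay below is a toy stand-in written only for this comment, not the real class, and the argument names of get() are an assumption taken from the key fields.

```python
from collections import namedtuple

StatusDisplayKey = namedtuple(
    'StatusDisplayKey', ['name', 'class_name', 'canvas_name', 'pad_number'])


class ToyStatusDisplay:
    """Toy stand-in for illustration only; NOT the real read_mars class."""

    def __init__(self, contents):
        self._contents = dict(contents)

    def keys(self):
        return list(self._contents)

    def get(self, name, class_name, canvas_name, pad_number):
        # All four parameters are required; together they form the key.
        key = StatusDisplayKey(name, class_name, canvas_name, pad_number)
        return self._contents[key]


display = ToyStatusDisplay({
    StatusDisplayKey('proj', 'TH1D', 'Baseline', 2): 'placeholder for the TH1D',
})

# Step 1: print the keys to see what is in the file.
print(display.keys())

# Step 2: copy+paste the parameters of the wanted key into get().
obj = display.get(name='proj', class_name='TH1D',
                  canvas_name='Baseline', pad_number=2)
```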
add StatusDisplay class:
This can open a root file that contains a single MStatusDisplay and gives access to the contents of that StatusDisplay.