WIP: Read callisto MSignalCam

dneise commented 7 years ago

Work in progress

this function reads charge and arrival time from a _C.root file coming out of callisto. It is extremely slow ~900 events in 140 sec.

I don't know how to speed the stuff up. But maybe you wanna have a look.

dneise commented 7 years ago

Related to issue #8

mblnk commented 7 years ago

Thanks for implementing the read function so fast.

The function worked on several callisto files, but it really is painfully slow so I just applied the V1-trick again.

Now it takes only 123 ms instead of 106 s to read 499 Events.

dneise commented 7 years ago

That is awesome! I tried to play around with the Draw command but could not get it to work.

How did you manage to guess(?) these Draw strings?

dneise commented 7 years ago

I would vote to merge, but these things still might be added.

some tests and an example callisto file
bump the version up
think about the return value? This is a dict of arrays, another read_mars.read function returns a dataframe. I think both are fine, since the user can convert the return value further into whatever she finds useful.

But if we can streamline an interface, we might wanna do it now.

the dtype is hardcoded as float64, can we improve that? Can we read the dtype (as a string or so) from the tree?

dneise commented 7 years ago

is there usually more in a callisto file, one needs? More than charge and arrival time?

mblnk commented 7 years ago

Concerning the "Draw-strings": I just had a look into a callisto file with the root TBrowser() and found those leafs. As it is possible to show the histogram of both leafs in the TBrowser, it follows, that the Draw function can handle them. As the number of entries is exactly 1440 times the number of events, I guessed, that V1 contains an ordered vector with all pixels of all events. Reshaping the returned array into slices of length 1440 and comparing the values to the ones of the implementation without V1 confirmed the guess. I also had a short look into "MSignalPix.cc" of MARS - but that was not too helpful.

screenshot_tbrowser

mblnk commented 7 years ago

About other things in a callisto file:

There is another single pixel information: a " time variable fTimeSlope describing the width of the rise time". To get that, one would need to add it to the dictionary of the fields with "MSignalCam.fPixels.fTimeSlope".

As one can also see in the screenshot attached to the previous post, there are also some "typical" branches like MPointingPos and MTime. One might need those, but they can already be read with the read_mars() function.

It might be useful to add a check to read_mars, that automatically skips leafs containing MSignalCam and prints a notification. The notification might say something like: "This root file contains branches with MSignalCam. To read the pixel/image information, try read_callisto()."

mblnk commented 7 years ago

Concerning the data type:

I don't know where it happens, that the data type always ends up as float64, but I think that GetV1 is always double.

Aside from that, the datatype of a leaf can apparently be checked:

file = ROOT.TFile(filename)
tree = file.Get('Events')
leaf = tree.GetLeaf("MSignalCam.fPixels.fArrivalTime")
leaf.GetTypeName()

Do we need/want to do something with that information?

dneise commented 7 years ago

We could convert the float64 back to the type of the leaf.

maxnoe commented 7 years ago

This V1 stuff casts all data to float64, as the Histogram's Fill functions takes a double.

mblnk commented 7 years ago

I just added the casting to the original datatypes. Can the changed datatypes of the pandas dataframe returned by read_mars lead to problems? If no problems are expected, I would vote to merge.

dneise commented 7 years ago

No need to duplicate the default values in the function doc string and in the function definition. In fact, this duplication can quickly become stale, when somebody changes the default but forgets to update the doc-string.

2017-11-13-181116_1920x1080_scrot

dneise commented 7 years ago

ordering pixels along CHID, is the only ordering which makes sense in our python based eco system. We expect this ordering to be the case and want to only talk about the ordering, in case it is not sane.

So we remove the remark about the CHID ordering.

dneise commented 7 years ago

What happens, if somebody makes a mistake ... say they want to specify more fields, but make a typo:

fields={
  'charge': 'MSignalCam.fPixels.fPhot',
  'arrival_time': 'MFoo.fBar.fBaz'
}

does the function crash in a nice way, so they can try again? or do we get a segFault?

dneise commented 7 years ago

At some point N was renamed to n_events ... but one instance was forgotten...

There are two unit tests, but they did not find this bug. So there are clearly tests missing

dneise commented 7 years ago

Oh, I just see the tests did not find the error because they seg fault ... at least on my machine ... that is not so good...

dneise commented 7 years ago

The "V1" trick was already implemented in the function names leaves_to_numpy.

basically read_callisto is just a simple call to leaves_to_numpy, when redifining the fields to be a list and not a dict of {name: getter, name2: getter2}, but just list [getter, getter2]

see commit right above

dneise commented 7 years ago

when looking at read_mars and read_callisto, they are actually just providing some leaf_names.

the former by iterating over all leaf_names and filtering the invalids away, the latter simply has a default list of leaf_names that are difficult to remember ...

there must be a better way!

mblnk commented 7 years ago

Both functions do now throw errors.

read_callisto throws

`---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-a7fba0ecd8c5> in <module>()
----> 1 read_callisto("/media/michi/523E69793E69574F/gamma/callisto/gamma_gustav_12/10917/00010917.917_D_MonteCarlo010_Events.fits.gz_callisto.root")

~/anaconda3/lib/python3.6/site-packages/read_mars-0.0.5-py3.6.egg/read_mars/__init__.py in read_callisto(filename, tree, fields)
    126             'MSignalCam.fPixels.fTimeSlope'
    127     """
--> 128     return leaves_to_numpy(ROOT.TFile(filename).Get(tree), fields, N=1440)

~/anaconda3/lib/python3.6/site-packages/read_mars-0.0.5-py3.6.egg/read_mars/__init__.py in leaves_to_numpy(tree, leaf_names, N)
     59     # See eg. https://root.cern.ch/root/roottalk/roottalk03/0638.html
     60 
---> 61     n_events = tree.GetEntries()
     62     tree.SetEstimate(n_events + 1)  # necessary for files with more than 1 M events
     63     out = {}

AttributeError: 'PyROOT_NoneType' object has no attribute 'GetEntries'

and read_mars throws:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-5-587957623746> in <module>()
----> 1 read_mars("/media/michi/523E69793E69574F/gamma/callisto/gamma_gustav_12/10917/00010917.917_D_MonteCarlo010_Events.fits.gz_callisto.root")

~/anaconda3/lib/python3.6/site-packages/read_mars-0.0.5-py3.6.egg/read_mars/__init__.py in read_mars(filename, tree, leaf_names)
    106         ]
    107 
--> 108     df = pd.DataFrame(leaves_to_numpy(tree, leaf_names))
    109     file.Close()
    110 

~/anaconda3/lib/python3.6/site-packages/read_mars-0.0.5-py3.6.egg/read_mars/__init__.py in leaves_to_numpy(tree, leaf_names, N)
     69         tree.Draw(leaf_name, "", "goff")
     70         v1 = tree.GetV1()
---> 71         v1.SetSize(n_events * tree_v1_dtype.itemize)  # a double has 8 Bytes
     72         out[leaf_name] = np.frombuffer(v1.tobytes(), dtype=tree_v1_dtype)
     73 

AttributeError: 'numpy.dtype' object has no attribute 'itemize'
`

do they work for you?

mblnk commented 7 years ago

for read_callisto: the multiplication times 1440 in the SetEstimate and SetSize functions is important:

tree.SetEstimate(n_events 1440) and v1.SetSize(n_events 8 * 1440)

otherwise, some values are omitted or the function just fails

mblnk commented 7 years ago

Now it works again on my machine. There was a typo, itemize should be itemsize.

When supplying a typo in the fields list of read_callisto, one gets no segfault but a rather cryptic error: ValueError: PyMemoryView_FromBuffer(): info->buf must not be NULL

dneise commented 7 years ago

Nope it was also not working on my machine ... I developed without testing .. and you saw the result. then I had to go ... and you took over ...

I was still developing .. but you didn't know that ...

In the end you spent a lot of time reparing something, which I do not like at all anymore. well ... maybe you like it ... I don't know.

Anyway, please have a look at #11

There is a single test, showing the interface ...

fact-project / read_mars

WIP: Read callisto MSignalCam #9