Allow adding of parent ID, event ID, and flags from TOPAS PHSP

cvelten commented 1 year ago

A non-exhaustive list of parameters in the PHSP contains the following:

              'Position X [cm]': (dtype('float32'), 0),
              'Position Y [cm]': (dtype('float32'), 4),
              'Position Z [cm]': (dtype('float32'), 8),
              'Direction Cosine X': (dtype('float32'), 12),
              'Direction Cosine Y': (dtype('float32'), 16),
              'Energy [MeV]': (dtype('float32'), 20),
              'Weight': (dtype('float32'), 24),
              'Particle Type (in PDG Format)': (dtype('int32'), 28),
              'Flag to tell if Third Direction Cosine is Negative (1 means true)': (dtype('bool'),
               32),
              'Flag to tell if this is the First Scored Particle from this History (1 means true)': (dtype('bool'),
               33),
              'Time of Flight [ns]': (dtype('float32'), 34),
              'Run ID': (dtype('int32'), 38),
              'Event ID': (dtype('int32'), 42),
              'Track ID': (dtype('int32'), 46),
              'Parent ID': (dtype('int32'), 50),
              'Initial Kinetic Energy [MeV]': (dtype('float32'), 54),
              'Vertex Position X [cm]': (dtype('float32'), 58),
              'Vertex Position Y [cm]': (dtype('float32'), 62),
              'Vertex Position Z [cm]': (dtype('float32'), 66),
              'Initial Direction Cosine X': (dtype('float32'), 70),
              'Initial Direction Cosine Y': (dtype('float32'), 74),
              'Initial Direction Cosine Z': (dtype('float32'), 78)

While some of them are used to calculate momentum and similar quantities, others, like Run ID and Event ID, are not easily added. Copying the TOPAS data loader and adding the column does not work on its own, because it is not a required column and has to be added to the accepted columns as well. This feels somewhat "hacky".
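For reference, the layout listed above can be decoded with the standard library alone. This is a minimal sketch, assuming each record consists of exactly the 22 fields shown (82 bytes, little-endian, no padding); since the list is non-exhaustive and depends on how the file was scored, the layout should be checked against the file's header first. The names `RECORD`, `FIELDS`, and `read_records` are illustrative and not part of ParticlePhaseSpace.

```python
import struct
from typing import Dict, List

# Assumed layout: 7 floats, 1 int, 2 bools, 1 float, 4 ints, 7 floats.
# "<" means little-endian with no alignment padding, giving 82 bytes.
RECORD = struct.Struct("<7f i 2? f 4i 7f")

# Short hypothetical names, in the same order as the offsets listed above.
FIELDS = (
    "x", "y", "z", "dir_cos_x", "dir_cos_y", "energy", "weight",
    "pdg_code", "dir_cos_z_negative", "first_in_history",
    "time_of_flight", "run_id", "event_id", "track_id", "parent_id",
    "initial_ke", "vertex_x", "vertex_y", "vertex_z",
    "initial_dir_cos_x", "initial_dir_cos_y", "initial_dir_cos_z",
)


def read_records(raw: bytes) -> List[Dict[str, object]]:
    """Decode a buffer of back-to-back records into one dict per particle.

    Raises struct.error if len(raw) is not a multiple of RECORD.size.
    """
    return [dict(zip(FIELDS, values)) for values in RECORD.iter_unpack(raw)]
```

With a real file, `read_records(Path(...).read_bytes())` would give access to `event_id` and `parent_id` per particle, provided the assumed layout matches.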

bwheelz36 commented 1 year ago

Hey @cvelten

I agree that I haven't really thought of a good way to handle all these additional data.

I have thought about how I would do this should someone ask, and I think the answer is:

- create additional quantities as allowed columns
- add methods to add quantities under fill, where the TOPAS phase space is re-passed

so the code would look something like:

from pathlib import Path

from ParticlePhaseSpace import DataLoaders
from ParticlePhaseSpace import PhaseSpace

test_data_loc = Path('to/phase_space.phsp')
ps_data = DataLoaders.Load_TopasData(test_data_loc)
PS = PhaseSpace(ps_data)
# proposed (not yet existing) fill method; it re-reads the data
PS.fill.event_ID(test_data_loc)
# (we could also not put this in fill but do something like):
PS.read_additional_data.event_ID(test_data_loc)

This is still a bit hacky, but one of the rules I was trying to adhere to was that a DataReader should provide a very consistent set of data quantities. Allowing all these quantities to maybe/maybe not get read in at the data load stage sends us in a direction I don't like, where it gets hard to write consistent code downstream. Also, I consider most of the quantities above to be pretty niche in most situations.

What do you think? Could this solution work?
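To make the idea concrete, here is a rough sketch of the pattern, not the actual ParticlePhaseSpace implementation: the loader always delivers the same required columns, and optional quantities can only be added afterwards through a fill step that checks them against an allowed list. The class `ColumnTable` and its column names are hypothetical.

```python
from typing import Dict, Mapping, Sequence


class ColumnTable:
    """Toy stand-in for a phase space table with a fixed column contract."""

    REQUIRED = ("x", "y", "z", "energy")   # always present after loading
    OPTIONAL = ("event_id", "parent_id")   # may be filled in later

    def __init__(self, columns: Mapping[str, Sequence]) -> None:
        # The load stage must deliver exactly the required columns.
        if set(columns) != set(self.REQUIRED):
            raise ValueError(f"expected exactly the columns {self.REQUIRED}")
        self.columns: Dict[str, list] = {k: list(v) for k, v in columns.items()}
        lengths = {len(v) for v in self.columns.values()}
        if len(lengths) != 1:
            raise ValueError("all columns must have the same length")
        self.n_particles = lengths.pop()

    def fill(self, name: str, values: Sequence) -> None:
        """Add one optional column after loading."""
        if name not in self.OPTIONAL:
            raise ValueError(f"{name!r} is not an allowed optional column")
        if len(values) != self.n_particles:
            raise ValueError("column length does not match particle count")
        self.columns[name] = list(values)
```

Downstream code can then rely on the required columns always being there and only needs to check for the optional ones it actually uses.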

bwheelz36 commented 1 year ago

Or, I could maybe enable passing an optional class which could contain additional methods. This could be a more general solution for situations where the basic data format is not sufficient?

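from pathlib import Path

from ParticlePhaseSpace import DataLoaders
from ParticlePhaseSpace import PhaseSpace

class ClassForMyData:
    def read_vertice_info(self, data_loc):
        # user-defined: read the extra quantities from the file at data_loc
        pass

    def operate_on_new_data(self):
        # user-defined: work with the quantities read above
        pass

test_data_loc = Path('to/phase_space.phsp')
ps_data = DataLoaders.Load_TopasData(test_data_loc)
# user_methods is the proposed optional argument
PS = PhaseSpace(ps_data, user_methods = ClassForMyData())
PS.user_methods.read_vertice_info(test_data_loc)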
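One way this could be wired up internally is sketched below. It is only an illustration under my own assumptions, not how ParticlePhaseSpace does it: the container stores the user object and hands it a back-reference, so the user's methods can read from and write to the phase space data. `ToyPhaseSpace`, `VertexMethods`, and the `phase_space` attribute are all hypothetical names.

```python
class ToyPhaseSpace:
    """Toy container that accepts an optional object with user methods."""

    def __init__(self, data, user_methods=None):
        self.data = dict(data)
        self.user_methods = user_methods
        if user_methods is not None:
            # Back-reference so user methods can reach the loaded data.
            user_methods.phase_space = self


class VertexMethods:
    """Example user class; the values would normally come from a file."""

    phase_space = None  # set by ToyPhaseSpace on construction

    def read_vertex_info(self, vertex_z):
        # Store an extra quantity alongside the standard columns.
        self.phase_space.data["vertex_z"] = list(vertex_z)

    def mean_vertex_z(self):
        values = self.phase_space.data["vertex_z"]
        return sum(values) / len(values)
```

The core class stays unaware of the extra quantities; everything specific to them lives in the user-supplied object.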

cvelten commented 1 year ago

Using those fill methods might work. But if you want to add multiple columns, isn't it cumbersome to re-read the file and fill every time you add one? What about, in addition, adding a method with a signature like

PhaseSpace.fill(data: Path, fill_methods: List[method signature])

It could either add all the columns at once or call the methods sequentially (which would just be encapsulating multiple method calls).

Alternatively, at least for the TOPAS phase space, couldn't you just add all the columns that come out of the phase space file to the allowed columns list? I could see a method on your data loader or on PhaseSpace, e.g. .available_columns(), that lists all allowed (and available) columns, which one could then pass to something like PhaseSpace.fill.columns(List[str]).
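Roughly what I have in mind, as a sketch only: the file is parsed once into per-particle records, and any number of requested columns are then copied over in a single call. The functions `available_columns` and `fill_columns` and the record format (one dict per particle) are assumptions for illustration, not existing API.

```python
from typing import Dict, List, Mapping, Sequence


def available_columns(records: Sequence[Mapping[str, object]]) -> List[str]:
    """List the column names present in the parsed records."""
    return sorted(records[0]) if records else []


def fill_columns(
    table: Dict[str, list],
    records: Sequence[Mapping[str, object]],
    names: Sequence[str],
) -> Dict[str, list]:
    """Copy several columns from already-parsed records into the table."""
    available = available_columns(records)
    unknown = [name for name in names if name not in available]
    if unknown:
        raise KeyError(f"columns not available in this file: {unknown}")
    for name in names:
        table[name] = [record[name] for record in records]
    return table
```

This way the cost of reading the file is paid once, regardless of how many columns are added.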

bwheelz36 commented 1 year ago

[x] include additional columns in allowed data
[x] make sure tests pass
[x] add a user defined method
[x] draft read topas example

bwheelz36 commented 1 year ago

Hi @cvelten

I have drafted an example of how I think this can work. A demonstration is here.

What do you think of this? Would this fulfill your needs? Note that you would still have to write a new data exporter to make sure these quantities get written to a new TOPAS phase space file.

cvelten commented 1 year ago

I think this looks good and versatile. I'll give it a try, soon.

bwheelz36 / ParticlePhaseSpace

Allow adding of parent ID, event ID, and flags from TOPAS PHSP #144