Open cvelten opened 1 year ago
Hey @cvelten
I agree that I haven't really thought of a good way to handle all this additional data.
I have thought about how I would do this should someone ask, and I think the answer is:
so the code would look something like:
from pathlib import Path
from ParticlePhaseSpace import DataLoaders
from ParticlePhaseSpace import PhaseSpace
test_data_loc = Path('to/phase_space.phsp')
ps_data = DataLoaders.Load_TopasData(test_data_loc)
PS = PhaseSpace(ps_data)
PS.fill.event_ID(test_data_loc)  # the fill method re-reads the data.
# (we could also not put this in fill but do something like):
PS.read_additional_data.event_ID(test_data_loc)
This is still a bit hacky, but one of the rules I was trying to adhere to was that a DataReader should provide a very consistent set of data quantities. Allowing all these quantities to maybe/maybe not get read in at the data load stage sends us in a direction I don't like, where it gets hard to write consistent code downstream. Also, I consider most of the quantities above to be pretty niche in most situations.
What do you think? Could this solution work?
Or, I could maybe enable passing an optional class which could contain additional methods. This could be a more general solution for situations where the basic data format is not sufficient?
from pathlib import Path
from ParticlePhaseSpace import DataLoaders
from ParticlePhaseSpace import PhaseSpace

class ClassForMyData:
    def read_vertice_info(self, data_loc):
        # read the extra quantities from the file at data_loc
        pass

    def operate_on_new_data(self):
        # work with whatever read_vertice_info stored
        pass

test_data_loc = Path('to/phase_space.phsp')
ps_data = DataLoaders.Load_TopasData(test_data_loc)
PS = PhaseSpace(ps_data, user_methods=ClassForMyData())
PS.user_methods.read_vertice_info(test_data_loc)
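A minimal sketch of how the user_methods idea could behave, using a toy stand-in rather than the real PhaseSpace class. ToyPhaseSpace, the phase_space back-reference, the "vertex z" column name, and the use of a plain dict for the data are all assumptions made for illustration; the file read is replaced by a list of already-parsed rows.

```python
class ToyPhaseSpace:
    """Hypothetical stand-in for PhaseSpace; not the ParticlePhaseSpace implementation."""

    def __init__(self, ps_data, user_methods=None):
        self.ps_data = ps_data            # dict: column name -> list of values
        self.user_methods = user_methods
        if user_methods is not None:
            # Assumed design: hand the user object a reference back to the
            # phase space so its methods can add or read columns.
            user_methods.phase_space = self


class ClassForMyData:
    def read_vertice_info(self, rows):
        # 'rows' stands in for whatever would be parsed from the file at data_loc.
        self.phase_space.ps_data["vertex z"] = [row["vertex z"] for row in rows]

    def operate_on_new_data(self):
        # Example operation on the newly added column: its mean value.
        z = self.phase_space.ps_data["vertex z"]
        return sum(z) / len(z)


PS = ToyPhaseSpace({"x": [0.0, 1.0]}, user_methods=ClassForMyData())
PS.user_methods.read_vertice_info([{"vertex z": -10.0}, {"vertex z": -12.0}])
mean_z = PS.user_methods.operate_on_new_data()
```

The core columns stay untouched by the loader in this arrangement; anything format-specific lives in the user's class.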
Using those fill methods might work. But if you want to add multiple columns, re-reading the file and filling every time you add a column seems cumbersome. What about, in addition, adding a method with a signature like
PhaseSpace.fill(data: Path, fill_methods: List[method signature])
It could either add all the columns at once or call the methods sequentially (which would just be encapsulating multiple method calls).
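A rough sketch of the batched variant, assuming the file is parsed once and each fill method then only extracts its own column from the parsed rows. Everything here is hypothetical: read_rows, the event_id and run_id helpers, the column positions, and the whitespace-separated two-column layout are placeholders, not the TOPAS format or the ParticlePhaseSpace API.

```python
import io
from typing import Callable, Dict, List, TextIO

Rows = List[List[str]]
FillMethod = Callable[[Rows], List[int]]


def read_rows(stream: TextIO) -> Rows:
    """Parse whitespace-separated rows once; stands in for the real file reader."""
    return [line.split() for line in stream if line.strip()]


def event_id(rows: Rows) -> List[int]:
    return [int(r[0]) for r in rows]   # assumed: event ID is column 0


def run_id(rows: Rows) -> List[int]:
    return [int(r[1]) for r in rows]   # assumed: run ID is column 1


def fill(columns: Dict[str, list], stream: TextIO,
         fill_methods: List[FillMethod]) -> None:
    rows = read_rows(stream)           # single read, shared by all methods
    for method in fill_methods:        # methods are applied sequentially
        columns[method.__name__] = method(rows)


columns = {"x": [0.1, 0.2]}
fake_file = io.StringIO("7 1\n8 1\n")  # in-memory stand-in for a phase space file
fill(columns, fake_file, [event_id, run_id])
```

Naming each new column after the function that produced it is only a convenience for the sketch; a real implementation would likely map methods to the library's own column names.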
Alternatively, at least for the TOPAS PhaseSpace, couldn't you just add all the columns that come out of the phase space into the allowed columns list? I could see a method on your data loader or the PhaseSpace that lists all allowed (and available) columns, .available_columns(), which one could then pass to a PhaseSpace.fill.columns(List[str]).
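To make the column-listing idea concrete, here is a small sketch with a toy loader. ToyLoader and read_columns are hypothetical names, and the comma-separated table with a header row is an assumption for the example, not how TOPAS stores its column descriptions.

```python
from typing import Dict, List


class ToyLoader:
    """Hypothetical loader over a comma-separated table with a header row."""

    def __init__(self, text: str):
        lines = text.strip().splitlines()
        self._header = lines[0].split(",")
        self._rows = [line.split(",") for line in lines[1:]]

    def available_columns(self) -> List[str]:
        # Every column present in the file, required or not.
        return list(self._header)

    def read_columns(self, names: List[str]) -> Dict[str, List[float]]:
        # Reject names that the file does not contain before reading anything.
        unknown = [n for n in names if n not in self._header]
        if unknown:
            raise KeyError(f"columns not in file: {unknown}")
        index = {n: self._header.index(n) for n in names}
        return {n: [float(row[i]) for row in self._rows] for n, i in index.items()}


loader = ToyLoader("x,EventID,RunID\n0.1,7,1\n0.2,8,1\n")
listed = loader.available_columns()
extra = loader.read_columns(["EventID", "RunID"])
```

Passing a subset of the names returned by available_columns() keeps the request explicit, so downstream code still knows exactly which optional columns exist.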
Hi @cvelten
I have drafted an example of how I think this can work. A demonstration is here.
What do you think of this? Would this fulfill your needs? Note that you would still have to write a new data exporter to make sure these quantities get written to a new TOPAS phase space file.
I think this looks good and versatile. I'll give it a try soon.
A non-exhaustive list of parameters in the PHSP contains the following
While some of them are used in calculating momentum and similar, others, like RunID and EventID, are not easily added. Copying the TOPAS data loader and adding the column does not work, as it is not a required column but has to be added as well. This feels somewhat "hacky".