ANNIEsoft / ToolAnalysis

Other
10 stars 53 forks source link

Filter Events #239

Closed mnieslony closed 1 year ago

mnieslony commented 1 year ago

The FilterEvents tool & toolchain enables to filter events according to some event criteria and then saving them into a different output file. With this, one can for example select all events within a run that are neutrino candidates, and then save them into a separate output file. This speeds up subsequent toolchains which use the same cuts. In addition, analyzers don't always have to read in 1000+ part files per run if they already filtered the file once and saved for example a single filtered file per run.

Ideally, official filters should be stored in a global directory, which was implemented here as the configfiles/Filters directory. Some very basic filters have been added there.

The FilterEvents tool itself is a very heavy copy of the FilterLAPPDEvents tool by Rory, but does not have any connection to LAPPDs and was hence moved into a separate tool in order to avoid having too many modes for a single tool again. If desired, it can also be merged with the FilterLAPPDEvents tool.

In addition, small bugfix in the LoadANNIEEvent tool, the global event number was not set correctly.

More information in ANNIE docDB entry 5182.

marc1uk commented 1 year ago

Hi Michael, PR looks pretty good to me. One minor oddity seems to be that you've added the CutConfiguration variable to the EventSelector tool, but that Tool doesn't seem to do anything with it other than pass it via the CStore to the FilterEvents Tool. Why is this not just a configuration variable in the FilterEvents Tool?

The other thing is I'm not a huge fan of the pattern of:

Tool 1:
    BoostStore* annie_ev = new BoostStore(true, 2);
    m_data->Stores.emplace("ANNIEEvent", annie_ev);
    annie_ev->Set("a", 1);
    annie_ev->Set("bunch", 1);
    annie_ev->Set("of", 1);
    annie_ev->Set("things", 1);
Tool 2
    BoostStore* annie_ev = m_data->Stores.at("ANNIEEvent");
    var this, that, the, other;
    annie_ev->Get("a", this);
    annie_ev->Get("bunch", that);
    annie_ev->Get("of", the);
    annie_ev->Get("things", other);

    BoostStore* copy_ev = new BoostStore(true, 2);
    copy_ev->Set("a", this);
    copy_ev->Set("bunch", that);
    copy_ev->Set("of", the);
    copy_ev->Set("things", other);
    copy_ev->Save(outfile);
    copy_ev->Delete();

It seems like a lot of bloated code and inefficient copying. I think we may need @brichards64 to elaborate on why simply saving the desired ANNIEEvent instances to the selected events output file isn't possible, and whether it can be made possible. If that's really not possible, perhaps we can at least get an improved method for copying BoostStore contents; e.g. implementations of a copy constructor or operator=.

For the time being I think it might be possible to use an intermediate json:

std::string data;
*m_data->Stores.at("ANNIEEvent") >> data;
FilteredEvents->JsonParser(data);
FilteredEvents->Save(outFile);
FilteredEvents->Delete();

which at least is more streamlined and accommodates new contents to the upstream BoostStore without requiring developers to keep multiple independent Tools in sync manually.... but i'm not sure if this works with pointers, or nested BoostStores. If that doesn't work, how about a method which loops over m_variables and does suitable memcpy or streaming or whatever, and similarly loops over m_pointers and makes new instances on the heap.... If it's possible to achieve, i think it would be better implemented once, as a suitable member function of the BoostStore class, than all these user workarounds.

mnieslony commented 1 year ago

Hi Marcus,

One minor oddity seems to be that you've added the CutConfiguration variable to the EventSelector tool, but that Tool doesn't seem to do anything with it other than pass it via the CStore to the FilterEvents Tool. Why is this not just a configuration variable in the FilterEvents Tool?

So the idea here is that people can run the FilterEvents toolchain with different filters by just selecting the corresponding filter configuration file in the configfiles/Filters directory. These Filter config files are globally defined and would hence also already contain the name of the filter which should then be also saved in the output BoostStore. In order to save the filter name, the FilterEvents tool then just grabs it from the CStore and saves it to the output file. Since the filtername should uniquely identify the specific filter configuration, it makes more sense to define this variable with the filter globally in the configfiles/Filters directory. Then one can also just specify the filter in the ToolsConfig file and does not need to worry that the filter matches the CutConfiguration variable in the FilterEvents tool, since the CutConfiguration name is already defined with the filter itself.

Regarding your other comment:

It seems like a lot of bloated code and inefficient copying.

I agree that there should be a better way to do this, but as far as I'm aware it's not easily possible to just grab whole Events and save them to a different store? I might be mistaken though... Unfortunately I don't really have time to look into the other options you mention. Maybe it can be discussed at a software meeting? (The same issue also applies to the FilterLAPPDEvents tool of course, and maybe other tools that I don't know about)

marc1uk commented 1 year ago

hi michael, thanks for the explanation. I'm not entirely sure i followed all of it, but it seems like the goal here is to have some global filter configuration specified in one place that affects multiple tools, so the associated variables might not be kept with their associated tool. As long as it's all documented that's fine. No other issues so i'll go ahead and merge. :+1: