Add an ability to ask for N total events to process in addition to the maximum number of events

EinarElen commented 10 months ago

This comes from https://github.com/LDMX-Software/LDCS/pull/21

When you don't know roughly how many events you need to try to achieve a certain number of final events, it would be useful if you could (optionally) specify how many events you want in total, e.g. I want to produce 10000 kaon events regardless of how many tries that would take

tomeichlersmith commented 10 months ago

While I like the idea of this feature in theory, I have no idea how to implement it in a way that would avoid the possibility of infinite loops. I'm especially thinking of the case where a new filter is being developed which is currently buggy and so always aborts the event. In this case, if the user's config is using this feature, it would just run forever until killed by the user. Even more of an issue in my mind is not a pure infinite loop, but an extremely computationally inefficient configuration where the process just keeps trying events only getting 1 out of thousands, causing the processing time to explode while still technically being functional.

Perhaps the answer to this issue is just to document it and expect anyone developing filters to be familiar enough with the software to avoid infinite loops, but I could also see users landing in this infinite loop (or near infinite loop) with a poorly configured filter (like the thresholds are too high).

With the words of worry out of the way, I'm now thinking of how to implement this. Let's call this config desiredEvents for now (better name later).

desiredEvents should be mutually exclusive with maxEvents (to avoid confusion), maxTriesPerEvent (to avoid confusion) and inputFiles (since this should only be used in production?)
If desiredEvents is set, we don't do maxTriesPerEvent checks and instead run the inner trying loop until an event is successful.

This implementation is not very invasive to the current Process::run implementation. We'd just update the eventLimit_ in the while loop to either be maxEvents or desiredEvents (depending on configuration) and check during the event loop if we should even apply the max tries threshold.

https://github.com/LDMX-Software/ldmx-sw/blob/11d6a8e89250b357885602075c88b423dc04cdeb/Framework/src/Framework/Process.cxx#L205

would become

int event_limit = eventLimit_;
if (desiredEvents > 0) event_limit = desiredEvents;
while (n_events_processed < event_limit) {

and

https://github.com/LDMX-Software/ldmx-sw/blob/11d6a8e89250b357885602075c88b423dc04cdeb/Framework/src/Framework/Process.cxx#L226

would become

if (completed or (desiredEvents < 0 and numTries % maxTries_ == 0)) {

We leave the tries counter going so that it can be kept within the run calculations at the end of processing.

Edit: Updated links after issue transfer.

EinarElen commented 10 months ago

Yeah it isn't a feature without issues but I think your suggestions here are a good way to do it (roughly the same as what I hacked together to solve my issue but more well thought through 😄).

I think when it comes to infinite loop filters, the user already has an issue when that happens. Instead of a simulation running for an unexpectedly long time they'd get a resulting file with next to no events in it. In some ways, the long running simulation is a better indicator for what the issue is. Of course, this would be a problem if someone yeets off a cluster job with an unreasonably high walltime request without having validated their config but... yeah...

tomeichlersmith commented 10 months ago

I think we should go ahead with implementing this because I do think it is useful enough and, as you point out, this does not introduce a new issue, just a different symptom of some issue that would already have other symptoms in other running modes.

My last thought is just to add a ldmx_log(warn) message at the end of Process::run if the number of actual events divided by the number of tries is below some critical value. I'm thinking 1/10000? But that's just a guesstimate.

// starting at line 240 of Process.cxx on trunk
if (n_events_processed < totalTries/10000) { // integer division is okay
  ldmx_log(warn) << "Less than 1 out of every 10k events tried was accepted!"
  ldmx_log(warn) << "This could be an issue with your filtering and biasing procedure since this is incredibly inefficient."
}

tvami commented 6 months ago

This would be very useful for what I'm doing in https://github.com/LDMX-Software/ldmx-sw/pull/1289

Is there a work-in-progress branch for this? Or will any of you @EinarElen @tomeichlersmith PR this, or should I do it?

tomeichlersmith commented 6 months ago

There is no branch starting to implement this. I created a branch in Framework but never committed anything, so feel free to dig in!

tvami commented 6 months ago

desiredEvents (better name later)

what about totalEvents?

tomeichlersmith commented 6 months ago

Yea, I think that's a little better.

I think I'm just running up to the general issue that there are a ton of different ways to count events and we don't want to write a super long variable name that is actually specific (e.g. totalEventsToProduceWithoutTryLimit) so I think we should just pick one (totalEvents is good) and I'll come back around to this issue on the PR. Maybe we can add more documentation on these configuration options to the config class in this PR as well.

tvami commented 6 months ago

We could do maxEventsNoTry (almost sounds like Master Yoda). Anyway for now I implemented it with totalEvents

LDMX-Software / ldmx-sw

Add an ability to ask for N total events to process in addition to the maximum number of events #1299