Closed EinarElen closed 6 months ago
While I like the idea of this feature in theory, I have no idea how to implement it in a way that would avoid the possibility of infinite loops. I'm especially thinking of the case where a new filter is being developed which is currently buggy and so always aborts the event. In this case, if the user's config is using this feature, it would just run forever until killed by the user. Even more of an issue in my mind is not a pure infinite loop, but an extremely computationally inefficient configuration where the process just keeps trying events only getting 1 out of thousands, causing the processing time to explode while still technically being functional.
Perhaps the answer to this issue is just to document it and expect anyone developing filters to be familiar enough with the software to avoid infinite loops, but I could also see users landing in this infinite loop (or near-infinite loop) with a poorly configured filter (for example, one whose thresholds are too high).
With the words of worry out of the way, I'm now thinking of how to implement this. Let's call this config desiredEvents for now (better name later).
desiredEvents should be mutually exclusive with maxEvents (to avoid confusion), maxTriesPerEvent (to avoid confusion) and inputFiles (since this should only be used in production?).
If desiredEvents is set, we don't do maxTriesPerEvent checks and instead run the inner trying loop until an event is successful.
This implementation is not very invasive to the current Process::run implementation. We'd just update the eventLimit_ in the while loop to either be maxEvents or desiredEvents (depending on configuration) and check during the event loop if we should even apply the max tries threshold.
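The mutual-exclusivity rule described above could be checked once, before the run starts. The following is a minimal sketch only: RunConfig and validate are hypothetical names invented for illustration, not the actual Framework configuration class, and the defaults (non-positive meaning "not set", maxTriesPerEvent defaulting to 1) are assumptions.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the run configuration (not the real config class).
struct RunConfig {
  int desiredEvents = -1;    // assumption: <= 0 means "not set"
  int maxEvents = -1;        // assumption: <= 0 means "not set"
  int maxTriesPerEvent = 1;  // assumption: 1 means "no retries requested"
  std::vector<std::string> inputFiles;
};

// Throws std::invalid_argument if desiredEvents is combined with an option
// that it is supposed to exclude.
void validate(const RunConfig& cfg) {
  if (cfg.desiredEvents <= 0) return;  // feature not in use, nothing to check
  if (cfg.maxEvents > 0) {
    throw std::invalid_argument(
        "desiredEvents and maxEvents are mutually exclusive");
  }
  if (cfg.maxTriesPerEvent > 1) {
    throw std::invalid_argument(
        "desiredEvents and maxTriesPerEvent are mutually exclusive");
  }
  if (!cfg.inputFiles.empty()) {
    throw std::invalid_argument(
        "desiredEvents cannot be used together with inputFiles");
  }
}
```

Failing early like this would keep a contradictory configuration from ever reaching the event loop.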
The while loop condition would become
int event_limit = eventLimit_;
if (desiredEvents > 0) event_limit = desiredEvents;
while (n_events_processed < event_limit) {
and the max tries check would become
if (completed or (desiredEvents <= 0 and numTries % maxTries_ == 0)) {
We leave the tries counter going so that it can be kept within the run calculations at the end of processing.
Edit: Updated links after issue transfer.
Yeah it isn't a feature without issues but I think your suggestions here are a good way to do it (roughly the same as what I hacked together to solve my issue but more well thought through 😄).
I think when it comes to infinite loop filters, the user already has an issue when that happens. Instead of a simulation running for an unexpectedly long time they'd get a resulting file with next to no events in it. In some ways, the long running simulation is a better indicator for what the issue is. Of course, this would be a problem if someone yeets off a cluster job with an unreasonably high walltime request without having validated their config but... yeah...
I think we should go ahead with implementing this because I do think it is useful enough and, as you point out, this does not introduce a new issue, just a different symptom of some issue that would already have other symptoms in other running modes.
My last thought is just to add a ldmx_log(warn) message at the end of Process::run if the number of actual events divided by the number of tries is below some critical value. I'm thinking 1/10000? But that's just a guesstimate.
// starting at line 240 of Process.cxx on trunk
if (n_events_processed < totalTries/10000) { // integer division is okay
ldmx_log(warn) << "Less than 1 out of every 10k events tried was accepted!";
ldmx_log(warn) << "This could be an issue with your filtering and biasing procedure since this is incredibly inefficient."
}
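The threshold test itself can be isolated into a small helper so it is easy to unit test. This is a sketch only: acceptanceTooLow is a hypothetical name, and it swaps the integer division above for a cross-multiplied comparison, which is a slightly stricter variant rather than the exact check proposed.

```cpp
// Returns true when fewer than 1 in `threshold` tries produced an accepted
// event. Hypothetical helper, not part of the Framework.
bool acceptanceTooLow(long long nEventsProcessed, long long totalTries,
                      long long threshold = 10000) {
  // Cross-multiplying avoids the truncation in totalTries / threshold, so a
  // run with 0 accepted events out of 5000 tries is also flagged.
  return nEventsProcessed * threshold < totalTries;
}
```

The integer-division form only starts warning once totalTries reaches a full multiple of the threshold above the accepted count. The cross-multiplied form warns as soon as the ratio drops below 1/10000, at the cost of a multiplication that could overflow for extremely large counts.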
This would be very useful for what I'm doing in https://github.com/LDMX-Software/ldmx-sw/pull/1289
Is there a work-in-progress branch for this? Or will any of you @EinarElen @tomeichlersmith PR this, or should I do it?
There is no branch starting to implement this. I created a branch in Framework but never committed anything, so feel free to dig in!
desiredEvents (better name later)
What about totalEvents?
Yea, I think that's a little better.
I think I'm just running up to the general issue that there are a ton of different ways to count events and we don't want to write a super long variable name that is actually specific (e.g. totalEventsToProduceWithoutTryLimit), so I think we should just pick one (totalEvents is good) and I'll come back around to this issue on the PR. Maybe we can add more documentation on these configuration options to the config class in this PR as well.
We could do maxEventsNoTry (almost sounds like Master Yoda). Anyway, for now I implemented it with totalEvents.
This comes from https://github.com/LDMX-Software/LDCS/pull/21
When you don't know roughly how many events you need to try to achieve a certain number of final events, it would be useful if you could (optionally) specify how many events you want in total. For example: I want to produce 10000 kaon events, regardless of how many tries that takes.