cms-sw / genproductions

Generator fragments for MC production
https://twiki.cern.ch/twiki/bin/view/CMS/GitRepositoryForGenProduction
75 stars 774 forks source link

Filter efficiency from fragments should be dropped once and for all #3269

Open efeyazgan opened 2 years ago

efeyazgan commented 2 years ago

@menglu21 @sihyunjeon @Saptaparna

Since Si hyun and Sitian are sure (see https://cms-talk.web.cern.ch/t/filter-efficiency-check-in-gen-check-script/15409) that the filter efficiency is not used at all I removed the check in the request checking script (see: https://github.com/cms-sw/genproductions/pull/3268) so this filter should be removed from all fragments. Please do that soon and then I will adapt the script not to cause inconsistency in UL consistency check. However, if it is not removed soon, I think we should revert back the request checking script.

In any case, I do not take any responsibility for any problems that this change may cause: https://github.com/cms-sw/genproductions/pull/3268.

Thanks.

agrohsje commented 1 year ago

Hi Efe. The concern I had was that this info is written in our edm files. So we cannot know if someone is using or not. Technically, it is not relevant. Only this statement is true. That's why I like the solution of Sihyun, i.e. consider 1 or -1 as some default that is ok. Allowing again random numbers is not so great. We are just too many people to assure no one is using.

efeyazgan commented 1 year ago

Hi Alexander, yes, but following the same reasoning we could have kept the check. If something is used, we should check how it is used. I am sure that check was there for a reason...

sihyunjeon commented 1 year ago

Hi so there are several things to think about here.

This creates somewhat unnecessary entropy for the MC contacts because a. We don't know whether there are any real users for this metadata so we might be putting our efforts to something without real gain. b. Removing the filter efficiency fragment still works (avoids gen checking script errors) - which means subset of the samples would for sure have broken filter efficiency written in GEN files as metadata. So subset of our samples are already broken. c. It's not so easy to modify this through python scripts because efficiency from the run log (for filter efficiency) only gets delivered through email. I don't think the run log results are stored somewhere in McM, at least I am not aware. If I am correct, one needs to crop out the values from email boxes and tweak it into the fragments and I would hardly imagine MC contacts doing this. So in the end it would return to b. where MC contacts will just remove the line to avoid the problem, breaking the variables that should not be used.

But as Alexander said, SOME might use this and it MIGHT be not totally useless to store correct values. So allowing default settings (filter efficiency >= 1.0 or <=0.0 which doesn't make sense) is sort of compromise in between.

agrohsje commented 1 year ago

Hi Efe, maybe this was not clear in my message: I think we should have kept the check (and the motivation now is the same as back then: it is stored in EDM and we don't know if someone is using it or not) but just slightly modify it as proposed by Sihyun: But as Alexander said, SOME might use this and it MIGHT be not totally useless to store correct values. So allowing default settings (filter efficiency >= 1.0 or <=0.0 which doesn't make sense) is sort of compromise in between. Cheers, Alexander.

sihyunjeon commented 1 year ago

Just to add a bit more

(filter efficiency >= 1.0 or <=0.0 which doesn't make sense)

This means that "filter efficiency itself doesn't make sense already and the users if they exist, they would know it's not trustable so they would avoid using it. but if it's some realistic value e.g. 0.48, people might believe the value is true and mistakenly use it if wrong values are stored."

So my proposal was, avoid checking unrealistic values BUT check realistic values to make sure people don't use them.

efeyazgan commented 1 year ago

OK, done. See https://github.com/cms-sw/genproductions/pull/3280

efeyazgan commented 1 year ago

See the update: https://github.com/cms-sw/genproductions/pull/3282

tvami commented 9 months ago

I run into this now too, if the eff is set to -1 it's still failing the script. I even understand from the thread above that it was discussed that in case of the eff is negative we should not give an error. Anyway, I made a PR to add that patch https://github.com/cms-sw/genproductions/pull/3572