dmwm / WMCore

Core workflow management components for CMS.
Apache License 2.0

protect voms-proxy-info in Credential/Proxy against JVM warning message #12075

Closed. belforte closed this issue 2 months ago.

belforte commented 3 months ago

Impact of the bug: Sometimes CRAB TaskWorker fails to submit a user task.

Describe the bug: CRAB TW has multiple processes running at the same time. The JVM used by the voms-proxy client can at times print a warning line before the actual information, while still returning exit code 0. Proxy.py tries to eval the reply as an int and raises an exception. More details in https://github.com/dmwm/CRABServer/issues/8625

How to reproduce it: Not easily reproducible. A full protection may be out of reach, but curing the known use case will surely help.

Expected behavior: Fewer "obscure" submission failures.

Additional context and error message: voms-proxy-info --timeleft returns:

[0.002s][warning][perf,memops] Cannot use file /tmp/hsperfdata_crab3/760874 because it is locked by another process (errno = 11)
447504
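To make the failure concrete: assuming Proxy.py converts the whole reply with something like int(output) (the exact call is not quoted in this issue), the output above raises ValueError because of the warning line. A minimal Python sketch of a more tolerant parse follows; parse_timeleft is a hypothetical helper, not part of WMCore. It keeps only the last line made purely of digits.

```python
def parse_timeleft(output):
    """Return seconds left from voms-proxy-info --timeleft output.

    Hypothetical helper (not WMCore's API). The JVM may print warning lines
    before the value, so scan from the end for the last purely numeric line.
    """
    for line in reversed(output.splitlines()):
        if line.strip().isdecimal():
            return int(line)
    # No numeric line at all: fail loudly instead of guessing.
    raise ValueError("no integer in voms-proxy-info output: %r" % output)


# The reply quoted in this issue: a JVM warning, then the real value.
sample = (
    "[0.002s][warning][perf,memops] Cannot use file /tmp/hsperfdata_crab3/760874 "
    "because it is locked by another process (errno = 11)\n"
    "447504\n"
)
print(parse_timeleft(sample))  # 447504
```

Scanning from the end is deliberate: the warning is printed before the value, so the last numeric line is the answer, while a reply with no numeric line still raises. This only covers parsing; the exit code of the command still needs its own check.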

I will provide a PR to fix this.

belforte commented 3 months ago

NOTE: there are other uses of voms-proxy-info in Credential/Proxy besides --timeleft, but this is the only one used in TaskWorker concurrent processes, so the other commands should never report that JVM message. Therefore I will only change the code for this use case.

NOTE: this did not happen until we moved to IAM and the Debian container, where we found that we needed voms-proxy-info v3.

belforte commented 2 months ago

Alas, this is not enough: other lines in Proxy.py also need to be protected, like https://github.com/dmwm/WMCore/blob/d618b120c6189cb8d9c89e28f202b440ab6f2bae/src/python/WMCore/Credential/Proxy.py#L757

I will make another PR, sorry about this.

belforte commented 2 months ago

The solution in https://github.com/dmwm/WMCore/pull/12076 was wrong. It made the code ignore the exit code of the voms-proxy-info command, creating other errors, which led to my bogus conclusion in https://github.com/dmwm/WMCore/issues/12083 . I will make a new PR with a proper solution and added diagnostics, since voms/myproxy errors are a pain.
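One way to combine both lessons, sketched in Python under stated assumptions: check the exit code first, only then parse stdout while skipping non-numeric lines, and put the command, exit code, stdout and stderr into every error message. get_timeleft and _describe are hypothetical names, not the code in WMCore's Proxy.py or in the follow-up PR, and the sketch assumes the JVM warning arrives on stdout.

```python
import subprocess
import sys

# Hypothetical sketch, not WMCore's Proxy.py: run voms-proxy-info --timeleft,
# honor its exit code, then tolerate JVM warning lines in the output.
DEFAULT_CMD = ("voms-proxy-info", "--timeleft")


def _describe(cmd, result):
    """Build a diagnostic string with everything needed to debug a failure."""
    return "command %r exited with code %d\nstdout: %r\nstderr: %r" % (
        " ".join(cmd), result.returncode, result.stdout, result.stderr)


def get_timeleft(cmd=DEFAULT_CMD):
    """Return the proxy lifetime in seconds, or raise RuntimeError with diagnostics."""
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    # 1. A non-zero exit code is a real failure: never try to parse around it.
    if result.returncode != 0:
        raise RuntimeError("voms-proxy-info failed: " + _describe(cmd, result))
    # 2. Exit code 0: the value is the last purely numeric line; earlier lines
    #    (e.g. a JVM '[warning][perf,memops]' message) are ignored.
    for line in reversed(result.stdout.splitlines()):
        if line.strip().isdecimal():
            return int(line)
    # 3. Exit code 0 but no number at all: still an error, with full context.
    raise RuntimeError("no timeleft value found: " + _describe(cmd, result))


if __name__ == "__main__":
    # Stand-in command that mimics the reported output without needing VOMS.
    fake = (sys.executable, "-c",
            "print('[0.002s][warning][perf,memops] JVM warning line'); print(447504)")
    print(get_timeleft(fake))
```

The order of the checks is the point: the exit code decides success or failure, and the tolerant parse only runs once the command has succeeded, so a JVM warning can no longer mask a genuine voms error.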