DIRACGrid / DIRAC

DIRAC Grid
http://diracgrid.org
GNU General Public License v3.0
113 stars 176 forks source link

Warning about Optimizer information not properly decoded during JobWrapper execution #7481

Open michmx opened 8 months ago

michmx commented 8 months ago

We are using DIRAC 8.0.37 + our Belle extension. At the pilot logs, we are observing the following error message:

[263926]JobWrapper VERBOSE: Job input data requirement is some_path/file.root
[263926]JobWrapper INFO: Site has the following local SEs: SIGNET-DATA-SE, SIGNET-TMP-SE
[263926]JobWrapper VERBOSE: Job input data resolution policy module is DIRAC.WorkloadManagementSystem.Client.InputDataResolution
[263926]JobWrapper WARN: This should never happen
[263926]JobWrapper WARN: Optimizer information could not be converted to a dictionary will call catalog directly
...

The warning comes from an exception at DIRAC/Core/Utilities/DEncode.py#L525 :

def decode(data):
    """Generic decoding function"""
    if not data:
        return data
    if not isinstance(data, bytes):
        raise NotImplementedError("This should never happen")
    return g_dDecodeFunctions[data[0]](data, 0)

Then This should never happen... happens

The issue seems to be that decode() gets a str from self.optArgs["InputData"], instead of bytes during the JobWrapper execution:

try:
    optDict, _length = DEncode.decode(self.optArgs["InputData"])
    optReplicas = optDict["Value"]
    self.log.info("Found optimizer catalog result")
except Exception as x:
    self.log.warn(str(x))
    self.log.warn("Optimizer information could not be converted to a dictionary will call catalog directly")

We have confirmed that the information is properly stored inside InputData in the table OptimizerParameters at the JobDB, and seems to be encoded:

mysql> select jobID, Name, Value from OptimizerParameters where jobID = 263926;
| 263926 | InputData      | ds5:Valueds2:OKb1s5:Valueds6:Faileddes10:Successfulds142:/belle/MC/release-06-00-08/DB00002100/MC15ri_b/prod00024790/s00/e1003/4S/r00000/ccbar/mdst/sub00/mdst_000999_prod00024790_task10020001003.rootds8:Checksums8:73c26afas12:ChecksumTypes2:ADs12:CreationDatezati2022ei3ei5ei23ei9ei56ei0enes4:GUIDs36:bd4fe732-075f-563a-e2ef-d6002d0cbfa1s4:Modei436es16:ModificationDatezati2023ei3ei8ei15ei32ei35ei0enes13:NumberOfLinksi1es4:Sizei2088918091es6:Statuss1:-s16:KEK-DISK-DATA-SEs207:davs://belle2-webdav-analysis-data.cc.kek.jp:8443/disk/belle/DATA/belle/MC/release-06-00-08/DB00002100/MC15ri_b/prod00024790/s00/e1003/4S/r00000/ccbar/mdst/sub00/mdst_000999_prod00024790_task10020001003.roots14:SIGNET-DATA-SEs190:davs://dcache.ijs.si:2880/pnfs/ijs.si/belle/DATA/belle/MC/release-06-00-08/DB00002100/MC15ri_b/prod00024790/s00/e1003/4S/r00000/ccbar/mdst/sub00/mdst_000999_prod00024790_task10020001003.rooteeees14:SiteCandidatesds13:LCG.SIGNET.sids4:diski1es4:tapei0ees10:LCG.KEK.jpds4:diski1es4:tapei0ees13:ARC.SIGNET.sids4:diski1es4:tapei0ees17:LCG.KIT-TARDIS.deds4:diski1es4:tapei0ees15:EuroHPC.Vega.sids4:diski1es4:tapei0eeee |

And we can reproduce the return of a string using the JobDB:

>>> from DIRAC.WorkloadManagementSystem.DB.JobDB import JobDB
>>> jobDB = JobDB()

>>> a = jobDB.getJobOptParameters(263926)
>>> type(a['Value']['InputData'])
<class 'str'>

But according to what the decode() function expects at DEncode.py, this should be bytes.

Have you experienced a similar issue recently? We found that you had the same warnings at #2421, which resulted in introducing the decode() at the JobWrapper.


Additional information on MySQLdb version:

>>> MySQLdb.version_info
(2, 0, 3, 'final', 0)
fstagni commented 8 months ago

Thank you for the info. This actually happens also in LHCb production instance (also v8.0.x):

2024-02-23 08:48:08 UTC None/[849602392]JobWrapper INFO: Site has the following local SEs: Glasgow-DST, Glasgow_MC-DST
2024-02-23 08:48:08 UTC None/[849602392]JobWrapper WARN: This should never happen
2024-02-23 08:48:08 UTC None/[849602392]JobWrapper WARN: Optimizer information could not be converted to a dictionary will call catalog directly

The good news is that it is not dangerous.

I suspect an encode is needed at some point, we had in more recent past a bunch of cases like this, some fixed e.g. in https://github.com/DIRACGrid/DIRAC/pull/5854

Maybe simply this line in JobWrapper:

optDict, _length = DEncode.decode(self.optArgs["InputData"])

should become

optDict, _length = DEncode.decode(self.optArgs["InputData"].encode())
michmx commented 8 months ago

@fstagni Thanks a lot for the check and the confirmation that also happens on LHCb.

We have not been able to test the addition of .encode() yet as we got distracted with another problem, but after testing we will report the outcome.