dmwm / CRABClient

preparelocal fails when `scriptArgs` is present #5276

Closed belforte closed 5 months ago

belforte commented 7 months ago


belforte commented 6 months ago

the problem is that after preparelocal the script sh 1 ends up executing inside

python3 -r /afs/ -a sandbox.tar.gz --sourceURL= --jobNumber=1 --cmsswVersion=CMSSW_13_3_0 --scramArch=el8_amd64_gcc12 --inputFile=job_input_file_list_1.txt --runAndLumis=job_lumis_1.json --lheInputFiles=False --firstEvent=None --firstLumi=None --lastEvent=None --firstRun=None --seeding=AutomaticSeeding --eventsPerLumi=None --maxRuntime=-1 '--scriptArgs=[\"\"exitCode=666\"\",' '\"\"gotArgs=Yes\"\"]' -o '{}'

in particular this is bad

--scriptArgs=[\"\"exitCode=666\"\",' '\"\"gotArgs=Yes\"\"]

as it leads to

ERROR: Exceptional exit at Thu Jan 25 14:20:38 2024 UTC 10040: Expecting value: line 1 column 2 (char 1)
ERROR: Traceback follows:
Traceback (most recent call last):
  File "", line 743, in

i.e. it tries to read the above --scriptArgs=[\"\"exitCode=666\"\",' '\"\"gotArgs=Yes\"\"] as JSON
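To make the failure concrete, here is a minimal sketch that reproduces the same decoder message outside of the job wrapper. It assumes (this is not confirmed above) that the option parser only sees the first of the two shell words, i.e. everything up to the blank.

```python
import json

# Assumption: the first shell word after "--scriptArgs=" is what reaches json.loads.
received = r'[\"\"exitCode=666\"\",'

try:
    json.loads(received)
    error_message = None
except json.JSONDecodeError as exc:
    # The decoder stops at the backslash right after "[".
    error_message = str(exc)

print(error_message)
```

The backslash at position 1 is not the start of any JSON value, which is why the decoder gives up right after the opening bracket.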

belforte commented 6 months ago

the original line in was

config.JobType.scriptArgs = ['exitCode=666', 'gotArgs=Yes']

and works fine in the real job, where data are passed around via the REST DB and in there tm_scriptargs is the string ['exitCode=666', 'gotArgs=Yes']

I think that the problem is that preparelocal writes this file

belforte@lxplus805/local> cat InputArgs.txt
-a sandbox.tar.gz --sourceURL= --jobNumber=1 --cmsswVersion=CMSSW_13_3_0 --scramArch=el8_amd64_gcc12 --inputFile=job_input_file_list_1.txt --runAndLumis=job_lumis_1.json --lheInputFiles=False --firstEvent=None --firstLumi=None --lastEvent=None --firstRun=None --seeding=AutomaticSeeding --eventsPerLumi=None --maxRuntime=-1 --scriptArgs=[\"\"exitCode=666\"\", \"\"gotArgs=Yes\"\"] -o {}
belforte@lxplus805/local>

While we need to pass those around in JSON format

In the job log we have

Arguments are -a sandbox.tar.gz --sourceURL= --jobNumber=1 --cmsswVersion=CMSSW_13_3_0 --scramArch=el8_amd64_gcc12 --inputFile=job_input_file_list_1.txt --runAndLumis=job_lumis_1.json --lheInputFiles=False --firstEvent=None --firstLumi=None --lastEvent=None --firstRun=None --seeding=AutomaticSeeding --eventsPerLumi=None --maxRuntime=-60 --scriptArgs=["exitCode=666", "gotArgs=Yes"] -o {}


--scriptArgs=["exitCode=666", "gotArgs=Yes"]
belforte commented 6 months ago

hmmm... the "bad" format is already in the input_args.json file which is fetched (via InputFiles.tar.gz) from the scheduler's WEB_DIR, and is placed there by DagmanCreator which contains

    def prepareLocal(self, dagSpecs, info, kw, inputFiles, subdags):
        """ Prepare a file named "input_args.json" with all the input parameters of each jobs. It is a list
            with a dictionary for each job. The dictionary key/value pairs are the arguments of
            N.B.: in the JDL: "Executable =" and "Arguments =  $(CRAB_Archive) --sourceURL=$(CRAB_ISB) ..."
            where each argument of each job is set in "input_args.json".
            Also, this prepareLocal method prepare a single "InputFiles.tar.gz" file with all the inputs files moved
            from the TW to the schedd.
            This is used by the client preparelocal command.
        """

        argdicts = []
        for dagspec in dagSpecs:
            argDict = {}
            argDict['inputFiles'] = 'job_input_file_list_%s.txt' % dagspec['count'] #'job_input_file_list_1.txt'
            argDict['runAndLumiMask'] = 'job_lumis_%s.json' % dagspec['count']
            argDict['CRAB_Id'] = dagspec['count'] #'1'
            argDict['lheInputFiles'] = dagspec['lheInputFiles'] #False
            argDict['firstEvent'] = dagspec['firstEvent'] #'None'
            argDict['lastEvent'] = dagspec['lastEvent'] #'None'
            argDict['firstLumi'] = dagspec['firstLumi'] #'None'
            argDict['firstRun'] = dagspec['firstRun'] #'None'
            argDict['CRAB_Archive'] = info['cachefilename_flatten'] #'sandbox.tar.gz'
            argDict['CRAB_ISB'] = info['cacheurl_flatten'] #u''
            argDict['CRAB_JobSW'] = info['jobsw_flatten'] #u'CMSSW_9_2_5'
            argDict['CRAB_JobArch'] = info['jobarch_flatten'] #u'slc6_amd64_gcc530'
            argDict['seeding'] = 'AutomaticSeeding'
            argDict['scriptExe'] = kw['task']['tm_scriptexe'] #
            argDict['eventsPerLumi'] = kw['task']['tm_events_per_lumi'] #
            argDict['maxRuntime'] = kw['task']['max_runtime'] #-1
            argDict['scriptArgs'] = json.dumps(kw['task']['tm_scriptargs']).replace('"', r'\"\"') #'[]'
            argDict['CRAB_AdditionalOutputFiles'] = info['addoutputfiles_flatten']
            #The following two are for fixing up job.submit files
            argDict['CRAB_localOutputFiles'] = dagspec['localOutputFiles']
            argDict['CRAB_Destination'] = dagspec['destination']
            argdicts.append(argDict)

        with open('input_args.json', 'w', encoding='utf-8') as fd:
            json.dump(argdicts, fd)
belforte commented 6 months ago

I have to wonder if this line is correct!

            argDict['scriptArgs'] = json.dumps(kw['task']['tm_scriptargs']).replace('"', r'\"\"') #'[]'

Looking in detail, the same escaping is used in preparing the DagMan spec, but it is quite possible that HTCondor args parsing has different rules from bash/python.
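A small sketch of what that line produces may help. It assumes that `kw['task']['tm_scriptargs']` is a Python list at this point (the thread only shows its string form in the DB), and `is_valid_json` is a helper introduced here purely for illustration.

```python
import json

# Assumed value of kw['task']['tm_scriptargs'], taken from the config line quoted earlier.
script_args = ['exitCode=666', 'gotArgs=Yes']

plain = json.dumps(script_args)
# Same transformation as in the quoted DagmanCreator line.
escaped = plain.replace('"', r'\"\"')


def is_valid_json(text):
    """Hypothetical helper: True if text can be decoded by json.loads."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(plain)
print(escaped)
print(is_valid_json(plain), is_valid_json(escaped))
```

The escaped form matches what shows up in InputArgs.txt, and it is no longer JSON once it is read by anything that does not undo the escaping.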

In RunJobs.dag in the SPOOL_DIR I find

VARS Job1 count="1" runAndLumiMask="job_lumis_1.json" lheInputFiles="False" firstEvent="None" firstLumi="None" lastEvent="None" firstRun="None" maxRuntime="-60" eventsPerLumi="None" seeding="AutomaticSeeding" inputFiles="job_input_file_list_1.txt" scriptExe="" scriptArgs="[\"\"exitCode=666\"\", \"\"gotArgs=Yes\"\"]" +CRAB_localOutputFiles="\"output.root=output_1.root\"" +CRAB_DataBlock="\"/GenericTTbar/HC-CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/AODSIM#3517e1b6-76e3-11e7-a0c8-02163e00d7b3\"" +CRAB_Destination="\"davs://, davs://\"" ABORT-DAG-ON Job1 3

Note: scriptArgs="[""exitCode=666"", ""gotArgs=Yes""]"

belforte commented 6 months ago

I tried to change InputArgs.txt from

 --scriptArgs=[""exitCode=666"", ""gotArgs=Yes""]

to

 --scriptArgs=["exitCode=666", "gotArgs=Yes"]

but that is somehow mishandled in passing from CLI to bash to python, since the python side receives

 '--scriptArgs=["exitCode=666",' '"gotArgs=Yes"]'

and json.loads fails, likely because the embedded ' ' splits the value into two arguments.

But removing the blank appears to do it!

Bottom line: it seems that the error needs to be fixed in DagmanCreator.
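The effect of the blank can be sketched as follows. Here `str.split()` is only a stand-in for the whitespace word splitting that bash applies to an unquoted expansion; how the arguments are actually read from InputArgs.txt is an assumption, and `words_after_splitting` is a hypothetical helper.

```python
import json

script_args = ['exitCode=666', 'gotArgs=Yes']


def words_after_splitting(option_text):
    """Hypothetical stand-in for shell word splitting on whitespace."""
    return option_text.split()


# Default json.dumps puts a blank after each comma.
default_form = '--scriptArgs=' + json.dumps(script_args)
# Compact separators produce the same JSON without any blank.
compact_form = '--scriptArgs=' + json.dumps(script_args, separators=(',', ':'))

default_words = words_after_splitting(default_form)
compact_words = words_after_splitting(compact_form)

# Only the compact form survives as a single word and decodes cleanly.
recovered = json.loads(compact_words[0].split('=', 1)[1])

print(default_words)
print(compact_words)
print(recovered)
```

Note that this only helps as long as no individual script argument contains a blank itself, so it is a workaround rather than a general fix.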

belforte commented 6 months ago

Is it maybe better to replace InputArgs.txt with a JSON file?

Also, there are two conversions to JSON now, which is fishy.
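The double conversion can be illustrated with a short sketch. It leaves out the quote escaping and uses a one-key dictionary as a stand-in for the real per-job dictionary, so it is only a simplified model of what ends up in input_args.json.

```python
import json

script_args = ['exitCode=666', 'gotArgs=Yes']

# First conversion: the list becomes a JSON string stored as a dictionary value.
inner = json.dumps(script_args)
# Second conversion: the whole list of dictionaries is dumped to JSON again.
outer = json.dumps([{'scriptArgs': inner}])

# One decode gives back a string, not a list; a second decode is needed.
decoded_once = json.loads(outer)[0]['scriptArgs']
decoded_twice = json.loads(decoded_once)

# Storing the list directly needs a single encode/decode pair.
single = json.loads(json.dumps([{'scriptArgs': script_args}]))[0]['scriptArgs']

print(type(decoded_once).__name__, decoded_twice, single)
```

With a single conversion the reader gets the list back directly, and there is no intermediate string whose quoting has to survive the trip.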

belforte commented 6 months ago

I tried to avoid the double conversion and then other "hacks", but found no good way to pass something like

--scriptArgs="['exitCode=666', 'gotArgs=Yes']" 

to and then to w/o bash introducing escapes (\) which eventually confuse things.

I now think that the best way is to avoid the InputArgs.txt file, put the JSON file (currently input_args.json) prepared by DagmanCreator in the local dir and have it passed as an argument. This means introducing a new argument "--argFile" and, when that is present, parsing the JSON inside python.
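A rough sketch of that idea, not the actual implementation: `--argFile` is the argument name proposed above, while `load_job_args`, `parse_command_line`, the use of `--jobNumber` to select the entry and the lookup on the `CRAB_Id` key are assumptions made for illustration. The demo file also stores `scriptArgs` as a real list, which presumes the double conversion has been removed.

```python
import argparse
import json
import os
import tempfile


def load_job_args(arg_file, job_number):
    """Hypothetical helper: return the dictionary for one job from the JSON file."""
    with open(arg_file, encoding='utf-8') as fd:
        all_jobs = json.load(fd)  # a list with one dictionary per job
    for job in all_jobs:
        if str(job['CRAB_Id']) == str(job_number):
            return job
    raise KeyError('job %s not found in %s' % (job_number, arg_file))


def parse_command_line(argv):
    """Hypothetical parser accepting only the two options used in this sketch."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--argFile')
    parser.add_argument('--jobNumber')
    return parser.parse_args(argv)


# Demo with a throwaway file standing in for input_args.json.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'input_args.json')
    with open(path, 'w', encoding='utf-8') as fd:
        json.dump([{'CRAB_Id': 1, 'scriptArgs': ['exitCode=666', 'gotArgs=Yes']}], fd)
    opts = parse_command_line(['--argFile', path, '--jobNumber', '1'])
    job = load_job_args(opts.argFile, opts.jobNumber)

script_args = job['scriptArgs']
print(script_args)
```

Since only a file name and a job number cross the bash boundary, no quoting or escaping of the JSON content is needed on the command line.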