Closed belforte closed 6 years ago
discussion with SI in this thread https://hypernews.cern.ch/HyperNews/CMS/get/comp-ops/3741.html
SI says that DESIRED_OpSysMajorVers is obsolete and we should be using REQUIRED_OS
but I find this code which is apparently setting it
https://github.com/dmwm/CRABServer/blob/master/src/python/TaskWorker/Actions/DagmanCreator.py#L504-L509
while the classAd list from the job wrapper does not show it [1]. Digging...
[1] from https://cmsweb.cern.ch/scheddmon/0197/cms701/171009_123325:alcaraz_crab_CRAB3_tutorial_alcaraz/job_out.1.1.txt
======== gWMS-CMSRunAnalysis.sh STARTING at Mon Oct 9 12:45:02 GMT 2017 on b63b00695f.cern.ch ========
Local time : Mon Oct 9 12:45:02 UTC 2017
Current system : Linux b63b00695f.cern.ch 2.6.32-696.10.2.el6.x86_64 #1 SMP Thu Sep 14 16:35:02 CEST 2017 x86_64 x86_64 x86_64 GNU/Linux
Arguments are -a sandbox.tar.gz --sourceURL=https://cmsweb.cern.ch/crabcache --jobNumber=1 --cmsswVersion=CMSSW_8_0_29 --scramArch=slc7_amd64_gcc530 --inputFile=job_input_file_list_1.txt --runAndLumis=job_lumis_1.json --lheInputFiles=False --firstEvent=None --firstLumi=None --lastEvent=None --firstRun=None --seeding=AutomaticSeeding --scriptExe=None --eventsPerLumi=None --maxRuntime=-1 --scriptArgs=[] -o {}
SCRAM_ARCH=slc7_amd64_gcc530
======== HTCONDOR JOB SUMMARY at Mon Oct 9 12:45:02 GMT 2017 START ========
CRAB ID: 1
Execution site: T2_CH_CERN
Current hostname: b63b00695f.cern.ch
Output files: output.root=output_1.root
==== HTCONDOR JOB AD CONTENTS START ====
== JOB AD: CRAB_ParentId = "1"
== JOB AD: ProvisionedResources = "Cpus Memory Disk Swap"
== JOB AD: CRAB_PrimaryDataset = "TT_TuneCUETP8M2T4_13TeV-powheg-pythia8"
== JOB AD: CumulativeRemoteUserCpu = 0.0
== JOB AD: RequestMemory = 2000
== JOB AD: CMS_ALLOW_OVERFLOW = "True"
== JOB AD: TransferOutput = "jobReport.json.1,WMArchiveReport.json.1"
== JOB AD: JobStatus = 2
== JOB AD: CRAB_TaskEndTime = 1510144405
== JOB AD: CoreSize = -1
== JOB AD: JOB_GLIDEIN_SiteWMS = "HTCondor"
== JOB AD: DESIRED_Archs = "X86_64"
== JOB AD: JOB_GLIDECLIENT_Name = "CMSG-v1_0.main"
== JOB AD: JOB_GLIDEIN_Entry_Name = "CMSHTPC_T2_CH_CERN_ce506"
== JOB AD: Used_Gatekeeper = "ce506.cern.ch ce506.cern.ch:9619"
== JOB AD: EncryptExecuteDirectory = false
== JOB AD: JOB_Site = "CERN"
== JOB AD: StartdPrincipal = "execute-side@matchsession/188.185.61.88"
== JOB AD: CRAB_SiteBlacklist = { }
== JOB AD: JOB_GLIDEIN_Schedd = "schedd_glideins9@glidein.grid.iu.edu"
== JOB AD: TargetType = "Machine"
== JOB AD: NiceUser = false
== JOB AD: ShadowBday = 1507553096
== JOB AD: DESIRED_Overflow_Region = strcat(ifthenelse(OVERFLOW_US =?= "True","US","none"),",",ifthenelse(OVERFLOW_IT =?= "True","IT","none"),",",ifthenelse(OVERFLOW_UK =?= "True","UK","none"))
== JOB AD: TransferIn = false
== JOB AD: NumCkpts_RAW = 0
== JOB AD: ExecutableSize = 10
== JOB AD: JobRunCount = 1
== JOB AD: CRAB_oneEventMode = 0
== JOB AD: JOB_GLIDEIN_MaxMemMBs = "20240"
== JOB AD: CRAB_RestURInoAPI = "/crabserver/prod"
== JOB AD: JOB_GLIDEIN_ClusterId = "4704751"
== JOB AD: CommittedSlotTime = 0
== JOB AD: PeriodicRelease = ( HoldReasonCode == 28 ) || ( HoldReasonCode == 30 ) || ( HoldReasonCode == 13 ) || ( HoldReasonCode == 6 )
== JOB AD: User = "cms701@cms"
== JOB AD: CRAB_RestHost = "cmsweb.cern.ch"
== JOB AD: DiskProvisioned = 1833784
== JOB AD: ExecutableSize_RAW = 9
== JOB AD: CRAB_OutLFNDir = "/store/user/alcaraz/TT_TuneCUETP8M2T4_13TeV-powheg-pythia8/CRAB3_tutorial_alcaraz/171009_123325"
== JOB AD: x509UserProxyFirstFQAN = "/cms/Role=NULL/Capability=NULL"
== JOB AD: NumRestarts = 0
== JOB AD: JOB_GLIDEIN_Site = "CERN"
== JOB AD: TotalSubmitProcs = 1
== JOB AD: LastJobLeaseRenewal = 1507553097
== JOB AD: SubmitEventNotes = "DAG Node: Job1"
== JOB AD: DAGParentNodeNames = ""
== JOB AD: CRAB_RetryOnASOFailures = 1
== JOB AD: JobLeaseDuration = 2400
== JOB AD: CRAB_ASODB = "filetransfers"
== JOB AD: Requirements = ( ( ( target.IS_GLIDEIN =!= true ) || ( target.GLIDEIN_CMSSite =!= undefined ) ) ) && ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer )
== JOB AD: ShadowIpAddr = "<188.185.80.49:4080?addrs=188.185.80.49-4080&noUDP&sock=1453309_69b3_491430>"
== JOB AD: LocalSysCpu = 0.0
== JOB AD: x509UserProxyEmail = "Juan.Alcaraz@cern.ch"
== JOB AD: Arguments = "-a sandbox.tar.gz --sourceURL=https://cmsweb.cern.ch/crabcache --jobNumber=1 --cmsswVersion=CMSSW_8_0_29 --scramArch=slc7_amd64_gcc530 --inputFile=job_input_file_list_1.txt --runAndLumis=job_lumis_1.json --lheInputFiles=False --firstEvent=None --firstLumi=None --lastEvent=None --firstRun=None --seeding=AutomaticSeeding --scriptExe=None --eventsPerLumi=None --maxRuntime=-1 --scriptArgs=[] -o {}"
== JOB AD: PeriodicRemove = ( ( JobStatus =?= 5 ) && ( time() - EnteredCurrentStatus > 7 * 60 ) ) || ( ( JobStatus =?= 1 ) && ( time() - EnteredCurrentStatus > 7 * 24 * 60 * 60 ) ) || ( ( JobStatus =?= 2 ) && ( ( MemoryUsage > RequestMemory ) || ( MaxWallTimeMins * 60 < time() - EnteredCurrentStatus ) || ( DiskUsage > 20000000 ) ) ) || ( time() > CRAB_TaskEndTime ) || ( ( JobStatus =?= 1 ) && ( time() > ( x509UserProxyExpiration + 86400 ) ) )
== JOB AD: MachineAttrTotalSlotCpus0 = 8
== JOB AD: JobStartDate = 1507553096
== JOB AD: RemoteHost = "slot1_6@glidein_105_46171536@b63b00695f.cern.ch"
== JOB AD: OrigIwd = "/data/srv/glidecondor/condor_local/spool/2188/0/cluster10942188.proc0.subproc0"
== JOB AD: CRAB_UserGroup = undefined
== JOB AD: PostJobPrio1 = -1507552456
== JOB AD: EstimatedWallTimeMins = 1250
== JOB AD: Err = "_condor_stderr"
== JOB AD: PostJobPrio2 = 1
== JOB AD: ShadowVersion = "$CondorVersion: 8.6.3 May 08 2017 BuildID: 404928 $"
== JOB AD: NumSystemHolds = 0
== JOB AD: GlobalJobId = "crab3@vocms0197.cern.ch#10942201.0#1507552808"
== JOB AD: JOBGLIDEIN_CMSSite = "T2_CH_CERN"
== JOB AD: PublicClaimId = "<188.185.61.88:35714>#1507465587#162#..."
== JOB AD: Environment = "CRAB_TASKMANAGER_TARBALL=local SCRAM_ARCH=slc7_amd64_gcc530 CRAB_RUNTIME_TARBALL=local"
== JOB AD: CRAB_UserDN = "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=alcaraz/CN=369666/CN=Juan Alcaraz Maestre"
== JOB AD: RequestMemory_RAW = 2000
== JOB AD: PeriodicHold = false
== JOB AD: ProcId = 0
== JOB AD: CRAB_PublishName = "CRAB3_tutorial_alcaraz-00000000000000000000000000000000"
== JOB AD: StartdIpAddr = "<188.185.61.88:35714?CCBID=188.184.83.197:9630%3faddrs%3d188.184.83.197-9630#3752789%20131.225.205.232:9630%3faddrs%3d131.225.205.232-9630#2612650&addrs=188.185.61.88-35714+[--1]-35714&noUDP>"
== JOB AD: x509UserProxyVOName = "cms"
== JOB AD: OnExitHold = false
== JOB AD: x509userproxysubject = "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=alcaraz/CN=369666/CN=Juan Alcaraz Maestre"
== JOB AD: CRAB_DataBlock = "/TT_TuneCUETP8M2T4_13TeV-powheg-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM#a66cfe2e-be9b-11e6-aa2c-001e67abf518"
== JOB AD: TotalSuspensions = 0
== JOB AD: LeaveJobInQueue = false
== JOB AD: CMSGroups = "/cms,/cms/escms,T2_ES_CIEMAT"
== JOB AD: CRAB_ISB = "https://cmsweb.cern.ch/crabcache"
== JOB AD: OrigMaxHosts = 1
== JOB AD: CRAB_AsyncDest = "T2_ES_CIEMAT"
== JOB AD: NumCkpts = 0
== JOB AD: DAGNodeName = "Job1"
== JOB AD: Out = "_condor_stdout"
== JOB AD: NumJobCompletions = 0
== JOB AD: CRAB_UserHN = "alcaraz"
== JOB AD: AcctGroupUser = "alcaraz"
== JOB AD: JobPrio = 10
== JOB AD: CRAB_TaskLifetimeDays = 30
== JOB AD: CRAB_PublishGroupName = 0
== JOB AD: WantRemoteIO = true
== JOB AD: RootDir = "/"
== JOB AD: DESIRED_CMSDataset = "/TT_TuneCUETP8M2T4_13TeV-powheg-pythia8/RunIISummer16MiniAODv2-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/MINIAODSIM"
== JOB AD: WantCheckpoint = false
== JOB AD: OrigCmd = "/data/srv/glidecondor/condor_local/spool/2188/0/cluster10942188.proc0.subproc0/gWMS-CMSRunAnalysis.sh"
== JOB AD: JOB_GLIDEIN_Factory = "OSGGOC"
== JOB AD: MachineAttrMJF_JOB_HS06_JOB0 = "Unknown"
== JOB AD: CpusProvisioned = 1
== JOB AD: RequestDisk_RAW = 1
== JOB AD: JOB_GLIDEIN_Memory = "2000"
== JOB AD: WhenToTransferOutput = "ON_EXIT_OR_EVICT"
== JOB AD: CRAB_AdditionalOutputFiles = { }
== JOB AD: ExitStatus = 0
== JOB AD: MachineAttrDIRACBenchmark0 = 7.54536218448
== JOB AD: CurrentHosts = 1
== JOB AD: BufferSize = 524288
== JOB AD: CumulativeRemoteSysCpu = 0.0
== JOB AD: CRAB_PublishDBSURL = "https://cmsweb.cern.ch/dbs/prod/phys03/DBSWriter"
== JOB AD: NumJobStarts = 0
== JOB AD: CRAB_OutTempLFNDir = "/store/temp/user/alcaraz.daea2bb00e3d819f9d246f95d78d68457d50e8cd/TT_TuneCUETP8M2T4_13TeV-powheg-pythia8/CRAB3_tutorial_alcaraz/171009_123325"
== JOB AD: LastSuspensionTime = 0
== JOB AD: MaxHosts = 1
== JOB AD: CRAB_NumAutomJobRetries = 2
== JOB AD: OVERFLOW_IT = ifthenelse(regexp("T[1,2]_IT_",DESIRED_Sites),"True",undefined)
== JOB AD: MinHosts = 1
== JOB AD: JOB_GLIDEIN_SiteWMS_Slot = "slot1_1@b63b00695f.cern.ch"
== JOB AD: CRAB_JobSW = "CMSSW_8_0_29"
== JOB AD: UidDomain = "cms"
== JOB AD: Owner = "cms701"
== JOB AD: DelegatedProxyExpiration = 1507639497
== JOB AD: ShouldTransferFiles = "YES"
== JOB AD: CRAB_JobType = "analysis"
== JOB AD: CRAB_SaveLogsFlag = 1
== JOB AD: ExitBySignal = false
== JOB AD: JobAdInformationAttrs = "MATCH_EXP_JOBGLIDEIN_CMSSite, JOBGLIDEIN_CMSSite, RemoteSysCpu, RemoteUserCpu"
== JOB AD: WantRemoteSyscalls = false
== JOB AD: CRAB_ASOTimeout = 86400
== JOB AD: CompletionDate = 0
== JOB AD: TransferInput = "CMSRunAnalysis.sh,cmscp.py,CMSRunAnalysis.tar.gz,sandbox.tar.gz,run_and_lumis.tar.gz,input_files.tar.gz"
== JOB AD: DESIRED_OpSysMajorVers = "5,6"
== JOB AD: CumulativeSuspensionTime = 0
== JOB AD: JOB_GLIDEIN_ToDie = "1507700793"
== JOB AD: JOB_GLIDEIN_ProcId = "8"
== JOB AD: In = "/dev/null"
== JOB AD: RemoteSlotID = 1
== JOB AD: CRAB_UserRole = undefined
== JOB AD: MyType = "Job"
== JOB AD: DiskUsage_RAW = 1646
== JOB AD: JOB_GLIDEIN_Name = "gfactory_instance"
== JOB AD: CRAB_Publish = 0
== JOB AD: JOB_GLIDEIN_Max_Walltime = "257400"
== JOB AD: CRAB_Workflow = "171009_123325:alcaraz_crab_CRAB3_tutorial_alcaraz"
== JOB AD: MaxWallTimeMins = ( JobStatus =?= 1 ) ? EstimatedWallTimeMins : 1250
== JOB AD: JOB_GLIDEIN_SEs = "srm-eoscms.cern.ch"
== JOB AD: CRAB_EDMOutputFiles = { "output.root" }
== JOB AD: BufferBlockSize = 32768
== JOB AD: MemoryProvisioned = 2000
== JOB AD: TransferInputSizeMB = 1
== JOB AD: CRAB_DBSURL = "https://cmsweb.cern.ch/dbs/prod/global/DBSReader"
== JOB AD: TransferSocket = "<188.185.80.49:4080?addrs=188.185.80.49-4080&noUDP&sock=1453309_69b3_491430>"
== JOB AD: JOB_GLIDEIN_SiteWMS_Queue = "ce506.cern.ch"
== JOB AD: StreamErr = false
== JOB AD: MyAddress = "<188.185.80.49:4080?addrs=188.185.80.49-4080&noUDP&sock=1453309_69b3_491430>"
== JOB AD: PeriodicRemoveReason = ifThenElse(time() - EnteredCurrentStatus > 7 * 24 * 60 * 60 && isUndefined(MemoryUsage),"Removed due to idle time limit",ifThenElse(time() > x509UserProxyExpiration,"Removed job due to proxy expiration",ifThenElse(MemoryUsage > RequestMemory,"Removed due to memory use",ifThenElse(MaxWallTimeMins * 60 < time() - EnteredCurrentStatus,"Removed due to wall clock limit",ifThenElse(DiskUsage > 20000000,"Removed due to disk usage",ifThenElse(time() > CRAB_TaskEndTime,"Removed due to reached CRAB_TaskEndTime","Removed due to job being held"))))))
== JOB AD: OVERFLOW_CHECK = ifthenelse(MATCH_GLIDEIN_CMSSite =!= undefined,ifthenelse(stringListMember(MATCH_GLIDEIN_CMSSite,DESIRED_Sites),false,true),false)
== JOB AD: CommittedTime = 0
== JOB AD: RequestDisk = 100000
== JOB AD: AcctGroup = "analysis"
== JOB AD: LocalUserCpu = 0.0
== JOB AD: CRAB_localOutputFiles = "output.root=output_1.root"
== JOB AD: LastRejMatchReason = "no match found "
== JOB AD: NumJobMatches = 1
== JOB AD: CRAB_JobArch = "slc7_amd64_gcc530"
== JOB AD: MachineAttrCpus0 = 1
== JOB AD: DAGManJobId = 10942188
== JOB AD: DAGManNodesMask = "0,1,2,4,5,7,9,10,11,12,13,16,17,24,27"
== JOB AD: MachineAttrSlotWeight0 = 1
== JOB AD: RemoteUserCpu = 0.0
== JOB AD: LastJobStatus = 1
== JOB AD: UserLog = "/data/srv/glidecondor/condor_local/spool/2188/0/cluster10942188.proc0.subproc0/job_log"
== JOB AD: ImageSize = 10
== JOB AD: JOB_CMSSite = "T2_CH_CERN"
== JOB AD: DESIRED_SITES = "T2_ES_CIEMAT,T2_FR_IPHC,T2_RU_IHEP,T2_CH_CERN_HLT,T2_DE_DESY,T2_US_MIT,T2_IT_Legnaro,T2_CH_CERN"
== JOB AD: OVERFLOW_UK = ifthenelse(regexp("T2_UK_London_",DESIRED_Sites),"True",undefined)
== JOB AD: TaskType = "Job"
== JOB AD: Iwd = "/pool/condor/dir_15483/glide_NaNFwT/execute/dir_20651"
== JOB AD: CRAB_ASOURL = "https://cmsweb.cern.ch/crabserver/prod"
== JOB AD: ImageSize_RAW = 9
== JOB AD: DAGManNodesLog = "/data/srv/glidecondor/condor_local/spool/2188/0/cluster10942188.proc0.subproc0/RunJobs.dag.nodes.log"
== JOB AD: StreamOut = false
== JOB AD: JobUniverse = 5
== JOB AD: OVERFLOW_US = ifthenelse(regexp("T[1,2]_US_",DESIRED_Sites),"True",undefined)
== JOB AD: QDate = 1507552808
== JOB AD: SpoolOnEvict = false
== JOB AD: EnteredCurrentStatus = 1507553096
== JOB AD: CRAB_ReqName = "171009_123325:alcaraz_crab_CRAB3_tutorial_alcaraz"
== JOB AD: DESIRED_CMSDataLocations = "T2_ES_CIEMAT,T2_FR_IPHC,T2_RU_IHEP,T2_CH_CERN_HLT,T2_DE_DESY,T2_CH_CSCS,T2_US_MIT,T2_IT_Legnaro,T2_CH_CERN,T2_CH_CSCS_HPC"
== JOB AD: CRAB_SiteWhitelist = { }
== JOB AD: x509UserProxyExpiration = 1508157220
== JOB AD: NumShadowStarts = 1
== JOB AD: CommittedSuspensionTime = 0
== JOB AD: LastMatchTime = 1507553096
== JOB AD: LastRejMatchTime = 1507553026
== JOB AD: PreJobPrio1 = 0
== JOB AD: JOB_GLIDEIN_SiteWMS_JobId = "7267155.0"
== JOB AD: JobNotification = 0
== JOB AD: x509userproxy = "/data/srv/glidecondor/condor_local/spool/2188/0/cluster10942188.proc0.subproc0/1362f2fd2a4fa05821e699f5730fc8a8c0b313f3"
== JOB AD: CRAB_TFileOutputFiles = { }
== JOB AD: JobBatchName = "RunJobs.dag+10942188"
== JOB AD: CumulativeSlotTime = 0
== JOB AD: CRAB_TransferOutputs = 1
== JOB AD: RemoteSysCpu = 0.0
== JOB AD: CRAB_SubmitterIpAddr = "128.141.188.185"
== JOB AD: JOB_GLIDEIN_CMSSite = "T2_CH_CERN"
== JOB AD: CondorPlatform = "$CondorPlatform: x86_64_RedHat6 $"
== JOB AD: OnExitRemove = true
== JOB AD: Rank = 0.0
== JOB AD: x509UserProxyFQAN = "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=alcaraz/CN=369666/CN=Juan Alcaraz Maestre,/cms/Role=NULL/Capability=NULL,/cms/escms/Role=NULL/Capability=NULL"
== JOB AD: RemoteWallClockTime = 0.0
== JOB AD: CRAB_Id = "1"
== JOB AD: JOB_GLIDEIN_ToRetire = "1507665993"
== JOB AD: Cmd = "/data/srv/glidecondor/condor_local/spool/2188/0/cluster10942188.proc0.subproc0/gWMS-CMSRunAnalysis.sh"
== JOB AD: MachineAttrHAS_SINGULARITY0 = true
== JOB AD: JOB_Gatekeeper = ifthenelse(substr(Used_Gatekeeper,0,1) =!= "$",Used_Gatekeeper,ifthenelse(MATCH_GLIDEIN_Gatekeeper =!= undefined,MATCH_GLIDEIN_Gatekeeper,"Unknown"))
== JOB AD: AccountingGroup = "analysis.alcaraz"
== JOB AD: JobCurrentStartDate = 1507553096
== JOB AD: use_x509userproxy = true
== JOB AD: DESIRED_OpSyses = "LINUX"
== JOB AD: DiskUsage = 1750
== JOB AD: CRAB_Retry = 1
== JOB AD: CRAB_Destination = "srm://srm.ciemat.es:8443/srm/managerv2?SFN=/pnfs/ciemat.es/data/cms/store/user/alcaraz/TT_TuneCUETP8M2T4_13TeV-powheg-pythia8/CRAB3_tutorial_alcaraz/171009_123325/0000/log/cmsRun_1.log.tar.gz, srm://srm.ciemat.es:8443/srm/managerv2?SFN=/pnfs/ciemat.es/data/cms/store/user/alcaraz/TT_TuneCUETP8M2T4_13TeV-powheg-pythia8/CRAB3_tutorial_alcaraz/171009_123325/0000/output_1.root"
== JOB AD: ClusterId = 10942201
== JOB AD: CRAB_StageoutPolicy = "local,remote"
== JOB AD: RequestCpus = 1
== JOB AD: CondorVersion = "$CondorVersion: 8.6.3 May 08 2017 BuildID: 404928 $"
== JOB AD: CRAB_TaskWorker = "vocms052"
== JOB AD: JOB_GLIDEIN_Job_Max_Time = "34800"
== JOB AD: accounting_group = analysis
==== HTCONDOR JOB AD CONTENTS FINISH ====
======== HTCONDOR JOB SUMMARY at Mon Oct 9 12:45:02 GMT 2017 FINISH ========
======== PROXY INFORMATION START at Mon Oct 9 12:45:02 GMT 2017 ========
subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=alcaraz/CN=369666/CN=Juan Alcaraz Maestre/CN=489298966/CN=2056867253/CN=2080018538/CN=292515173/CN=7974757/CN=1582053182
issuer : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=alcaraz/CN=369666/CN=Juan Alcaraz Maestre/CN=489298966/CN=2056867253/CN=2080018538/CN=292515173/CN=7974757
identity : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=alcaraz/CN=369666/CN=Juan Alcaraz Maestre/CN=489298966/CN=2056867253/CN=2080018538/CN=292515173/CN=7974757
type : RFC compliant proxy
strength : 1024 bits
path : /srv/1362f2fd2a4fa05821e699f5730fc8a8c0b313f3
timeleft : 23:59:57
key usage : Digital Signature, Key Encipherment
=== VO cms extension information ===
VO : cms
subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=alcaraz/CN=369666/CN=Juan Alcaraz Maestre
issuer : /DC=ch/DC=cern/OU=computers/CN=lcg-voms2.cern.ch
attribute : /cms/Role=NULL/Capability=NULL
attribute : /cms/escms/Role=NULL/Capability=NULL
timeleft : 167:48:39
uri : lcg-voms2.cern.ch:15002
======== PROXY INFORMATION FINISH at Mon Oct 9 12:45:02 GMT 2017 ========
======== CMSRunAnalysis.sh at Mon Oct 9 12:45:02 GMT 2017 STARTING ========
======== CMSRunAnalysis.sh STARTING at Mon Oct 9 12:45:02 GMT 2017 ========
Local time : Mon Oct 9 12:45:02 UTC 2017
Current system : Linux b63b00695f.cern.ch 2.6.32-696.10.2.el6.x86_64 #1 SMP Thu Sep 14 16:35:02 CEST 2017 x86_64 x86_64 x86_64 GNU/Linux
==== CMSSW pre-execution environment bootstrap STARTING ====
+ '[' -f /cvmfs/cms.cern.ch/cmsset_default.sh ']'
+ echo 'LCG style'
LCG style
+ set +x
+ declare -a VERSIONS
+ VERSIONS=($(ls $VO_CMS_SW_DIR/$SCRAM_ARCH/external/python | egrep '2.[67]'))
++ ls /cvmfs/cms.cern.ch/slc7_amd64_gcc530/external/python
++ egrep '2.[67]'
+ PY_PATH=/cvmfs/cms.cern.ch/slc7_amd64_gcc530/external/python
+ echo 'python version: ' 2.7.11
python version: 2.7.11
+ set +x
==== CMSSW pre-execution environment bootstrap FINISHING at Mon Oct 9 12:45:02 GMT 2017 ====
==== Python discovery STARTING ====
Python found in /cvmfs/cms.cern.ch/slc7_amd64_gcc530/external/python/2.7.11
I found python at..
/cvmfs/cms.cern.ch/slc7_amd64_gcc530/external/python/2.7.11/bin/python
python: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /cvmfs/cms.cern.ch/slc7_amd64_gcc530/external/python/2.7.11/lib/libpython2.7.so.1.0)
Error: python is not functional.
real 0m2.055s
user 0m0.019s
sys 0m0.031s
CMSRunAnalysis.sh complete at Mon Oct 9 12:45:04 GMT 2017 with (short) exit status 59
======== CMSRunAnalsysis.sh at Mon Oct 9 12:45:04 GMT 2017 FINISHING ========
======== python2.6 bootstrap for stageout at Mon Oct 9 12:45:04 GMT 2017 STARTING ========
+ '[' -f /cvmfs/cms.cern.ch/cmsset_default.sh ']'
+ set +x
+ '[' -e /cvmfs/cms.cern.ch/COMP/slc6_amd64_gcc493/external/python/2.7.6/etc/profile.d/init.sh ']'
+ set +x
+ command -v python2.7
+ rc=0
+ set +x
Found python2.7 at:
/cvmfs/cms.cern.ch/COMP/slc6_amd64_gcc493/external/python/2.7.6/bin/python2.7
======== python2.7 bootstrap for stageout at Mon Oct 9 12:45:04 GMT 2017 FINISHING ========
======== Attempting to notify HTCondor of file stageout ========
Error: 22 (Invalid argument)
======== Stageout at Mon Oct 9 12:45:04 GMT 2017 STARTING ========
Traceback (most recent call last):
File "cmscp.py", line 24, in <module>
from ServerUtilities import cmd_exist, parseJobAd, TRANSFERDB_STATES, isCouchDBURL
ImportError: No module named ServerUtilities
======== ERROR: Unable to initialize WMCore at Mon Oct 9 12:45:05 GMT 2017 ========
======== Figuring out long exit code of the job for condor_chirp ========
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
File "/cvmfs/cms.cern.ch/COMP/slc6_amd64_gcc493/external/python/2.7.6/lib/python2.7/json/__init__.py", line 290, in load
**kw)
File "/cvmfs/cms.cern.ch/COMP/slc6_amd64_gcc493/external/python/2.7.6/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/cvmfs/cms.cern.ch/COMP/slc6_amd64_gcc493/external/python/2.7.6/lib/python2.7/json/decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/cvmfs/cms.cern.ch/COMP/slc6_amd64_gcc493/external/python/2.7.6/lib/python2.7/json/decoder.py", line 383, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
==== Failed to load the long exit code from jobReport.json.1. Falling back to short exit code ====
==== Short exit code of the job is 10043 ====
======== Finished condor_chirp -ing the exit code of the job. Exit code of condor_chirp: 0 ========
Job Running time in seconds: 3
Job runtime is less than 1 minute. Sleeping 57
looking at that same job, in schedd spool directory Job.submit file has
REQUIRED_OS="rhel7"
mystery deepens
Simply that REQUIRED_OS is a custom classAd, not an HTC internal one, so it needs the +
in front to work:
+REQUIRED_OS="rhel7"
as Diego D. just confirmed. I will run a test and then make a PR
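A minimal sketch of the point above, not the actual DagmanCreator code: in an HTCondor submit file, a user-defined attribute only ends up in the job ClassAd when the line carries a leading "+". The helper name custom_classad_line is hypothetical and only illustrates how the submit line has to be formatted.

```python
def custom_classad_line(name, value):
    """Format a submit-file line for a custom (non-HTCondor) string attribute.

    The leading "+" tells condor_submit to insert the attribute into the
    job ClassAd. Hypothetical helper, for illustration only.
    """
    return '+{0}="{1}"'.format(name, value)


line = custom_classad_line("REQUIRED_OS", "rhel7")
print(line)
```

Without the "+", condor_submit would treat REQUIRED_OS as a submit command it does not know about, which would explain why the attribute never showed up in the job ad.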
damn.. Attilio just ran a task on preprod which has this patch now, and jobs were sent with
REQUIRED_OS = "any"
while being SL6 jobs, so they landed on SL7 and failed big time. Indeed I had only tested that SL7 jobs were handled properly.
amazing, but true: the mistake is in the original commit from @bbockelm https://github.com/bbockelm/CRABServer/blob/8248115c7eb44bbe3352390bbdd8239b62734071/src/python/TaskWorker/Actions/DagmanCreator.py#L470-L475
there's an if where an else if was needed.
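A hedged reconstruction of how an if in place of an else if (elif in Python) produces exactly the symptom reported here: SL6 jobs end up with "any" while SL7 jobs still look fine. The function names and the "rhel6" value are assumptions for illustration; the real code is at the DagmanCreator.py link above.

```python
def required_os_buggy(arch):
    """Two independent 'if' statements: the 'else' belongs only to the second one."""
    if arch == "slc6":
        required_os = "rhel6"
    if arch == "slc7":          # bug: should be 'elif'
        required_os = "rhel7"
    else:
        required_os = "any"     # also runs for slc6 and overwrites "rhel6"
    return required_os


def required_os_fixed(arch):
    """Single if/elif/else chain: exactly one branch runs."""
    if arch == "slc6":
        required_os = "rhel6"
    elif arch == "slc7":
        required_os = "rhel7"
    else:
        required_os = "any"
    return required_os


for arch in ("slc6", "slc7"):
    print(arch, required_os_buggy(arch), required_os_fixed(arch))
```

Testing only the SL7 case, as happened here, cannot reveal the problem, because the last branch pair behaves the same in both versions for that input.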
bad for us not having tested :-( thanks Brian for the chance to reaffirm that we should not blindly trust anything.
lines to patch are now https://github.com/dmwm/CRABServer/blob/master/src/python/TaskWorker/Actions/DagmanCreator.py#L491-L496
I do not trust myself to edit them correctly at this hour. Unless @mmascher wants to do it, I'll address this tomorrow
for the record, the PR which brought in the bad code is: https://github.com/dmwm/CRABServer/pull/5460
Good catch!
Indeed, it looks as straightforward as replacing if with else if
@mmascher AFAIU this is fixed in production. Can you add a reference to the final PR and close?
fixed via #5590
currently CRAB jobs land on SL6 nodes even if user submitted on an SL7 release of CMSRUN. Of course they fail. See [1] where the relevant bit is that job wrapper reports this classAd
== JOB AD: DESIRED_OpSysMajorVers = "5,6"
Currently the OpSys request is done in here, and for SL7 it indeed falls back to requesting 5+6: https://github.com/dmwm/CRABServer/blob/master/src/python/TaskWorker/Actions/DagmanCreator.py#L391-L410
I am contacting Submission Infrastructure to find out what the currently prescribed way to indicate the operating system is, then we need to change our code.
[1] https://hypernews.cern.ch/HyperNews/CMS/get/computing-tools/3248/1/3/2/1/1/1/1/2/1/4.html
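The mismatch described above can be spotted directly in a job wrapper log. The following is a rough sketch, assuming the "== JOB AD: name = value" line format printed by the wrapper and an slcN_... style CRAB_JobArch; the function names are hypothetical and this is not part of CRAB.

```python
import re

# Matches wrapper lines such as: == JOB AD: CRAB_JobArch = "slc7_amd64_gcc530"
JOB_AD_RE = re.compile(r"^== JOB AD: (\S+) = (.*)$")


def parse_job_ads(log_text):
    """Collect the classAds printed by the job wrapper into a dict of strings."""
    ads = {}
    for line in log_text.splitlines():
        match = JOB_AD_RE.match(line.strip())
        if match:
            ads[match.group(1)] = match.group(2).strip().strip('"')
    return ads


def os_mismatch(ads):
    """True if the OS major version in CRAB_JobArch is not among DESIRED_OpSysMajorVers."""
    arch_match = re.match(r"slc(\d+)_", ads.get("CRAB_JobArch", ""))
    wanted = ads.get("DESIRED_OpSysMajorVers")
    if not arch_match or wanted is None:
        return False  # not enough information to decide
    return arch_match.group(1) not in [v.strip() for v in wanted.split(",")]


sample = '''== JOB AD: CRAB_JobArch = "slc7_amd64_gcc530"
== JOB AD: DESIRED_OpSysMajorVers = "5,6"
'''
ads = parse_job_ads(sample)
print(os_mismatch(ads))
```

For the two lines taken from the job in this thread, the check reports a mismatch: the job was built for slc7 but asked for major versions 5 or 6.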