dmwm / CRAB2

cmscp put /temp in LFN even if stageout worked and fallback was not used #863

Closed ericvaandering closed 10 years ago

ericvaandering commented 10 years ago

Original Savannah ticket 99733 reported by fanzago on Thu Jan 10 09:49:41 2013.

see: https://hypernews.cern.ch/HyperNews/CMS/get/crabFeedback/6438/3/1/1/1/1.html

There seem to be three problems:

  1. a message about stageout failing, with the exit code set to 60307, but no detail reported
  2. another message later saying everything is OK and the exit code is 0
  3. no evidence that the local fallback stageout was called, and yet /temp was inserted in the LFN and the exit code was not set to 60318
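The third point can be written down as a simple check. This is a minimal sketch in Python, assuming (as the ticket implies) that 60318 is the exit code expected when the local fallback stageout was used, and that a fallback copy is recognisable by an LFN under /store/temp/. The function names are hypothetical and are not part of cmscp.py.

```python
# Assumption taken from the ticket: 60318 is the status expected
# when the local fallback stageout was used.
FALLBACK_EXIT_CODE = 60318


def lfn_uses_temp(lfn):
    """Return True if the LFN sits under /store/temp/."""
    return lfn.startswith("/store/temp/")


def is_inconsistent(lfn, exit_status):
    """Flag a report whose LFN is under /store/temp/ although the
    exit status is not the fallback code."""
    return lfn_uses_temp(lfn) and exit_status != FALLBACK_EXIT_CODE


# LFN prefixes and statuses as they appear in the user's stdout.
print(is_inconsistent("/store/temp/user/mdunser/DoubleMu/out.root", 0))
print(is_inconsistent("/copy_problem/NTupleProducer_53X_data_64_1_n5a.root", 60317))
```

With the values reported in this ticket, the first call flags the mismatch (LFN under /store/temp/ with status 0) and the second does not.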

excerpt from user's stdout:

>>> current directory (RUNTIME_AREA): /local/scratch/26927979.torx.ufhpc/glide_BtJuyd/execute/dir_7579
SE = storage01.lcg.cscs.ch
SE_PATH = /srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/mdunser/DoubleMu/V03-08-04_DoubleMu-Run2012D-PromptReco-v1-topUpV2/7845257fe5d435c2a9129285327a71b4/
LFNBaseName = /store/user/mdunser/DoubleMu/V03-08-04_DoubleMu-Run2012D-PromptReco-v1-topUpV2/7845257fe5d435c2a9129285327a71b4/
USER = mdunser
endpoint = srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/mdunser/DoubleMu/V03-08-04_DoubleMu-Run2012D-PromptReco-v1-topUpV2/7845257fe5d435c2a9129285327a71b4/
>>> Copy output files from WN = r6b-s41.ufhpc to /srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/mdunser/DoubleMu/V03-08-04_DoubleMu-Run2012D-PromptReco-v1-topUpV2/7845257fe5d435c2a9129285327a71b4/ :
python cmscp.py --destination srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/mdunser/DoubleMu/V03-08-04_DoubleMu-Run2012D-PromptReco-v1-topUpV2/7845257fe5d435c2a9129285327a71b4/ --inputFileList /local/scratch/26927979.torx.ufhpc/glide_BtJuyd/execute/dir_7579/cms_Z0UuK1G355us/CMSSW_5_3_6/NTupleProducer_53X_data_64_1_n5a.root --middleware OSG --se_name storage01.lcg.cscs.ch --for_lfn /store/user/mdunser/DoubleMu/V03-08-04_DoubleMu-Run2012D-PromptReco-v1-topUpV2/7845257fe5d435c2a9129285327a71b4/
{"/local/scratch/26927979.torx.ufhpc/glide_BtJuyd/execute/dir_7579/cms_Z0UuK1G355us/CMSSW_5_3_6/NTupleProducer_53X_data_64_1_n5a.root": {"endpoint": "srm://storage01.lcg.cscs.ch:8443/srm/managerv2?SFN=/pnfs/lcg.cscs.ch/cms/trivcat/store/user/mdunser/DoubleMu/V03-08-04_DoubleMu-Run2012D-PromptReco-v1-topUpV2/7845257fe5d435c2a9129285327a71b4/", "surl_for_grid": "", "reason": "Problem copying /local/scratch/26927979.torx.ufhpc/glide_BtJuyd/execute/dir_7579/cms_Z0UuK1G355us/CMSSW_5_3_6/NTupleProducer_53X_data_64_1_n5a.root file'Timeout interrupt for too long execution [timeout = 3600].'", "erCode": "60317", "se_name": "storage01.lcg.cscs.ch", "for_lfn": "/copy_problem/"}}
/local/scratch/26927979.torx.ufhpc/glide_BtJuyd/execute/dir_7579
-------- cat /local/scratch/26927979.torx.ufhpc/glide_BtJuyd/execute/dir_7579/cmscpReport.sh

#!/bin/bash

echo "Report for File: /local/scratch/26927979.torx.ufhpc/glide_BtJuyd/execute/dir_7579/cms_Z0UuK1G355us/CMSSW_5_3_6/NTupleProducer_53X_data_64_1_n5a.root"
echo "LFN: /copy_problem/NTupleProducer_53X_data_64_1_n5a.root"
echo "StorageElement: storage01.lcg.cscs.ch"
echo "StageOutExitStatusReason = 'Problem copying /local/scratch/26927979.torx.ufhpc/glide_BtJuyd/execute/dir_7579/cms_Z0UuK1G355us/CMSSW_5_3_6/NTupleProducer_53X_data_64_1_n5a.root file Timeout interrupt for too long execution [timeout = 3600]. '" | tee -a $RUNTIME_AREA/$repo
echo "StageOutSE = storage01.lcg.cscs.ch" >> $RUNTIME_AREA/$repo

export StageOutExitStatus=60317
echo "StageOutExitStatus = 60317" | tee -a $RUNTIME_AREA/$repo
-------- end of /local/scratch/26927979.torx.ufhpc/glide_BtJuyd/execute/dir_7579/cmscpReport.sh
Report for File: /local/scratch/26927979.torx.ufhpc/glide_BtJuyd/execute/dir_7579/cms_Z0UuK1G355us/CMSSW_5_3_6/NTupleProducer_53X_data_64_1_n5a.root
LFN: /copy_problem/NTupleProducer_53X_data_64_1_n5a.root
StorageElement: storage01.lcg.cscs.ch
StageOutExitStatusReason = 'Problem copying /local/scratch/26927979.torx.ufhpc/glide_BtJuyd/execute/dir_7579/cms_Z0UuK1G355us/CMSSW_5_3_6/NTupleProducer_53X_data_64_1_n5a.root file Timeout interrupt for too long execution [timeout = 3600]. '
StageOutExitStatus = 60317
INFO DN
/local/scratch/26927979.torx.ufhpc/glide_BtJuyd/execute/dir_7579
Report for File: /local/scratch/26927979.torx.ufhpc/glide_BtJuyd/execute/dir_7579/cms_Z0UuK1G355us/CMSSW_5_3_6/NTupleProducer_53X_data_64_1_n5a.root
LFN: /store/temp/user/mdunser/DoubleMu/V03-08-04_DoubleMu-Run2012D-PromptReco-v1-topUpV2/7845257fe5d435c2a9129285327a71b4/NTupleProducer_53X_data_64_1_n5a.root
StorageElement: srm.ihepa.ufl.edu
StageOutExitStatusReason = 'Copy succedeed with srm-lcg utils'
StageOutExitStatus = 0
/local/scratch/26927979.torx.ufhpc/glide_BtJuyd/execute/dir_7579
Report for File: /local/scratch/26927979.torx.ufhpc/glide_BtJuyd/execute/dir_7579/cms_Z0UuK1G355us/CMSSW_5_3_6/NTupleProducer_53X_data_64_1_n5a.root
LFN: /store/temp/user/mdunser/DoubleMu/V03-08-04_DoubleMu-Run2012D-PromptReco-v1-topUpV2/7845257fe5d435c2a9129285327a71b4/NTupleProducer_53X_data_64_1_n5a.root
StorageElement: storage01.lcg.cscs.ch
StageOutExitStatusReason = 'Copy succedeed with srm-lcg utils'
StageOutExitStatus = 0
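To compare the reports in an excerpt like the one above, the "Report for File" blocks can be pulled apart with a short script. This is a rough sketch, assuming each field is printed on its own line in the raw stdout. `parse_reports` is a hypothetical helper, not part of CRAB or cmscp.py.

```python
import re

# Matches "StageOutExitStatus = <number>" but not the "...Reason" line.
STATUS_RE = re.compile(r"StageOutExitStatus\s*=\s*(\d+)$")


def parse_reports(text):
    """Split cmscp-style stdout into one dict per 'Report for File:' block."""
    reports = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Report for File:"):
            current = {"file": line.split(":", 1)[1].strip()}
            reports.append(current)
        elif current is None:
            continue  # ignore anything before the first report
        elif line.startswith("LFN:"):
            current["lfn"] = line.split(":", 1)[1].strip()
        elif line.startswith("StorageElement:"):
            current["se"] = line.split(":", 1)[1].strip()
        else:
            match = STATUS_RE.match(line)
            if match:
                current["status"] = int(match.group(1))
    return reports


# Values copied from the excerpt; the file path is shortened to its basename.
SAMPLE = """\
Report for File: NTupleProducer_53X_data_64_1_n5a.root
LFN: /copy_problem/NTupleProducer_53X_data_64_1_n5a.root
StorageElement: storage01.lcg.cscs.ch
StageOutExitStatus = 60317
Report for File: NTupleProducer_53X_data_64_1_n5a.root
LFN: /store/temp/user/mdunser/DoubleMu/V03-08-04_DoubleMu-Run2012D-PromptReco-v1-topUpV2/7845257fe5d435c2a9129285327a71b4/NTupleProducer_53X_data_64_1_n5a.root
StorageElement: srm.ihepa.ufl.edu
StageOutExitStatus = 0
"""

for report in parse_reports(SAMPLE):
    in_temp = report["lfn"].startswith("/store/temp/")
    print(report["se"], report["status"], in_temp)
```

On the sample, the first report comes out with status 60317 and no /store/temp/ prefix, the second with status 0 and an LFN under /store/temp/, which is the combination the ticket describes.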

belforte commented 10 years ago

This does not seem to be causing many problems, so let's drop it.