dmwm / PHEDEX

CMS data-placement suite

FileStager alert about failing to parse stager status command output #922

Closed ericvaandering closed 10 years ago

ericvaandering commented 10 years ago

Original Savannah ticket 99391 reported by magini on Tue Dec 11 12:02:34 2012.

Hi,

the following alert shows up, very rarely, in the FileStager agent log output:

+verbatim+
2012-12-10 17:09:23: FileStager[9286]: alert: /data/DebugNodes/PHEDEX/Custom/Template/castor_prestage-status.pm output unrecognised file /castor/cern.ch/cms/store/PhED/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_263
[...]
2012-12-10 17:09:24: FileStager[9286]: alert: /data/DebugNodes/PHEDEX/Custom/Template/castor_prestage-status.pm output unrecognised file Ex_LoadTest07_4/LoadTest07_CERN_61
-verbatim-

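The agent is unable to parse the output of the status command for certain files, because the lines of output of castor_prestage-status.pm for those two files are interleaved: one path is cut off after "PhED", the complete path of LoadTest07_CERN_263 follows it on the same line, and the remainder "Ex_LoadTest07_4/LoadTest07_CERN_61" ends up on a line of its own.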

N.
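The check behind the "output unrecognised file" alert can be illustrated with a minimal Python sketch. It is not the FileStager code: it assumes the status command prints one file path per line and that the agent compares each line against the list of files it asked about. The function name `split_recognised` and the `requested` list are hypothetical; the two output lines are the mangled ones from the log above.

```python
def split_recognised(requested, output_lines):
    """Partition status output into paths we asked about and everything else."""
    wanted = set(requested)
    recognised, unrecognised = [], []
    for line in output_lines:
        path = line.strip()
        if not path:
            continue  # ignore blank lines
        if path in wanted:
            recognised.append(path)
        else:
            unrecognised.append(path)
    return recognised, unrecognised


# Hypothetical request list: the two files whose status lines were mangled.
requested = [
    "/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_263",
    "/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_61",
]

# Output lines as they appear in the alerts (interleaved mid-path).
output = [
    "/castor/cern.ch/cms/store/PhED/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_263",
    "Ex_LoadTest07_4/LoadTest07_CERN_61",
]

ok, bad = split_recognised(requested, output)
for path in bad:
    print(f"alert: output unrecognised file {path}")
```

With this input neither line matches a requested path, so both are reported, which mirrors the two alerts in the log.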

ericvaandering commented 10 years ago

Comment by magini on Tue Dec 11 12:50:20 2012

Hi,

after performing a few manual tests, the output of castor_prestage-status.pm when executed standalone looks fine.

However, the output becomes mixed up when castor_prestage-status.pm is executed through PHEDEX/Utilities/RunWithTimeout (as the agent does):

myfiles = (a large list of files...)

+verbatim+
> PHEDEX/Utilities/RunWithTimeout 100 PHEDEX/Custom/Template/castor_prestage-status.pm $myfiles
[...]
/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_20f
/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_210
/cast/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_1d4
/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_1d5
/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_1d6
/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_1d7
[...]
/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_33
/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_34
/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_C0
/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_C1
/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_or/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_211
/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_212
/castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_213
[...]
-verbatim-

Since RunWithTimeout uses PHEDEX::Core::JobManager, this could be caused by the same underlying issue as #98594 affecting the FileDownload agent.

Cheers Nicolo'

ericvaandering commented 10 years ago

Comment by magini on Wed Dec 12 08:10:35 2012

Hi,

this issue disappeared when I removed an extra debugging "print" I had added to PHEDEX::Core::JobManager::_child_stdout, so it doesn't affect the released versions of PHEDEX.

Cheers Nicolo'
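The thread does not say exactly how the extra "print" corrupted the output. One plausible mechanism, stated here purely as an assumption, is that a handler receiving raw stdout chunks treats each chunk as if it ended on a line boundary. The Python sketch below is not PHEDEX::Core::JobManager; the class `LineAssembler` and the short file paths are hypothetical. It only illustrates why chunked child output should be reassembled into complete lines before anything prints or parses it.

```python
class LineAssembler:
    """Collect arbitrary stdout chunks and hand back only complete lines."""

    def __init__(self):
        self._pending = ""  # trailing partial line carried to the next chunk

    def feed(self, chunk):
        self._pending += chunk
        # Everything before the last newline is complete; the rest stays pending.
        *complete, self._pending = self._pending.split("\n")
        return complete

    def flush(self):
        """Return any unterminated final line once the child has exited."""
        rest, self._pending = self._pending, ""
        return [rest] if rest else []


# Hypothetical chunks: the second path is split across a chunk boundary.
chunks = ["/store/file_1\n/store/fi", "le_2\n/store/file_3\n"]

# Naive handling: every chunk is split into lines on its own.
naive = [line for chunk in chunks for line in chunk.splitlines()]

# Reassembled handling: partial lines wait for the rest of their data.
assembler = LineAssembler()
assembled = [line for chunk in chunks for line in assembler.feed(chunk)]
assembled += assembler.flush()

print(naive)
print(assembled)
```

The naive list contains the fragments "/store/fi" and "le_2" as separate entries, much like the truncated paths in the alerts, while the reassembled list contains the three intact paths.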

ericvaandering commented 10 years ago

Closed by magini on Thu Dec 13 06:34:50 2012

ericvaandering commented 10 years ago

Comment by magini on Thu Dec 13 06:34:50 2012

Hi,

closing as invalid.

Cheers Nicolo'