Closed meahoibm closed 3 years ago
Looking at some new errors:

Job 30928: cmd: /opt/ibm//bb/bin/bbcmd --jobstepid=1 --target=0- getfileinfo
Job 30928: json: {"id":"1","rc":"-2","0":{"id":"1","rc":"-2","in":{"apicall":"Coral_SetVar","misc":{"uid":"0","gid":"0"}},"f5n05_pvt_pok_stglabs_ibm_com:bb_api559588":{"breadcrumbs":{"bbproxy":{"msgin_setvar":{"exit":{"count":"1","ts":"1605276518.062115"}}}}},"error":{"text":"Connection closed waiting for the reply","func":"BB_GetFileInfo","line":"1499","sourcefile":"\/home\/build\/bb\/src\/bbapi.cc"}},"goodcount":"0","failcount":"1","voidcount":"0","error":{"firstFailRank":"0","firstFailNode":"f5n05","command":"getfileinfo^--bbid^30928^--envs^BBPATH=\/mnt\/bb_4a8b735e2c3112caf29a1f41ee65cfad^--jobstepid=1^--csmcommand=f5n05:0","text":"Connection closed waiting for the reply"}}
Job 30928:
Job 30928: rc = -2
Job 30928: Command failure. rc=-2
Job 30928:
Job 30928: cmd: /opt/ibm//bb/bin/bbcmd --jobstepid=0 --target=0 gettransfers --numhandles=0 --match=BBNOTSTARTED,BBINPROGRESS,BBPARTIALSUCCESS
Job 30928: json: {"id":"1","rc":"-1","error":{"csm_stderrgrabrc":"-11","csm_hostlist":"f5n05","csm_rc":"-1","command":"gettransfers^--bbid^30928^--envs^BBPATH=\/mnt\/bb_4a8b735e2c3112caf29a1f41ee65cfad^--jobstepid=0^--matchstatus=BBNOTSTARTED,BBINPROGRESS,BBPARTIALSUCCESS^--numhandles=0^--csmcommand=f5n05:0","text":"no data from node"},"goodcount":"0","failcount":"0","voidcount":"1"}
Job 30928:
Job 30928: rc = -1
Job 30928: Command failure. rc=-1
Job 30928: cmd: /ESS/gpfst/IST_LSF/10.1.0.10/linux3.10-glibc2.17-ppc64le-csm/bin/bpost -d 'BB: Stage-in admin script completed' -i 120 30928
Job 30928: command rc: 0
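For triaging replies like the two in the log above, a small helper can pull out the return code, the affected node, and the error text. This is only a sketch: the function name `summarize_bbcmd_reply` is hypothetical, the field names (`rc`, `error.text`, `error.firstFailNode`, `error.csm_hostlist`) are taken from the log output shown here and not from any documented schema, and the sample string is an abbreviated copy of the `gettransfers` reply.

```python
import json


def summarize_bbcmd_reply(raw):
    """Return (rc, node, error_text) from a bbcmd JSON reply string.

    Assumes the top-level layout seen in the log: "rc" is a string-encoded
    integer and "error" is an object that may carry "firstFailNode" or
    "csm_hostlist" plus a "text" message.
    """
    reply = json.loads(raw)
    error = reply.get("error", {})
    rc = int(reply.get("rc", "0"))
    # Either key may identify the node, depending on where the failure occurred.
    node = error.get("firstFailNode") or error.get("csm_hostlist")
    return rc, node, error.get("text")


# Abbreviated copy of the gettransfers reply ("command" field omitted).
sample = (
    '{"id":"1","rc":"-1","error":{"csm_stderrgrabrc":"-11",'
    '"csm_hostlist":"f5n05","csm_rc":"-1","text":"no data from node"},'
    '"goodcount":"0","failcount":"0","voidcount":"1"}'
)

print(summarize_bbcmd_reply(sample))
```

Running the same helper over the `getfileinfo` reply would report rc -2 with the "Connection closed waiting for the reply" text, which makes it easier to tell the two failure modes apart when scanning a long stage-in log.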
Unit tested on f5n05 and ln01. The code change affects running bbcmd on the compute node.
The previous "no data" errors no longer occur.
Materials from a bsub run are in: /ESS/gpfst/meaho/workdir/30925