HCALRunControl / levelOneHCALFM

HCAL Function Manager
https://twiki.cern.ch/twiki/bin/view/CMS/HCALFunctionManager

destroyAction broken #401

Closed jhakala closed 6 years ago

jhakala commented 6 years ago

I am getting errors on destroy using HCALFM hash 5187145. After clicking "Destroy" (from the Halted state) I get:

      INFO [HCAL HCALFM_LVL1_904Int] destroyAction called
 2018-04-10 01:14:19 and 318 ms : cms.hcalpro.rcms.fm.app.level1.HCALFunctionManager
      INFO [HCAL HCALFM_LVL1_904Int] Closing current log session 1000021645
 2018-04-10 01:14:19 and 331 ms : cms.hcalpro.rcms.fm.app.level1.HCALFunctionManager
      INFO [HCAL HCALFM_LVL1_904Int] Will destroy FM named: HCAL_Laser904The role is: EvmTrig
And the URI is: http://cms904rc-hcal.cms904:16000/urn:rcms-fm:fullpath=/hcalpro/904Int/DAQ_test/crate52_Oct06-17_v2,group=HCAL_Laser904,owner=hcalpro
 2018-04-10 01:14:19 and 337 ms : cms.hcalpro.rcms.fm.app.level1.HCALFunctionManager
      INFO [HCAL HCAL_Laser904] destroyAction called
 2018-04-10 01:14:19 and 343 ms : cms.hcalpro.rcms.fm.resource.qualifiedresource.FunctionManager
     ERROR Cannot destroy Function Manager with uri=http://cms904rc-hcal.cms904:16000/urn:rcms-fm:fullpath=/hcalpro/904Int/DAQ_test/crate52_Oct06-17_v2,group=HCAL_Laser904,owner=hcalpro
 2018-04-10 01:14:19 and 345 ms : cms.hcalpro.rcms.fm.app.level1.HCALFunctionManager
     ERROR [HCAL HCALFM_LVL1_904Int] Could not destroy FM client named: HCAL_Laser904 The URI is: http://cms904rc-hcal.cms904:16000/urn:rcms-fm:fullpath=/hcalpro/904Int/DAQ_test/crate52_Oct06-17_v2,group=HCAL_Laser904,owner=hcalpro
The exception is:
rcms.fm.fw.service.lifecycle.LifecycleServiceException: Cannot destroy Function Manager with uri=http://cms904rc-hcal.cms904:16000/urn:rcms-fm:fullpath=/hcalpro/904Int/DAQ_test/crate52_Oct06-17_v2,group=HCAL_Laser904,owner=hcalpro; nested exception is:
    rcms.fm.fw.service.lifecycle.LifecycleServiceException: FM Destroy: User destroyAction callback problem; nested exception is:
    rcms.fm.fw.user.UserActionException: User destroyAction problem Message from the caught exception is: Cannot destroy Function Manager with uri=http://cms904rc-hcal.cms904:16000/urn:rcms-fm:fullpath=/hcalpro/904Int/DAQ_test/crate52_Oct06-17_v2,group=HCAL_Laser904,owner=hcalpro; nested exception is:
    rcms.fm.fw.service.lifecycle.LifecycleServiceException: FM Destroy: User destroyAction callback problem; nested exception is:
    rcms.fm.fw.user.UserActionException: User destroyAction problem
 2018-04-10 01:14:19 and 346 ms : cms.hcalpro.rcms.fm.app.level1.HCALFunctionManager
      WARN [HCAL HCALFM_LVL1_904Int] class rcms.fm.app.level1.HCALlevelOneFunctionManager: Failed to send error message [HCAL HCALFM_LVL1_904Int] Could not destroy FM client named: HCAL_Laser904 The URI is: http://cms904rc-hcal.cms904:16000/urn:rcms-fm:fullpath=/hcalpro/904Int/DAQ_test/crate52_Oct06-17_v2,group=HCAL_Laser904,owner=hcalpro
The exception is:
rcms.fm.fw.service.lifecycle.LifecycleServiceException: Cannot destroy Function Manager with uri=http://cms904rc-hcal.cms904:16000/urn:rcms-fm:fullpath=/hcalpro/904Int/DAQ_test/crate52_Oct06-17_v2,group=HCAL_Laser904,owner=hcalpro; nested exception is:
    rcms.fm.fw.service.lifecycle.LifecycleServiceException: FM Destroy: User destroyAction callback problem; nested exception is:
    rcms.fm.fw.user.UserActionException: User destroyAction problem Message from the caught exception is: Cannot destroy Function Manager with uri=http://cms904rc-hcal.cms904:16000/urn:rcms-fm:fullpath=/hcalpro/904Int/DAQ_test/crate52_Oct06-17_v2,group=HCAL_Laser904,owner=hcalpro; nested exception is:
    rcms.fm.fw.service.lifecycle.LifecycleServiceException: FM Destroy: User destroyAction callback problem; nested exception is:
    rcms.fm.fw.user.UserActionException: User destroyAction problem
 2018-04-10 01:14:20 and 161 ms : cms.hcalpro.rcms.fm.app.level1.HCALEventHandler
      INFO [HCAL HCAL_Laser904] ... stopping TriggerAdapter watchdog thread done.

Then after trying to initialize, I get:

 2018-04-10 01:14:33 and 396 ms : cms.hcalpro.rcms.fm.resource.qualifiedresource.XdaqExecutive
      INFO XdaqExecutive::init() jobcontrol found http://hcal904daq01.cms904:9999/urn:xdaq-application:lid=10   rcms.fm.resource.qualifiedresource.JobControl@c9673cd
 2018-04-10 01:14:33 and 396 ms : cms.hcalpro.rcms.fm.resource.qualifiedresource.XdaqExecutive
      INFO XdaqExecutive::init() jobcontrol found http://hcal904daq01.cms904:9999/urn:xdaq-application:lid=10   rcms.fm.resource.qualifiedresource.JobControl@c9673cd
 2018-04-10 01:14:33 and 396 ms : cms.hcalpro.rcms.fm.resource.qualifiedresource.XdaqExecutive
      INFO XdaqExecutive::init() jobcontrol found http://hcal904daq01.cms904:9999/urn:xdaq-application:lid=10   rcms.fm.resource.qualifiedresource.JobControl@c9673cd
 2018-04-10 01:14:33 and 396 ms : cms.hcalpro.rcms.fm.resource.qualifiedresource.XdaqExecutive
      INFO XdaqExecutive::init() jobcontrol found http://hcal904daq01.cms904:9999/urn:xdaq-application:lid=10   rcms.fm.resource.qualifiedresource.JobControl@c9673cd
 2018-04-10 01:14:33 and 408 ms : cms.hcalpro.rcms.fm.resource.qualifiedresource.XdaqExecutive
      WARN XdaqExecutive was already running, trying to kill and start again. Executive URI:http://hcal904daq01.cms904:34001/urn:xdaq-application:lid=0
 2018-04-10 01:14:33 and 409 ms : cms.hcalpro.rcms.fm.resource.qualifiedresource.XdaqExecutive
      WARN XdaqExecutive was already running, trying to kill and start again. Executive URI:http://hcal904daq01.cms904:34002/urn:xdaq-application:lid=0
 2018-04-10 01:14:33 and 423 ms : cms.hcalpro.rcms.fm.resource.qualifiedresource.XdaqExecutive
      WARN XdaqExecutive was already running, trying to kill and start again. Executive URI:http://hcal904daq01.cms904:34003/urn:xdaq-application:lid=0
 2018-04-10 01:14:33 and 424 ms : cms.hcalpro.rcms.fm.resource.qualifiedresource.XdaqExecutive
      WARN XdaqExecutive was already running, trying to kill and start again. Executive URI:http://hcal904daq01.cms904:34152/urn:xdaq-application:lid=0

This test was done at 904 using a single-partition local run key.
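The destroy log above repeats the same "nested exception is:" text several times, which makes the innermost cause hard to spot. A minimal sketch of walking such a chain follows. It assumes plain JDK exceptions as stand-ins for the RCMS types (`LifecycleServiceException` and `UserActionException` are not reproduced here), and it assumes the real exceptions expose their cause through the standard `getCause()`. The class name `CauseChain` and its methods are hypothetical helpers, not part of the HCALFM code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class CauseChain {

    /** Collects the exception chain from outermost to innermost, stopping on a cycle. */
    public static List<Throwable> chain(Throwable top) {
        List<Throwable> out = new ArrayList<>();
        // Identity-based set, so exceptions that override equals() cannot confuse the cycle check.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable t = top; t != null && seen.add(t); t = t.getCause()) {
            out.add(t);
        }
        return out;
    }

    /** Returns the innermost exception in the chain, or null if top is null. */
    public static Throwable rootCause(Throwable top) {
        List<Throwable> c = chain(top);
        return c.isEmpty() ? null : c.get(c.size() - 1);
    }

    public static void main(String[] args) {
        // Stand-ins for the exception types seen in the log (assumption: plain JDK exceptions).
        Exception user = new IllegalStateException("User destroyAction problem");
        Exception callback =
            new RuntimeException("FM Destroy: User destroyAction callback problem", user);
        Exception outer = new RuntimeException("Cannot destroy Function Manager", callback);

        // Print one line per level instead of the repeated nested text.
        for (Throwable t : chain(outer)) {
            System.out.println(t.getClass().getSimpleName() + ": " + t.getMessage());
        }
        System.out.println("root cause: " + rootCause(outer).getMessage());
    }
}
```

Logging one line per level of the chain, rather than the concatenated message, would make it clearer that the failure originates in the child FM's own destroyAction callback.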

kakwok commented 6 years ago

It wasn't the commit that broke the code; it was the configuration. Hannes made a special RSManager for me earlier, which may have fixed the getParentExec() problem in light configurations. To test that, I imported /hcalpro/904Int/DAQ_test/crate52_Oct06-17_v2 from XML as a light configuration. I have asked Hannes to take a look at the light config, in case he needs to inspect the changes made by his RSManager.

For FM development, I have made another full configuration: /hcalpro/904Int/DAQ_test/crate52_Apr09-18_v2. Sorry for not foreseeing this.

jhakala commented 6 years ago

Ah, okay. I had figured out that it was a light configuration (precisely because I'm working on utilities to navigate the QG and do things like find sibling apps), but I hadn't realized that it was causing the destroyAction problem. Thanks!