Mu2e / otsdaq_mu2e

Mu2e customizations for otsdaq

artdaq dispatcher not cleaning up failed clients ? #63

Open pavel1murat opened 7 months ago

pavel1murat commented 7 months ago

It looks like when a client of an artdaq dispatcher (and likely of any other artdaq executable running in server mode) fails while connecting and does not disconnect cleanly, the dispatcher (the server) still keeps the client's registered 'unique_label'.

When the client then tries to reconnect, the server rejects it because it considers the client already connected.
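
To make the suspected failure mode concrete, here is a toy model in Python. It is not artdaq code: the `LabelRegistry` class and its methods are hypothetical, and only the error text is taken from the dispatcher response in the log below. It shows how a label that is never unregistered blocks every later registration under the same name.

```python
# Toy model of the suspected behavior. This is NOT artdaq's DispatcherCore;
# LabelRegistry and its methods are hypothetical names used for illustration.
class LabelRegistry:
    def __init__(self):
        self._labels = set()

    def register(self, unique_label):
        # Reject a label that is still on record, whether or not its owner is alive.
        if unique_label in self._labels:
            raise RuntimeError("Unique label already exists!")
        self._labels.add(unique_label)

    def unregister(self, unique_label):
        # Only a clean disconnect reaches this call.
        self._labels.discard(unique_label)


registry = LabelRegistry()
registry.register("DQMClient02")  # first attempt; the client then dies without unregistering

try:
    registry.register("DQMClient02")  # reconnect attempt under the same label
    outcome = "registered"
except RuntimeError as err:
    outcome = str(err)

print(outcome)
```

In this model the only way to free the label is an explicit `unregister` call, which a crashed client never makes.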

It would be nice to have this fixed.

BTW, does this mean that an artdaq server doesn't monitor the health of its clients?
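
One possible shape for such health monitoring, sketched in Python under stated assumptions: clients send periodic heartbeats, and the server drops any label that has been silent longer than a timeout before it handles a new registration. Whether artdaq has or wants anything like this is not known here; `MonitoredRegistry`, `heartbeat`, `prune` and the 5-second timeout are all hypothetical.

```python
import time


# Hypothetical sketch of liveness-based cleanup; not an artdaq API.
class MonitoredRegistry:
    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_seen = {}  # unique_label -> time of last sign of life

    def heartbeat(self, unique_label):
        # Refresh the timestamp of a client that is already registered.
        if unique_label in self._last_seen:
            self._last_seen[unique_label] = self._clock()

    def prune(self):
        # Drop every label that has been silent longer than the timeout.
        now = self._clock()
        stale = [label for label, seen in self._last_seen.items()
                 if now - seen > self._timeout_s]
        for label in stale:
            del self._last_seen[label]
        return stale

    def register(self, unique_label):
        self.prune()  # forget dead clients before checking for duplicates
        if unique_label in self._last_seen:
            raise RuntimeError("Unique label already exists!")
        self._last_seen[unique_label] = self._clock()

    def labels(self):
        return sorted(self._last_seen)


# A fake clock keeps the demonstration deterministic.
now = [0.0]
reg = MonitoredRegistry(timeout_s=5.0, clock=lambda: now[0])
reg.register("DQMClient02")

now[0] = 3.0  # still inside the timeout: a duplicate label is rejected
try:
    reg.register("DQMClient02")
    rejected = False
except RuntimeError:
    rejected = True

now[0] = 10.0  # silent for longer than the timeout: the stale entry is pruned
reg.register("DQMClient02")
```

The trade-off in a scheme like this is the timeout: too short and a slow but healthy client loses its registration, too long and a crashed client blocks reconnection for that whole period.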

mu2etrk@mu2edaq09:~/test_stand/pasha_020>mu2e -c test_dqm_client.fcl 
   ************************** Mu2e Offline **************************
     art v3_14_01    root v6_28_06b    KinKal v02_05_00a
     build  
     build      musebuild file not found
   ******************************************************************
|Info:MF_INIT_OK: [0]   Messagelogger initialization complete.
|Info:MetricManager:MetricManager.cc [31]       MetricManager CONSTRUCTOR
|Info:MetricManager:MetricManager.cc [43]       Configuring metrics with parameter set: 
|Info:MetricManager:MetricManager.cc [503]      Starting Metric Sending Thread
|Info:MetricManager:MetricManager.cc [522]      Metric Sending thread started
|Info:MetricManager:MetricManager.cc [580]      sendMetricLoop_ START
|Info:ArtdaqGlobalsService:ArtdaqGlobalsService_service.cc [55] app_name is art, rank -1
Conditions file: /home/mu2etrk/test_stand/pasha_020/srcs/Offline/ConditionsService/data/conditions_01.txt
Conditions lines: 46  hash: 3575107159191770146
GlobalConstants file: /home/mu2etrk/test_stand/pasha_020/srcs/Offline/GlobalConstantsService/data/globalConstants_01.txt
GlobalConstants lines: 143  hash: 14342993572345108761
|Info:MetricManager:MetricManager.cc [43]       Configuring metrics with parameter set: 
|Info:MetricManager:MetricManager.cc [503]      Starting Metric Sending Thread
|Info:MetricManager:MetricManager.cc [522]      Metric Sending thread started
|Info:MetricManager:MetricManager.cc [580]      sendMetricLoop_ START
|Info:art_xmlrpc_commander:xmlrpc_commander.cc [1335]   XMLRPC COMMANDER CONSTRUCTOR: Port: 0, Server Url: http://localhost:21105/RPC2
|Info:art_TCPSocketTransfer:TCPSocketTransfer.cc [980]  DQMClient01_RECV: Starting Listener Thread
|Info:PortManager:PortManager.cc [285]  Using default port range for TCPSocket Transfer
|Error:art_TCP_listen_fd:TCP_listen_fd.cc [56]  Could not bind socket for port 21600! Exiting with code 3!
bind error: File exists
|Info:TransferWrapper:TransferWrapper.cc [325]  Attempting to register this monitor ("DQMClient02") with the dispatcher aggregator
|Info:TransferWrapper:TransferWrapper.cc [330]  Response from dispatcher is "Unable to create a Transfer plugin with the FHiCL code "filter_paths:[] outputs:{dispatcherTransferOutput:{module_type:"TransferOutput" transfer_plugin:{destination_rank:100 host_map:[{host:"localhost" rank:100},{host:"localhost" rank:5},{host:"localhost" rank:4},{host:"localhost" rank:3},{host:"localhost" rank:2},{host:"localhost" rank:1},{host:"localhost" rank:0}] max_fragment_size_words:1.048576e6 source_rank:5 transferPluginType:"TCPSocket" unique_label:"DQMClient01"}} dumpOutput:{module_type:"FileDumperOutput" wantProductFriendlyClassName:true}} path:["dispatcherTransferOutput"] physics:{dispatcher_path:["prescaler"] filters:{prescaler:{module_type:"Prescaler" prescaleFactor:100 prescaleOffset:0}} out:["dispatcherTransferOutput","dumpOutput"]} process_name:"DispatcherArtJob" services:{ArtdaqFragmentNamingServiceInterface:{helper_plugin:"Mu2e" service_provider:"ArtdaqFragmentNamingService"} ArtdaqSharedMemoryServiceInterface:{service_provider:"ArtdaqSharedMemoryService"}} source:{module_type:"ArtdaqInput"} unique_label:"DQMClient02"", a new monitor has not been registered
Exception: ---- DispatcherCore BEGIN
  Unique label already exists!
---- DispatcherCore END
"
|Warning:TransferWrapper:TransferWrapper.cc [338]       Error in TransferWrapper: attempt to register with dispatcher did not result in the "Success" response
Art has completed and will exit with status 0.