DUNE-DAQ / drunc

Dune RUN Control (DRUNC) is the run control for the DUNE experiment

Problem observed in ru-controller when running minimum_system_quick_test #268

Open bieryAtFnal opened 5 days ago

bieryAtFnal commented 5 days ago
           INFO     core.py:182     FSM:    Post transition:                                                                                          
           INFO     controller.py:142       Controller:     'ru-det-conn-0@131.225.193.20:5501' (type ControlType.REST_API)                           
           INFO     rest_api_child.py:509   ru-det-conn-0-rest-api-child:   Ignoring command 'take_control' sent to 'ru-det-conn-0'                   
           INFO     broadcast_sender.py:65  Broadcast:      ready                                                                                     
           INFO     controller.py:57        controller_cli: 'ru-controller' was started on '5500'                                                     
           INFO     controller.py:280       Controller:     Registering ru-controller to the connectivity service at grpc://131.225.193.20:5500       
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /home/nfs/biery/dunedaq/16OctFDDevPostMergeTest/.venv/lib/python3.10/site-pa │
│ ckages/drunc/broadcast/server/decorators.py:30 in wrap                       │
│                                                                              │
│    27 │   │                                                                  │
│    28 │   │   try:                                                           │
│    29 │   │   │   log.debug('Executing wrapped function')                    │
│ ❱  30 │   │   │   ret = cmd(obj, request) # we strip the context here, no ne │
│    31 │   │   except Exception as e:                                         │
│    32 │   │   │   stack = traceback.format_exc().split("\n")                 │
│    33                                                                        │
│                                                                              │
│ /home/nfs/biery/dunedaq/16OctFDDevPostMergeTest/.venv/lib/python3.10/site-pa │
│ ckages/drunc/authoriser/decorators.py:34 in check_token                      │
│                                                                              │
│   31 │   │   │   │   #     drunc_system = obj.name,                          │
│   32 │   │   │   │   # )                                                     │
│   33 │   │   │   log.debug('Executing wrapped function')                     │
│ ❱ 34 │   │   │   ret = cmd(obj, request)                                     │
│   35 │   │   │   log.debug('Exiting')                                        │
│   36 │   │   │   return ret                                                  │
│   37 │   │   return check_token                                              │
│                                                                              │
│ /home/nfs/biery/dunedaq/16OctFDDevPostMergeTest/.venv/lib/python3.10/site-pa │
│ ckages/drunc/controller/decorators.py:11 in wrap                             │
│                                                                              │
│    8 │   │   if not obj.actor.token_is_current_actor(request.token):         │
│    9 │   │   │   from druncschema.request_response_pb2 import Response, Resp │
│   10 │   │   │   from druncschema.generic_pb2 import PlainText               │
│ ❱ 11 │   │   │   return Response(                                            │
│   12 │   │   │   │   name = obj.name,                                        │
│   13 │   │   │   │   token = request.token,                                  │
│   14 │   │   │   │   data = PlainText(                                       │
╰──────────────────────────────────────────────────────────────────────────────╯
TypeError: Parameter to MergeFrom() must be instance of same class: expected 
<class 'google.protobuf.any_pb2.Any'> got <class 
'druncschema.generic_pb2.PlainText'>.
[2024-10-16 14:16:41 -0500] [2224161] [INFO] Handling signal: hup
[2024-10-16 14:16:41 -0500] [2224161] [INFO] Hang up: Master
Received 1
Requested termination
[2024-10-16 14:16:41 -0500] [2224161] [WARNING] Worker with pid 2224185 was terminated due to signal 1
[2024-10-16 14:16:41 -0500] [2225657] [INFO] Booting worker with pid: 2225657
[14:16:41] INFO     controller.py:315       Controller:     Unregistering from the connectivity service                                               
           INFO     controller.py:324       Controller:     Stopping children                                                                         
[2024-10-16 14:16:41 -0500] [2224161] [INFO] Handling signal: term
[2024-10-16 14:16:41 -0500] [2225657] [INFO] Worker exiting (pid: 2225657)
[2024-10-16 14:16:41 -0500] [2224161] [INFO] Shutting down: Master
           INFO     flask_manager.py:193    response-listener-flaskmanager-flaskmanager:    response-listener-flaskmanager-flaskmanager terminated    
bieryAtFnal commented 5 days ago

I only see this occasionally, and I haven't been able to reproduce it on np04 computers in the last couple of hours, but it happens fairly reliably when I run daqsystemtest_integtest_bundle.sh -l 0 -N 5 --stop-on-fail on daq.fnal.gov.
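
For context on the TypeError in the traceback: the message suggests that the data field of Response is declared as google.protobuf.Any, while controller/decorators.py passes it a PlainText directly. Below is a minimal sketch of that failure mode and of the usual remedy (packing the payload into an Any first). It uses only messages that ship with the protobuf Python package: wrappers_pb2.StringValue stands in for PlainText, and type_pb2.Option stands in for Response because it also has an Any-typed field. Both stand-ins, and the helper names build_unpacked and build_packed, are assumptions for illustration and are not drunc code.

```python
from google.protobuf import any_pb2, type_pb2, wrappers_pb2

# Stand-in for druncschema.generic_pb2.PlainText (assumption: any concrete
# message type triggers the same behaviour).
payload = wrappers_pb2.StringValue(value="not the current actor")


def build_unpacked():
    # Mirrors the failing pattern: a concrete message is handed to a field
    # whose declared type is google.protobuf.Any.
    return type_pb2.Option(name="demo", value=payload)


def build_packed():
    # Wrap the concrete message in an Any before assigning it.
    wrapped = any_pb2.Any()
    wrapped.Pack(payload)
    return type_pb2.Option(name="demo", value=wrapped)


try:
    build_unpacked()
    unpacked_error = None
except TypeError as exc:
    unpacked_error = exc

option = build_packed()

# The receiver can check the packed type and recover the original message.
recovered = wrappers_pb2.StringValue()
ok = option.value.Unpack(recovered)

print("unpacked raised:", type(unpacked_error).__name__)
print("packed round-trip ok:", ok)
```

If the same applies to drunc, the Response built in controller/decorators.py would need its PlainText wrapped with Pack() in the same way. This only addresses the TypeError itself, not why the token check fails intermittently.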