fermi-ad / controls

Central repo for reporting bugs, making feature requests, managing RFCs, and requesting seminar topics.
https://www-bd.fnal.gov/controls/

CLX40E not returning data for MI LCW parameters. #56

Closed: kengell closed this issue 2 months ago

kengell commented 4 months ago

Phone call from MCR reporting that the ACNET device J:CLX40E is in alarm (digital) and that MI LCW parameters are not updating.

kengell commented 4 months ago

Found the following repeated in the ERLANG log.

=INFO REPORT==== 27-Feb-2024::14:11:10.014336 ===
Appserver activated dpmclient for {array,10,10,"undefined0123456789012",
                                       {"i:524rs@p,1000","i:525rs@p,1000",
                                        "i:524rr@p,1000","i:525rr@p,1000",[],
                                        [],[],[],[],[]}}.

And also repeated entries like:

=INFO REPORT==== 27-Feb-2024::14:11:12.001962 ===
appserver copyset exceeded limit 2 for no-data received executions, restarting dpmclient.

=WARNING REPORT==== 27-Feb-2024::14:11:12.003102 ===
readset:readingLoop/0 : unexpected message -- {acnet_cancel,
                                               {<0.677.0>,
                                                #Ref<0.1322998952.1681915907.258095>,
                                                acnet}}.
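
The "exceeded limit 2 for no-data received executions" line suggests a counter of consecutive executions that returned no data, with a restart once the count passes a limit. The Python sketch below illustrates that pattern only; it is not the appserver's code, and the class name, the `restart` callback, and the reset-on-data behavior are all assumptions.

```python
class NoDataWatchdog:
    """Restart a client after too many consecutive no-data executions.

    Hypothetical illustration; the real appserver logic may differ.
    """

    def __init__(self, restart, limit=2):
        self.restart = restart  # assumed callback that restarts the client
        self.limit = limit      # the log entry above reports a limit of 2
        self.misses = 0         # consecutive executions without data
        self.restarts = 0       # number of restarts triggered so far

    def record(self, got_data):
        """Call once per execution; return True if a restart was triggered."""
        if got_data:
            self.misses = 0     # assumption: any data resets the counter
            return False
        self.misses += 1
        if self.misses > self.limit:
            self.misses = 0
            self.restarts += 1
            self.restart()
            return True
        return False


events = []
watchdog = NoDataWatchdog(lambda: events.append("restart dpmclient"), limit=2)
outcomes = [watchdog.record(ok) for ok in (True, False, False, False, True)]
```

With these inputs the third consecutive miss is the first one to exceed the limit. If the device never returns data, a loop like this restarts the client indefinitely, which would match the repeated log entries.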
kengell commented 4 months ago

Performed an

acnet restart run_erl_fe

to clear up repeated errors in the log and return data for MI LCW parameters.

awattsFNAL commented 4 months ago

There were several more occurrences of CLX40E coming into alarm and the 62 LCW parameters (E:62WV01, E:62WV02, and E:62WLVL) reading NaN during this shift.

kengell commented 4 months ago

Screen capture from the Datalogger around the 3AM reboot time of 03:02:09 (according to Erlang logs on CLX40E).

The plot shows readbacks for E:62WL01 and E:62WL02 prior to the reboot.

The ACNET devices of E:62WL01/02 are used by the MONITOR devices of E:62WV01, E:62WV02 and E:62WLVL.

Prior to the reboot, we have data on CLX40E.

[Image: Datalogger screen capture]

kengell commented 4 months ago

Found entries like the one below in the Erlang log files on CLX40E. I will need to build new plcdirect beam files and stage them on the FE to get a better idea of why the FE treats these messages as unexpected.

=WARNING REPORT==== 8-Mar-2024::00:19:21.918206 ===
"mi62-lcw-plc" plcdirect unexpected message {udp,#Port<0.99>,
                                             {131,225,124,103},
                                             28784,
                                             <<72,65,80,59,0,160,59,68,0,34,
                                               25,0,0,0,0,10,0,0,0,3,0,0,0,0,
                                               0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                                               0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                                               0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
                                               0,0,0,0,0,0,0,0>>} 
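
As a quick aid for reading the logged datagram, the Python sketch below takes the first 20 bytes of the payload above (the rest of the logged bytes are zero) and reports the leading ASCII letters, a hex dump, and the index of the last non-zero byte. The function names are made up for this example. It makes no attempt to interpret protocol fields, since the thread does not describe the message layout.

```python
# First 20 bytes of the payload from the log entry above; the remaining
# bytes in the log are all zero and are omitted here.
PAYLOAD_HEAD = bytes([72, 65, 80, 59, 0, 160, 59, 68, 0, 34,
                      25, 0, 0, 0, 0, 10, 0, 0, 0, 3])


def ascii_prefix(data):
    """Return the run of ASCII letters at the start of the datagram."""
    letters = []
    for byte in data:
        ch = chr(byte)
        if not (ch.isascii() and ch.isalpha()):
            break
        letters.append(ch)
    return "".join(letters)


def summarize(data):
    """Collect a few facts that help compare one logged payload to another."""
    return {
        "prefix": ascii_prefix(data),
        "hex": data.hex(" "),  # space-separated hex, Python 3.8+
        "last_nonzero": max((i for i, b in enumerate(data) if b), default=None),
    }


summary = summarize(PAYLOAD_HEAD)
print(summary["prefix"])
print(summary["hex"])
```

The first three bytes decode to the letters `HAP`. Comparing this summary across several "unexpected message" entries may show whether the same datagram is arriving each time.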

The function producing the log entries is found in plcdirect.erl:

handle_info(Any, #tstate{server = Server} = S) ->
    warning_msg("~p plcdirect unexpected message ~p ~n",
        [Server, Any]),
    {noreply, S}.
kengell commented 3 months ago

Received an email from CLX40E on Saturday at 0246: GETS32 disconnected from ACNET.

[Image: email from CLX40E]

kengell commented 3 months ago

Output of the diagnostic acl script, run PRIOR to the reboot on Saturday:

acl node /tasks clx40e
Node CLX40E connected tasks:
 0  ACNET   5724     (RUM)
 1  DBNEWS  4122188  (RUM)
 2  STATES  4122188  (RUM)
 3  SYNC    4122188  (RUM)
 4  SLAM    4122188  (RUM)
 5  %56406  4122188
 6  %32793  4122188
 7  WRITER  4122188
 8  RETDAT  4122188  (RUM)
 9  SETDAT  4122188  (RUM)
10  GETS32  4122188  (RUM)
11  ALARMR  4122188  (RUM)
12  SETS32  4122188  (RUM)
13  SETSVR  4122188
14  ACSYS   4122188  (RUM)
15  FTPMAN  4122188  (RUM)
16  FECONF  4122188  (RUM)
17 tasks were found
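
To check for a missing task such as GETS32 automatically, the task listing can be parsed into records. The Python sketch below is one way to do that, assuming each entry has the shape "index name number" with an optional parenthesized flag, as in the output above. The function and field names are hypothetical, and the third column is only presumed to be a process id.

```python
import re

# index, task name, numeric id, optional "(FLAG)"; layout inferred from the
# listing above, not from any ACL documentation.
TASK_RE = re.compile(r"\b(\d+)\s+(\S+)\s+(\d+)\b(?:\s+\((\w+)\))?")


def parse_tasks(text):
    """Return one dict per task entry found in `acl node /tasks` output."""
    return [
        {
            "index": int(m.group(1)),
            "name": m.group(2),
            "id": int(m.group(3)),   # presumed process id
            "flag": m.group(4),      # e.g. "RUM", or None when absent
        }
        for m in TASK_RE.finditer(text)
    ]


def missing_tasks(text, expected):
    """Return the expected task names that are not in the listing."""
    present = {task["name"] for task in parse_tasks(text)}
    return sorted(set(expected) - present)


sample = (
    "Node CLX40E connected tasks: 0 ACNET 5724 (RUM) "
    "5 %56406 4122188 10 GETS32 4122188 (RUM) 17 tasks were found"
)
print(parse_tasks(sample))
print(missing_tasks(sample, ["GETS32", "RETDAT"]))
```

Because the pattern scans the whole text rather than individual lines, it handles the listing whether it arrives on one line or one task per line. The trailing "17 tasks were found" summary is not matched because "were" is not numeric.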

kengell commented 3 months ago

I performed an acnet restart all on clx40e at 1630 on Monday 15 April. There were no problems with the FE overnight.