chandra-mta / config_mon

These are the MTA dumps monitoring scripts.

ACIS Ops team not receiving alerts #3

Open william-aaron-CFA opened 10 months ago

william-aaron-CFA commented 10 months ago

From bug email:

"No one on the ACIS Ops team received an alert from the M&TA software that processes the dump data when 1CRBT violated its Red high limit on 28 September. We did receive alerts from the ACIS Ops SW that processes the realtime telemetry. As you know, ACIS Ops does not process the dump data to search for violations, we rely on M&TA to do that task. If there is an issue with this processing, we need to know about it and hopefully resolve it. Please investigate why no one on ACIS Ops received an alert from M&TA about the 1CRBT violation on 28 September. "

william-aaron-CFA commented 10 months ago

The dumps script /data/mta/Script/Dumps/Scripts/run_otg_wrap_script2 populates the Dumps_mons/IN directory with the TL files used by the config_mon/Dumps_mon set of scripts. However, the error log listed for this cron job shows the following:

From the filters_otg2.cron error log:

ls: cannot access '/data/mta/Script/Dumps//.tl': No such file or directory
ls: cannot access '/data/mta/Script/Dumps/TLfiles//': No such file or directory
ls: cannot access '/data/mta/Script/Dumps/Dumps_mon/Done//': No such file or directory
ls: cannot access '/data/mta/Script/Dumps/Dumps_mon/Done//': No such file or directory
ls: cannot access '/data/mta/Script/Dumps/Dumps_mon/Done//': No such file or directory
ls: cannot access '/data/mta/Script/Dumps/Dumps_mon/Done//': No such file or directory
ls: cannot access '/data/mta/Script/Dumps/Dumps_mon/Done//': No such file or directory
ls: cannot access '/data/mta/Script/Dumps/Dumps_mon/Done//': No such file or directory
ls: cannot access '/data/mta/Script/Dumps/TLfiles//': No such file or directory
ls: cannot access '/data/mta/Script/Dumps/TLfiles//': No such file or directory
ls: cannot access '/data/mta/Script/Dumps/TLfiles//': No such file or directory
ls: cannot access '/data/mta/Script/Dumps/TLfiles//*': No such file or directory
move_tl_files.py done
cp: failed to close '/data/mta/DataSeeker/data/repository/deahk_elec.rdb~': No space left on device
update_dea_rdb.py done

Similarly, the backup of this dumps script running on /data/mta4 contains the following errors:

update_dea_rdb.py done
move_tl_files.py done
cp: failed to close '/data/mta/DataSeeker/data/repository/deahk_temp.rdb~': No space left on device
cp: failed to close '/data/mta/DataSeeker/data/repository/deahk_elec.rdb~': No space left on device
update_dea_rdb.py done
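One way to keep a failure like this from going unnoticed would be to scan the cron error log after each run and raise an alert when a known failure string appears, rather than relying on downstream monitoring to notice missing data. A minimal Python sketch, assuming the log is a plain-text file with one message per line; `find_cron_errors` and `FATAL_PATTERNS` are hypothetical names (not part of config_mon), and actual alert delivery to ACIS Ops is left out.

```python
import re

# Hypothetical failure signatures, taken from messages in the cron error logs above.
FATAL_PATTERNS = [
    re.compile(r"No space left on device"),
    re.compile(r"cannot access '[^']*'"),
]


def find_cron_errors(log_text):
    """Return each line of a cron error log that matches a fatal pattern."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(p.search(line) for p in FATAL_PATTERNS)
    ]


# Example using lines from the filters_otg2.cron log quoted above.
sample = """ls: cannot access '/data/mta/Script/Dumps/TLfiles//': No such file or directory
move_tl_files.py done
cp: failed to close '/data/mta/DataSeeker/data/repository/deahk_elec.rdb~': No space left on device
update_dea_rdb.py done"""

errors = find_cron_errors(sample)
if errors:
    # Placeholder: a real check would notify ACIS Ops / MTA here instead of printing.
    print(f"{len(errors)} problem line(s) found")  # → 2 problem line(s) found
```

Note that the "... done" lines are printed even when the step before them failed, so a check on exit status or on these markers alone would not have caught this.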

william-aaron-CFA commented 10 months ago

[waaron@scrapper-16:55:repository]$ pwd
/data/mta/DataSeeker/data/repository
[waaron@scrapper-16:55:repository]$ lsl | grep deahk
-rw-r--r-- 1 mta head 514M Jun 28 2022 deahk_elec.rdb
-rw-r--r-- 1 mta head 514M Oct 27 15:06 deahk_elec.rdb~
-rw-r--r-- 1 mta head 554M Jun 28 2022 deahk_temp.rdb
-rw-r--r-- 1 mta head 554M Oct 27 15:06 deahk_temp.rdb~
[waaron@scrapper-16:55:repository]$ diff deahk_temp.rdb deahk_temp.rdb~
[waaron@scrapper-16:55:repository]$

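As shown above, these rdb files are no longer being updated: deahk_elec.rdb and deahk_temp.rdb were last modified on Jun 28 2022, the Oct 27 .rdb~ copies are the same size as the originals, and diff shows deahk_temp.rdb~ is identical to deahk_temp.rdb. I will ask syshelp whether there is a maximum file size a user is allowed to write, as such a limit might be interrupting these updates.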
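A complementary check would be to flag .rdb files that have stopped updating, and a repository filesystem that is running low on space, before update_dea_rdb.py attempts its copy. The sketch below uses only the Python standard library (`os.path.getmtime`, `shutil.disk_usage`); the helper names `stale_files` and `low_on_space` and both thresholds are assumptions for illustration, not values taken from the existing scripts.

```python
import os
import shutil
import tempfile
import time

# Assumed thresholds; tune to the real cadence of update_dea_rdb.py.
MAX_AGE_DAYS = 2
MIN_FREE_BYTES = 2 * 1024**3  # assumed headroom for the ~0.5 GB .rdb~ copies


def stale_files(paths, max_age_days=MAX_AGE_DAYS, now=None):
    """Return the paths whose last modification is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return [p for p in paths if os.path.getmtime(p) < cutoff]


def low_on_space(directory, min_free=MIN_FREE_BYTES):
    """True if the filesystem holding `directory` has fewer than min_free bytes free."""
    return shutil.disk_usage(directory).free < min_free


# Example: a file backdated to 1970 is reported as stale; a fresh one is not.
with tempfile.TemporaryDirectory() as d:
    old_rdb = os.path.join(d, "old.rdb")
    new_rdb = os.path.join(d, "new.rdb")
    for p in (old_rdb, new_rdb):
        open(p, "w").close()
    os.utime(old_rdb, (0, 0))  # set atime/mtime to the epoch
    found = [os.path.basename(p) for p in stale_files([old_rdb, new_rdb])]
    print(found)  # → ['old.rdb']
    print(low_on_space(d, min_free=0))  # → False
```

Run against /data/mta/DataSeeker/data/repository, a check like this would have reported the deahk files as stale well before 28 September.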