glideinWMS / glideinwms

The glideinWMS Project
http://tinyurl.com/glideinwms
Apache License 2.0

Factory crashing on update to 3.10.6 because submit_attrs.cfg is not created when there are no submit attributes #388

Closed: mmascher closed this issue 10 months ago

mmascher commented 10 months ago

The actual error is that submit_attrs.cfg files are not created in the work-dir and its entry subdirectories when an existing Factory is upgraded and there is no content to put in them (no submit_attr entries for that section, whether global or entry config). The missing files cause the Factory to crash. See the comment below for a test and a workaround.

ISSUE EDITED. THIS ISSUE WAS: "Entries might be configured with no submit attributes"

Still related to this https://github.com/glideinWMS/glideinwms/pull/382

After applying the fix from #387 I hit another roadblock. It turns out that not all entries have submit attributes, and when an entry has none, no submit_attrs.cfg file is created for it. To continue with the tests in ITB, I ran this command:

[mmascher@vocms0205 work-dir]$ pwd
/var/lib/gwms-factory/work-dir
[mmascher@vocms0205 work-dir]$ for dir in `find . -maxdepth 1 -type d`; do sudo -u gfactory touch $dir/submit_attrs.cfg; done

The exception I was hitting was:

[2024-01-16 16:11:48,506] WARNING: glideFactory:413: EntryGroup 0 STDERR: b'Traceback (most recent call last):\n  File "/usr/lib/python3.6/site-packages/glideinwms/factory/glideFactoryEntryGroup.py", line 743, in <module>\n    main(int(sys.argv[1]), int(sys.argv[2]), int(sys.argv[3]), sys.argv[4], sys.argv[5], sys.argv[6])\n  File "/usr/lib/python3.6/site-packages/glideinwms/factory/glideFactoryEntryGroup.py", line 687, in main\n    my_entries[entry] = glideFactoryEntry.Entry(entry, startup_dir, glideinDescript, frontendDescript)\n  File "/usr/lib/python3.6/site-packages/glideinwms/factory/glideFactoryEntry.py", line 57, in __init__\n    self.jobSubmitAttrs = glideFactoryConfig.JobSubmitAttrs(name)\n  File "/usr/lib/python3.6/site-packages/glideinwms/factory/glideFactoryConfig.py", line 300, in __init__\n    lambda s: s,\n  File "/usr/lib/python3.6/site-packages/glideinwms/factory/glideFactoryConfig.py", line 102, in __init__\n    ConfigFile.__init__(self, os.path.join("entry_" + entry_name, config_file), convert_function)\n  File "/usr/lib/python3.6/site-packages/glideinwms/factory/glideFactoryConfig.py", line 58, in __init__\n    self.load(config_file, convert_function)\n  File "/usr/lib/python3.6/site-packages/glideinwms/factory/glideFactoryConfig.py", line 62, in load\n    with open(fname) as fd:\nFileNotFoundError: [Errno 2] No such file or directory: \'entry_CMSHTPC_T2_PK_NCP_htcondor-ce-2/submit_attrs.cfg\'\n'

Notice that, compared to the exception in #386, the path here is entry_CMSHTPC_T2_PK_NCP_htcondor-ce-2/submit_attrs.cfg, meaning the fix for the wrong location has been applied and this is a different issue.
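
The traceback ends in an unguarded open() on the missing file. As a minimal sketch of what a tolerant load could look like (not the change made in the project), the function below treats a missing file as "no submit attributes". The name load_submit_attrs and the whitespace-separated "key value" line format are assumptions for illustration; they are not the glideFactoryConfig API.

```python
import os
import tempfile


def load_submit_attrs(path):
    """Return the attributes found in `path`, or {} if the file is missing.

    Hypothetical helper: assumes one "key value" pair per line and
    '#' comment lines. Not the glideinWMS ConfigFile implementation.
    """
    attrs = {}
    try:
        with open(path) as fd:
            for line in fd:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue  # skip blank lines and comments
                parts = line.split(None, 1)
                attrs[parts[0]] = parts[1] if len(parts) > 1 else ""
    except FileNotFoundError:
        # No attributes were configured, so the file was never written.
        pass
    return attrs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = os.path.join(tmp, "submit_attrs.cfg")
        print(load_submit_attrs(cfg))  # the file does not exist yet
        with open(cfg, "w") as fd:
            fd.write("# File: submit_attrs.cfg\n#\n")
        print(load_submit_attrs(cfg))  # header-only file
```

Under these assumptions a missing file and a header-only file both yield an empty dictionary, so an entry without submit attributes would not stop the entry group from starting.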

mambelli commented 10 months ago

The actual problem is that submit_attrs.cfg files are not created in the work-dir and its entry subdirectories when an existing Factory is upgraded and there is no content to put in them (no submit_attr entries for that section, whether global or entry config).

A workaround is to create all the submit_attrs.cfg files after the yum upgrade, e.g.:

pushd /var/lib/gwms-factory/work-dir
touch submit_attrs.cfg
chown gfactory: submit_attrs.cfg
for i in entry_*; do touch "$i"/submit_attrs.cfg; chown gfactory: "$i"/submit_attrs.cfg; done
popd
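
The same workaround can be sketched in Python for anyone who prefers to script it and see which files were actually missing. This is only an illustration: ensure_submit_attrs is a hypothetical helper, and it does not change ownership, so it would need to run as the gfactory user or be followed by the chown shown above.

```python
import os


def ensure_submit_attrs(work_dir, filename="submit_attrs.cfg"):
    """Create an empty `filename` in work_dir and in each entry_* subdirectory
    where it is missing. Existing files are left untouched.

    Returns the list of paths that were created.
    """
    entry_dirs = sorted(
        e.path
        for e in os.scandir(work_dir)
        if e.is_dir() and e.name.startswith("entry_")
    )
    created = []
    for directory in [work_dir] + entry_dirs:
        path = os.path.join(directory, filename)
        if not os.path.exists(path):
            # Like `touch`: append mode creates the file without truncating.
            with open(path, "a"):
                pass
            created.append(path)
    return created


if __name__ == "__main__":
    # Path taken from the comments above; adjust for other installations.
    for path in ensure_submit_attrs("/var/lib/gwms-factory/work-dir"):
        print("created", path)
```

Unlike the find-based loop in the original report, this only touches the work-dir itself and the entry_* directories, and running it a second time creates nothing.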

Here is the test on an Alma9 Factory, in a GWMS workspaces container, with a fresh install, where the files are created even though they contain only the header (deleting work-dir works as well but is not feasible for an upgrade):

...Updated the glidein_startup.sh and local_start.sh scripts
...Updated the glidein_startup.sh file in the staging area
...Updated the factory_startup script
...Reconfigured glidein 'gfactory_instance' is complete
...Active entries are:
     ce-workspace.glideinwms.org
...Verifying rrd schema
...Submit files are in /var/lib/gwms-factory/work-dir
Upgrading the factory                                      [  OK  ]
[root@factory-workspace /]# ls -al /var/lib/gwms-factory/work-dir/
total 280
drwxr-xr-x 1 gfactory gfactory  4096 Jan 18 00:15 .
drwxr-xr-x 1 root     root       147 Jan 11 04:39 ..
-rw-r--r-- 1 gfactory gfactory   932 Jan 18 00:15 aggregated_stats_dict.data
-rw-r--r-- 1 gfactory gfactory   105 Jan 18 00:15 attributes.cfg
-rw-r--r-- 1 gfactory gfactory 18060 Jan 11 04:39 checksum.factory
drwxr-xr-x 2 gfactory gfactory    54 Jan 18 00:15 client_log
drwxr-xr-x 2 gfactory gfactory    54 Jan 18 00:15 client_proxies
-rw-r--r-- 1 gfactory gfactory    50 Jan 18 00:15 cvmfsexec.cfg
drwxr-xr-x 3 gfactory gfactory   178 Jan 18 00:15 entry_TEST_ENTRY
drwxr-xr-x 3 gfactory gfactory   178 Jan 18 00:15 entry_ce-workspace.glideinwms.org
drwxr-xr-x 3 gfactory gfactory   178 Jan 18 00:15 entry_el7ce-workspace.glideinwms.org
-rwxr-xr-x 1 gfactory gfactory 19728 Jan 18 00:15 factory_startup
-rw-r--r-- 1 gfactory gfactory   286 Jan 18 00:15 frontend.descript
-rw-r--r-- 1 gfactory gfactory     0 Jan 18 00:15 gfi_advertize.lock
-rw-r--r-- 1 gfactory gfactory     0 Jan 18 00:15 gfi_status.lock
-rw-r--r-- 1 gfactory gfactory  1907 Jan 18 00:15 glidein.descript
-rwxr-xr-x 1 gfactory gfactory 88126 Jan 18 00:15 glidein_startup.sh
-rwxr-xr-x 1 gfactory gfactory  1979 Jan 11 04:39 local_start.sh
drwxr-xr-x 1 gfactory gfactory    75 Jan 18 00:15 lock
lrwxrwxrwx 1 root     root        21 Oct  6 15:39 log -> /var/log/gwms-factory
lrwxrwxrwx 1 root     root        38 Oct  6 15:39 monitor -> /var/lib/gwms-factory/web-area/monitor
-rw-r--r-- 1 gfactory gfactory   148 Jan 18 00:15 params.cfg
-rw------- 1 gfactory gfactory  1766 Jan 18 00:15 rsa.key
-rw-r--r-- 1 gfactory gfactory   383 Jan 18 00:15 signatures.sha1
-rw-r--r-- 1 gfactory gfactory    27 Jan 18 00:15 submit_attrs.cfg
-rwxr-xr-x 1 gfactory gfactory  4046 Jan 11 04:39 update_proxy.py
[root@factory-workspace /]# ls -al /var/lib/gwms-factory/work-dir/entry_ce-workspace.glideinwms.org/
total 32
drwxr-xr-x 3 gfactory gfactory  178 Jan 18 00:15 .
drwxr-xr-x 1 gfactory gfactory 4096 Jan 18 00:15 ..
-rw-r--r-- 1 gfactory gfactory  365 Jan 18 00:15 attributes.cfg
-rw-r--r-- 1 gfactory gfactory  155 Jan 18 00:15 infosys.descript
-rw-r--r-- 1 gfactory gfactory 2005 Jan 18 00:15 job.condor
-rw-r--r-- 1 gfactory gfactory  807 Jan 18 00:15 job.descript
drwxr-xr-x 2 gfactory gfactory    6 Jan 18 00:15 lock
lrwxrwxrwx 1 gfactory gfactory   72 Jan 18 00:15 monitor -> /var/lib/gwms-factory/web-area/monitor/entry_ce-workspace.glideinwms.org
-rw-r--r-- 1 gfactory gfactory   92 Jan 18 00:15 monitor.xml
-rw-r--r-- 1 gfactory gfactory   88 Jan 18 00:15 params.cfg
-rw-r--r-- 1 gfactory gfactory   27 Jan 18 00:15 submit_attrs.cfg
[root@factory-workspace /]# cat /var/lib/gwms-factory/work-dir/entry_ce-workspace.glideinwms.org/submit_attrs.cfg
# File: submit_attrs.cfg
#

mambelli commented 10 months ago

The plan is to find an automatic fix for the update. In the meantime, the workaround in the previous comment will allow successful upgrades.

mambelli commented 10 months ago

Fixed by #391