dmwm / CRABServer


adapt to a single username on sched #6905

Closed belforte closed 2 years ago

belforte commented 2 years ago

With the transition to HTCondor IDTOKENS for submission to the sched ( #6903 ), all condor jobs will be run by a single username, crabtw. FYI @mmascher

Each task will still carry a specific user in the task name, in the CRAB_hnusername classad, and in the accounting group for fair share, and will handle the user X509 proxy just as it does now.

Still we need a few changes

belforte commented 2 years ago

Simply changing WEB_DIR from /home/grid/crabtw to /home/grid/crabtw/<username> in AdjustSites.py works on the schedd. But there is a problem with https://github.com/dmwm/CRABServer/blob/cc8f384595b35b52cd694f7959866ca820f1bded/src/python/CRABInterface/HTCondorDataWorkflow.py#L211 which will return /home/grid/<username>/<taskname>. Also, the client fails to retrieve the status cache, since it looks for e.g. https://cmsweb.cern.ch:8443/scheddmon/068/belforte/220117_220418:belforte_crab_20220117_230414/status_cache where the crabtw part has been lost (I still need to find the code responsible for this).

At first sight, a simple solution would be to make /home/grid the home directory of the crabtw user. But with the new AdjustSites.py, jobs submitted via GSI also end up trying to retrieve the status cache from https://cmsweb.cern.ch:8443/scheddmon/0119/belforte/220117_220343:belforte_crab_20220117_230338/status_cache

In a way, there's nothing wrong if a new TW version puts the web dir in <username>/<taskname> instead of <poolaccount>/<taskname>. But the new name should be used consistently everywhere.
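One way to keep the naming consistent is to derive both the directory on the sched and the status cache URL from the same (username, taskname) pair. The snippet below is only a sketch of that idea: the function names `web_dir` and `status_cache_url` are hypothetical and do not exist in CRABServer, and the base path and URL prefix are simply copied from the examples quoted in this thread.

```python
import posixpath

# Values copied from the examples in this thread; treat them as assumptions.
WEB_BASE = "/home/grid"
SCHEDDMON_PREFIX = "https://cmsweb.cern.ch:8443/scheddmon"


def web_dir(username, taskname):
    """Hypothetical helper: <base>/<username>/<taskname> on the sched."""
    return posixpath.join(WEB_BASE, username, taskname)


def status_cache_url(sched_id, username, taskname):
    """Hypothetical helper: URL with the same <username>/<taskname> layout."""
    return "/".join(
        [SCHEDDMON_PREFIX, sched_id, username, taskname, "status_cache"]
    )


task = "220117_220418:belforte_crab_20220117_230414"
print(web_dir("belforte", task))
print(status_cache_url("068", "belforte", task))
```

Because both helpers take the same two arguments, a change of layout (for example going back to a pool account directory) would have to be made in one place only.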

belforte commented 2 years ago

NOTE: I need a solution which works with both IDTOKENS and GSI.

belforte commented 2 years ago

I am currently having success with the following recipe:

  1. hardcode in AdjustSites.py that WEB_DIR is /home/grid/username/taskname
  2. make /home/grid/ owned by crabtw:zh
  3. make /home/grid writable by everybody

So that:

To make that work I need to change puppet, which currently keeps /home/grid owned by root: https://gitlab.cern.ch/ai/it-puppet-hostgroup-vocmsglidein/-/blob/master/code/manifests/modules/volumes.pp

Now that nobody but CRAB/SI Operators can log on sched machines I do not see security risks in having /home/grid world writable for a while. Changing all code to deal with /home/grid/crabtw as home directory would be a lot of work and may require changing CRAB REST as well.

As usual need to make sure that running tasks are not affected. I think they are not. But need to test.

I need to be more explicit here: the location of the job submission directory (condor spool dir) is not changed. It still follows condor "rules", looks like /data/srv/glidecondor/condor_local/spool/4207/0/cluster834207.proc0.subproc0, and stays in the "High I/O" CEPH volume. What I am changing is the service directory in the other ("standard I/O") CEPH volume, where we put files which are more likely to be accessed via http by the user/client and are possibly larger (log files), so that if that volume has problems (full or overloaded) condor still works.

In the meantime I will go ahead with testing.

belforte commented 2 years ago

The above solution tested fine on vocms068/69 (which had running jobs) and vocms069.

Next I will test changing the permissions on one production CRAB sched in the global pool. If that is OK, I will apply the new owner/permissions to all CRAB scheds.

belforte commented 2 years ago

Changed ownership and protection for /home/grid also on vocms0106.

belforte commented 2 years ago

I found a problem which I somehow did not notice when testing on the other scheds; I do not know why. Submissions via both GSI and IDTOKEN will try to create WEB_DIR in /home/grid/belforte/taskname (that's the new code in AdjustSites.py), but as two different users, cms1627 or crabtw. So only the first one will be able to do it: the user which comes second will find an already existing /home/grid/belforte owned by a different user, and will fail.

2022-01-20 14:36:50.408629: Failed to copy/symlink files in the user web directory: [Errno 13] Permission denied: '/home/grid/belforte/220120_143614:belforte_crab_20220120_153546'

Without the WEB_DIR the task still runs, but crab status fails.
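The failure mode can be illustrated with a small sketch. This is not the code in AdjustSites.py: `make_web_dir` and `owned_by_other_user` are hypothetical names, and the sketch assumes a Unix host. It creates the directory if possible, reports the error instead of aborting (so that the task can still run), and checks whether an existing per-user directory belongs to a different account.

```python
import os


def owned_by_other_user(path):
    """True if path exists and its owner is not the current effective user."""
    return os.path.exists(path) and os.stat(path).st_uid != os.geteuid()


def make_web_dir(path):
    """Try to create path (and parents); return None on success, else the error text."""
    try:
        os.makedirs(path, exist_ok=True)
    except PermissionError as ex:
        # e.g. the parent /home/grid/<username> was created by another account
        return str(ex)
    return None
```

With two submitting accounts, the second one would get an error string back from `make_web_dir`, and `owned_by_other_user` on the parent directory would return True.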

Of course we can make /home/grid/username group writable. I do not particularly like it, but I see no other obvious solution. The alternative is to detect in AdjustSites.py whether the task was sent via TOKEN or GSI and do different things, which may be cleaner.

belforte commented 2 years ago

From Saqib ( https://mattermost.web.cern.ch/cms-o-and-c/pl/rjmt79dhrjrktb36iygirmox1r ) jobs submitted via IDTOKENS have these classAds set:

AuthTokenId = "efbaf1f433fa8a13e182a4a172c8ffcd"
AuthTokenIssuer = "cmsgwms-itb.cern.ch"
AuthTokenSubject = "crabtw@cms"

which are not present (undefined) when submitted via GSI.
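A minimal sketch of how that difference could be used to tell the two submission methods apart. It assumes the job ad is available as a plain Python dict (a stand-in for a real classad object), and `submitted_via_token` is a hypothetical name, not a function from CRABServer.

```python
def submitted_via_token(job_ad):
    """True if the ad carries AuthTokenId, i.e. it was submitted via IDTOKENS."""
    # With GSI submission the attribute is undefined, modelled here as a missing key.
    return job_ad.get("AuthTokenId") is not None


token_ad = {
    "AuthTokenId": "efbaf1f433fa8a13e182a4a172c8ffcd",
    "AuthTokenIssuer": "cmsgwms-itb.cern.ch",
    "AuthTokenSubject": "crabtw@cms",
}
gsi_ad = {"CRAB_hnusername": "belforte"}

print(submitted_via_token(token_ad))
print(submitted_via_token(gsi_ad))
```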

So I am changing strategy from https://github.com/dmwm/CRABServer/issues/6905#issuecomment-1015845384 to:

This also means that I can avoid making /home/grid world-writable; I only need to change ownership from root to crabtw.

belforte commented 2 years ago

The needed code change is now in tag https://github.com/dmwm/CRABServer/releases/tag/py3.220120.1. I will move the last open subtasks above to ad-hoc issues and close this one.