Closed belforte closed 2 years ago
Simply changing WEB_DIR from /home/grid/crabtw to /home/grid/crabtw/<username> in AdjustSites.py works in the schedd. But there is a problem with
https://github.com/dmwm/CRABServer/blob/cc8f384595b35b52cd694f7959866ca820f1bded/src/python/CRABInterface/HTCondorDataWorkflow.py#L211
which will return /home/grid/<username>/<taskname>. Also, the client fails to retrieve the status cache, since it looks for e.g. https://cmsweb.cern.ch:8443/scheddmon/068/belforte/220117_220418:belforte_crab_20220117_230414/status_cache where the crabtw part has got lost (still need to find the code responsible for this).
At first sight a simple solution would be to make /home/grid the home directory of the crabtw user.
But with the new AdjustSites.py, jobs submitted via GSI also end up trying to retrieve the status cache from https://cmsweb.cern.ch:8443/scheddmon/0119/belforte/220117_220343:belforte_crab_20220117_230338/status_cache
In a way, there's nothing wrong if a new TW version puts the webdir in <username>/<taskname> instead of <poolaccount>/<taskname>. But the new name should be used consistently everywhere.
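One way to keep the name consistent is to derive both the local directory and the status cache URL from a single helper. The following is only a sketch: the function names are hypothetical (not taken from CRABServer), and it assumes the scheddmon URL simply mirrors the path below /home/grid, as the example URLs in this thread suggest.

```python
import posixpath

WEB_ROOT = "/home/grid"  # web directory root discussed in this thread
SCHEDDMON_BASE = "https://cmsweb.cern.ch:8443/scheddmon"  # prefix of the example URLs


def webdir_relpath(owner_dir, taskname):
    """Single place where the <owner_dir>/<taskname> layout is defined (hypothetical helper)."""
    return posixpath.join(owner_dir, taskname)


def local_webdir(owner_dir, taskname):
    """Local path of the task web directory on the schedd."""
    return posixpath.join(WEB_ROOT, webdir_relpath(owner_dir, taskname))


def status_cache_url(schedd_id, owner_dir, taskname):
    """URL the client would use, assuming it mirrors the path below WEB_ROOT."""
    return "/".join([SCHEDDMON_BASE, schedd_id,
                     webdir_relpath(owner_dir, taskname), "status_cache"])
```

If the layout changes (e.g. owner_dir becomes a pool account or crabtw/<username>), only the caller's owner_dir argument changes and the path and URL cannot drift apart.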
NOTE: I need a solution which works both with IDTOKENS and with GSI.
I am currently having success with the following recipe:
1. WEB_DIR is /home/grid/username/taskname
2. /home/grid/ is owned by crabtw:zh
3. /home/grid is writable by everybody

So that:
- with IDTOKENS, crabtw can create /home/grid/username/ for any username
- with GSI, jobs run as a pool account cmsxxx, who can still create /home/grid/username/ thanks to 3. above.

To make that work I need to change puppet, which is currently keeping /home/grid owned by root:
https://gitlab.cern.ch/ai/it-puppet-hostgroup-vocmsglidein/-/blob/master/code/manifests/modules/volumes.pp
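To verify on a sched that the ownership and permission change was actually applied, something like the following could be used. This is a minimal sketch with hypothetical helper names; it only reads the numeric uid/gid and permission bits and does not resolve them to crabtw or zh.

```python
import os
import stat


def is_world_writable(mode):
    """Return True if the permission bits allow writing by 'others'."""
    return bool(mode & stat.S_IWOTH)


def describe_dir(path):
    """Return (uid, gid, octal permission string) for a directory."""
    st = os.stat(path)
    return st.st_uid, st.st_gid, oct(stat.S_IMODE(st.st_mode))


if __name__ == "__main__":
    web_root = "/home/grid"  # path from the recipe above
    if os.path.isdir(web_root):
        uid, gid, perms = describe_dir(web_root)
        writable = is_world_writable(os.stat(web_root).st_mode)
        print(f"{web_root}: uid={uid} gid={gid} mode={perms} world-writable={writable}")
```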
Now that nobody but CRAB/SI Operators can log on to the sched machines, I do not see security risks in having /home/grid world-writable for a while. Changing all code to deal with /home/grid/crabtw as home directory would be a lot of work and may require changing CRAB REST as well.
As usual need to make sure that running tasks are not affected. I think they are not. But need to test.
I need to be more explicit here: the job submission directory (condor spool dir) location is not changed. It still follows condor "rules", looks like /data/srv/glidecondor/condor_local/spool/4207/0/cluster834207.proc0.subproc0 and stays in the "High I/O" CEPH volume. What I am changing is the service directory in the other ("standard I/O") CEPH volume, where we put files which are more likely to be accessed via http by the user/client and possibly larger (log files), so that if that volume has problems (full or overloaded) condor still works.
In the meantime I go ahead with testing.
The above solution tested fine on vocms068/69 (which had running jobs) and vocms069.
Will test changing permission on one production CRAB sched in the global pool next. If OK, will apply the new owner/permission to all CRAB scheds.
Changed ownership and protection for /home/grid also on vocms0106.
I found a problem which somehow I did not notice when testing on other scheds, do not know why.
Both submissions via GSI and IDTOKEN will try to create WEB_DIR in /home/grid/belforte/taskname (that's the new code in AdjustSites.py), but as two different users, cms1627 or crabtw. So only the first one will be able to do it, since the user which comes second will find an already existing /home/grid/belforte owned by a different user, and will fail.
2022-01-20 14:36:50.408629: Failed to copy/symlink files in the user web directory: [Errno 13] Permission denied: '/home/grid/belforte/220120_143614:belforte_crab_20220120_153546'
Without the WEB_DIR task still runs, but crab status fails.
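For illustration, the failure mode can be sketched as below. This is not the code in AdjustSites.py, just a hypothetical helper showing where the error comes from: `os.makedirs` creates missing parent directories as the current user, so when the parent /home/grid/<username> already exists and belongs to another user without write permission for us, creating the task directory raises `PermissionError` (errno 13), which formats exactly like the log line above.

```python
import os


def create_webdir(path):
    """Try to create the web directory; return None on success or an error string."""
    try:
        # Creates any missing parent (e.g. /home/grid/<username>) owned by the caller.
        os.makedirs(path, exist_ok=True)
    except OSError as ex:
        # For a permission problem, str(ex) is "[Errno 13] Permission denied: '<path>'"
        return f"Failed to create web directory: {ex}"
    return None
```

Whoever runs first owns the parent directory; the second user only succeeds if that parent is writable for them (group or world), which is the trade-off discussed here.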
Of course we can make /home/grid/username group writable. I do not particularly like it, but see no obvious solution. The alternative is to detect in AdjustSites.py whether the task was sent via TOKEN or GSI and do different things, which may be cleaner.
From Saqib ( https://mattermost.web.cern.ch/cms-o-and-c/pl/rjmt79dhrjrktb36iygirmox1r ) jobs submitted via IDTOKENS have these classAds set:
AuthTokenId = "efbaf1f433fa8a13e182a4a172c8ffcd"
AuthTokenIssuer = "cmsgwms-itb.cern.ch"
AuthTokenSubject = "crabtw@cms"
which are not present (undefined) when submitted via GSI.
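Based on that, the submission method could be detected roughly as follows. This is a sketch under assumptions: the function name is hypothetical, and a plain dict stands in for the job ClassAd (any mapping with a `.get` method would do).

```python
def submitted_with_idtoken(job_ad):
    """Return True if the job ad carries the AuthTokenId attribute.

    job_ad: mapping of ClassAd attribute names to values; here a plain dict
    is used as a stand-in for the real job ad (assumption).
    """
    return job_ad.get("AuthTokenId") is not None


# Example ads modelled on the attributes quoted above
token_ad = {
    "AuthTokenId": "efbaf1f433fa8a13e182a4a172c8ffcd",
    "AuthTokenIssuer": "cmsgwms-itb.cern.ch",
    "AuthTokenSubject": "crabtw@cms",
}
gsi_ad = {}  # the AuthToken* attributes are undefined for GSI submissions
```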
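So I am changing strategy from https://github.com/dmwm/CRABServer/issues/6905#issuecomment-1015845384 to:
- if the task was submitted via GSI, keep WEB_DIR in the pool account directory (/home/grid/cmsxxxx)
- if the task was submitted via IDTOKENS, put WEB_DIR in /home/grid/<username>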
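A minimal sketch of that choice, assuming the strategy is "pool account directory for GSI, /home/grid/<username> for IDTOKENS". The function and parameter names are hypothetical and this is not the code from the tagged release.

```python
import os


def choose_webdir(taskname, crab_username, local_account, via_idtoken,
                  web_root="/home/grid"):
    """Pick the task web directory depending on how the task was submitted.

    via_idtoken:   True for IDTOKENS submission, False for GSI
    local_account: the Unix user running the job (e.g. a cmsxxxx pool account)
    """
    # IDTOKENS: everything runs as crabtw, so separate tasks by CRAB username.
    # GSI: keep using the directory of the pool account the job runs as.
    owner_dir = crab_username if via_idtoken else local_account
    return os.path.join(web_root, owner_dir, taskname)
```

With this split each directory below /home/grid is only ever written by one Unix user, so no group-writable or world-writable directory is needed.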
This also means that I can avoid making /home/grid world-writable; I only need to change its ownership from root to crabtw.
The needed code change is now in tag https://github.com/dmwm/CRABServer/releases/tag/py3.220120.1 . I will move the last open subtasks above to ad-hoc issues and close.
With the transition to HTCondor IDTOKENS for submission to the sched (#6903) we will have all condor jobs run by a single username, crabtw. FYI @mmascher
Each task will still carry a specific user in the task name, in the CRAB_hnusername classad, in the accounting group for fair share, and handle the user X509 proxy just like now.
Still we need a few changes:
- add crabtw local username to all scheds (a SI task, @saqibhaleem will take care) - done, see https://cms-logbook.cern.ch/elog/GlideInWMS/7866
- web directory as /home/grid/crabtw/username/ or /home/dir/username: need to run some tests, locate the code which needs to change, and make sure to address proxyed_webdir construction etc. There may be many places where we build the webdir from the task name. Grafana? Maybe worth a separate issue
- /home/grid in puppet: give full access to crabtw - coordinate with @saqibhaleem - see https://github.com/dmwm/CRABServer/issues/6905#issuecomment-1015845384
- condor_q output to show username as well - maybe use a custom print format? See this from Brian: https://indico.cern.ch/event/272794/contributions/614943/attachments/490434/677959/scripting_condor.pdf - moved to #6985