Closed belforte closed 1 year ago
In the end, the explanation for all the issues I had with the environment is a (big) initial misunderstanding.
I thought that the job payload was executed inside Scram() in a clean environment, simply because it was done that way in the past (but in the COMP environment).
Instead, as @khurtado explained:
cmsRun is executed through a subprocess call (+ a wrapper for the CMSSW setup) and not with Scram(), so those variables are not lost (which is why the WM PYTHONPATH is removed by hand before executing it, but then put back for other stuff that needs it afterwards in the executor).
Also, quoting from https://github.com/dmwm/WMCore/wiki/Notes-about-environment-variables-passed-to-the-Scram-environment-or-modified-when-running-the-CMSSW-executable :
The CMSSW executable (cmsRun) is not executed directly by the executor with Scram(). Instead, it runs a wrapper script that sets up its own environment (with a name stepName-main.sh, e.g. cmsRun1-main.sh). This bash script is written on the fly and defined HERE.
Since this wrapper script is setting up its own scram environment, the script itself is called by simply using a subprocess call. However, some environment variables need to be overridden in order to avoid problems between the CMSSW environment and the WM environment.
The environment override is defined HERE and basically makes sure that:
- The WM PYTHONPATH is not passed in the subprocess call.
- It sets XRD_LOADBALANCERTTL to work around a problem at CERN related to the GSI authentication plugin and EOS with XRootD.
- It sets the HOME environment variable.
After all these changes in the environment, the CMSSW executable is invoked through this wrapper script HERE. However, since we clean up the WM PYTHONPATH in the os system, other steps (e.g. in the stepChain workflow) would fail after this if they can't find the WM libraries, so the original WM PYTHONPATH is put back in the environment after calling the CMSSW executable/cmsRun wrapper.
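To make the sequence described in the quote concrete, here is a minimal Python sketch of the "override, call the wrapper via subprocess, then restore" pattern. It is not the WMCore code: the names `payload_environment` and `run_wrapper` are hypothetical, and the values passed for `XRD_LOADBALANCERTTL` and `HOME` in the usage note are placeholders, since this thread does not state what WMCore actually sets.

```python
import contextlib
import os
import subprocess
import sys


@contextlib.contextmanager
def payload_environment(overrides, drop=("PYTHONPATH",)):
    """Temporarily drop and override variables in os.environ.

    Hypothetical helper: the whole environment is saved on entry and
    restored on exit, so later steps see the original PYTHONPATH again.
    """
    saved = dict(os.environ)
    try:
        for name in drop:
            os.environ.pop(name, None)   # e.g. hide the WM PYTHONPATH
        os.environ.update(overrides)     # e.g. XRD_LOADBALANCERTTL, HOME
        yield
    finally:
        os.environ.clear()
        os.environ.update(saved)


def run_wrapper(command, overrides):
    """Run the wrapper command (a list of arguments) in the modified environment."""
    with payload_environment(overrides):
        # The child process inherits os.environ as it is at this moment.
        return subprocess.run(command, capture_output=True, text=True, check=False)
```

A call could look like `run_wrapper(["bash", "cmsRun1-main.sh"], {"XRD_LOADBALANCERTTL": "86400", "HOME": os.getcwd()})`, where both values are assumptions made for illustration. An alternative that avoids the restore step is to build a modified copy of the environment and pass it through the `env=` argument of `subprocess.run`, leaving `os.environ` untouched.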
So this is significantly different from the current CRAB approach, and we need to decide whether to change the CRAB Job Wrapper to follow more closely what WMCore does, and how closely, since we do not want to reproduce all the WMA Step machinery.
Maybe it is as simple as using Scram(envCmd=...) (from HERE) to clean up $PYTHONPATH and then running Scram() in the job start environment?
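As a rough illustration of the idea of running a cleanup command before the payload, the sketch below prefixes a shell command with an environment-cleanup command in the same shell. It deliberately does not use the WMCore Scram class, and how Scram() actually applies `envCmd` is not shown in this thread; the function name `run_with_env_cmd` and the default `unset PYTHONPATH` are assumptions.

```python
import subprocess


def run_with_env_cmd(command, env_cmd="unset PYTHONPATH"):
    """Run `command` in a POSIX shell after executing `env_cmd` in that same shell.

    Hypothetical helper: only the child shell is affected, the parent
    process keeps its own PYTHONPATH.
    """
    script = f"{env_cmd}; {command}"
    return subprocess.run(["sh", "-c", script], capture_output=True, text=True, check=False)
```

Because the cleanup happens inside the child shell, nothing has to be put back afterwards in the calling process.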
@amaltaro @khurtado @mapellidario @dciangot your input is more than welcome !
By the way, currently CRAB does nothing about "set XRD_LOADBALANCERTTL to work around a problem at CERN related to the GSI authentication plugin and EOS with XRootD".
How worried should we be? Does anybody know what this thing is? Is there any document/ticket/issue/elog about it?
@belforte It looks like XRD_LOADBALANCERTTL and HOME were added to fix these 2 issues from 2015/2016, both for Tier0 jobs running at CERN:
XRD_LOADBALANCERTTL: https://hypernews.cern.ch/HyperNews/CMS/get/edmFramework/3572.html https://github.com/dmwm/WMCore/pull/6325
HOME: https://hypernews.cern.ch/HyperNews/CMS/get/edmFramework/3654.html https://github.com/dmwm/WMCore/issues/6894 https://github.com/dmwm/WMCore/pull/6325
Whether those are still a problem at CERN or not, I honestly don't know though.
Thanks. Wow, that's ancient stuff (2015!). Given that the original thread was hinting at xroot client v4.2 being a possible solution with respect to 4.0.4, and that we now run v4.5.0, I am not going to worry.
$HOME is a different story, it is still needed.
This is also the time to find a definitive solution to https://github.com/dmwm/WMCore/issues/10257 . Basically, to put the CRAB vs. WMCore JobWrapper on firm ground. Is there some common environment and code that we can share?
Alan asked for a google doc as a start, but I found it easier to start with a GH wiki which can hopefully be turned into a bit of permanent documentation for CRAB developers. https://github.com/dmwm/CRABServer/wiki/RunTime-CRAB-vs-WMCore
Anyhow I also copied the markdown text here https://docs.google.com/document/d/13IIxPGbQS3a3k0Vl0j3o9ivJSO0ZoMtg5D8XN8xXSFE/edit?usp=sharing
@mapellidario please review and let's make sure that it makes sense from our side first, then we will ask Alan to have a chat about it
As I progress with the review of our jobwrapper, comparing it with the current WMCore one, I will add here a list of action items:
startup_environment.sh
? or simply dump it to job output via a scriptExe job in case of need? I am not sure that multiple architectures make sense for CRAB unless we drastically change other things.
almost time to raise priority to critical: https://cms-talk.web.cern.ch/t/crab-test-cmssw-12-6-x-invalid-site-local-config/15423/1
I addressed this issue in the PRs:
All these are included in the latest CRABServer tag https://github.com/dmwm/CRABServer/releases/tag/v3.230220 and have been running in production since Wednesday morning.
I consider this issue as completed and move further discussion about the jobwrapper to new issues. If anybody does not agree, feel free to re-open this issue!
see https://github.com/dmwm/WMCore/issues/10970#issuecomment-1039742503 and especially https://github.com/dmwm/WMCore/wiki/Notes-about-environment-variables-passed-to-the-Scram-environment-or-modified-when-running-the-CMSSW-executable