dmwm / CRAB2

add DESIRED_ARCH in JDL #962

Closed belforte closed 10 years ago

belforte commented 10 years ago

This should be something that can be used to select the cluster, i.e. the hardware: e.g. sl6_x86 vs. sl6_arm, not the gcc version, which is dictated by CMSSW. We need to put in something that can be matched to what pilots advertise without substring parsing (Igor says that otherwise the negotiator needs to use too much CPU).

A possible alternative (IIUC what Igor said) is that the CMS-specific part of the pilot detects the arch and publishes the classad, but it will cause factories to send pilots to wrong-architecture CE's as well.

Need wider discussion with Ops and WMA

belforte commented 10 years ago

That is, the output of cmsos. It should be easy for the glidein to publish it, but also for the factory to change os=sl6 to os=slc6_amd64.

belforte commented 10 years ago

this is getting a bit different:

Now, that's what the glideins advertise.

To keep with the glidein naming convention, we would use +DESIRED_OpSys, +DESIRED_OpSysMajorVer and +DESIRED_Arch.

But that assumes we only ever specify a single possibility. We should probably allow for lists instead, right?

e.g. a 32bit binary can run both on "INTEL" and "X86_64". Or is that not an option?

Cheers, Igor

On 03/06/2014 11:56 AM, Stefano Belforte wrote:

that's good. So it sounds like we need a map from slc6_amd64 to OpSys = "LINUX", OpSysMajorVer = 6, Arch = "X86_64", and so on.

Not that fun to maintain, but... I guess we do not add an arch a week. What if I give you a list of the used arch's ? and if I want to put those in JDL, I guess it is w/o the initial +, right ?

stefano

On 03/06/2014 08:49 PM, Igor Sfiligoi wrote:

All nodes right now are 64bit x86.

But, if you want to be future proof, Condor advertises Arch = "X86_64"

I think 32-bit x86 is "INTEL"... no clue what ARM would be, but could find out.

Igor

On 03/06/2014 11:46 AM, Stefano Belforte wrote:

This does not say e.g. whether it is 32 or 64 bits. That's why we have SLC6_amd64 and not SLC6. Or imagine we run Linux on ARM... stefano

On 03/06/2014 07:21 PM, Igor Sfiligoi wrote:

The OS is already advertised in the glideins: OpSys = "LINUX", OpSysMajorVer = 6, LSB_RELEASE = "6.4"

We could easily match on that, if sl5 vs sl6 is the only thing you are worried about.

Let me know if that's enough... so we can decide on the policy in the FE, and the related attribute in the jobs.

Cheers, Igor

On 03/06/2014 09:26 AM, Stefano Belforte wrote:

I see that so far we advertise DESIRED_CMSVersion, DESIRED_CMSVersionNr and DESIRED_CMSScramArch. For consistency, shall I call the new classad DESIRED_CMSArch? Or DESIRED_CMSOsArch? Or DESIRED_CMSOs?

I am open. And if there's a classAd that also other VO's can use I think both them and factory will benefit. We can always remap slc6_amd64 to whatever you like.

But releases which only run SL6 are coming up and we need to do this. Stefano

On 03/06/2014 05:53 PM, Stefano Belforte wrote:

back to this topic. It is the moment for me to fix Crab2 code. I think you should first agree from glideinWms pov, then share with Ops (which will only like it, my guess). I am asking that factories advertise the hardware behind each CE instead of the current os=sl5/sl6.

On 08/30/2013 12:39 AM, Igor Sfiligoi wrote:

Looking at the glideins, looks like we do not advertise the Arch right now; all we do advertise are the CMSSW versions.

BTW: How would the glidein find out what is the right scram arch? SL5 versions may be installed on SL6 nodes (via shared FS), so just looking for what is available is not an option.

Is there a scram command that tells you what is likely to run? (I am not a CMS SW expert)

The command is /cvmfs/cms.cern.ch/common/cmsos. It returns strings like slc6_amd64, slc5_ia32, osx104_ia32, etc.

The expectation from any site with a bit of sanity is that they have consistent hardware behind each CE and do not mix Windows/OSX/Linux/ARM...

this will then need to be consistently used by FE to ask for pilots on CE's of the proper hardware.

In view of this I am adding to JDL: +DESIRED_ARCH=slc6_amd64 (or whatever is correct for each job) https://github.com/dmwm/CRAB2/issues/962

thanks Stefano

belforte commented 10 years ago

summary: So, for the lists, we go with +DESIRED_OpSyses, +DESIRED_OpSysMajorVers and +DESIRED_Archs. Currently known Archs are:

Arch = "X86_64"; Arch = "INTEL" for 32-bit

For the others: OpSys = "LINUX", OpSysMajorVer = 6
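A minimal sketch of how the three list attributes could be written into the JDL. The helper name `desired_jdl_lines` is hypothetical, and the comma-separated quoted string is an assumption about the list format; the thread only fixes the attribute names.

```python
def desired_jdl_lines(opsyses, major_vers, archs):
    """Return JDL lines for the three DESIRED_* list attributes.

    Assumption: each list is serialised as one quoted, comma-separated
    string. The thread does not specify the exact list syntax.
    """
    def joined(values):
        # Convert every entry to text so integer versions are accepted too.
        return ",".join(str(v) for v in values)

    return [
        '+DESIRED_OpSyses = "%s"' % joined(opsyses),
        '+DESIRED_OpSysMajorVers = "%s"' % joined(major_vers),
        '+DESIRED_Archs = "%s"' % joined(archs),
    ]


if __name__ == "__main__":
    for line in desired_jdl_lines(["LINUX"], [6], ["X86_64"]):
        print(line)
```

Keeping the serialisation in one place means the list format can be changed later without touching the callers.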

belforte commented 10 years ago

and the source of the mapping is $SCRAM_ARCH = slc5_amd64_gcc462 etc. The only existing non-amd64 arch currently is slc5_ia32_gcc434, for CMSSW 3x, long deprecated.
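A sketch of that mapping, assuming $SCRAM_ARCH always has the form slc<N>_<cpu>_gcc<NNN>. The function name and the lookup table are hypothetical; only the two Arch values named in this thread are filled in, and anything else is rejected rather than guessed.

```python
import re

# Only the values mentioned in this thread; extend when new archs appear.
ARCH_MAP = {"amd64": "X86_64", "ia32": "INTEL"}

# Assumed layout: slc<major>_<cpu>_gcc<version>, e.g. slc5_amd64_gcc462
SCRAM_ARCH_RE = re.compile(r"^slc(\d+)_([a-z0-9]+)_gcc\d+$")


def scram_arch_to_classads(scram_arch):
    """Translate a SCRAM_ARCH string into the values glideins advertise."""
    match = SCRAM_ARCH_RE.match(scram_arch)
    if match is None:
        raise ValueError("unsupported SCRAM_ARCH: %r" % scram_arch)
    major, cpu = match.groups()
    if cpu not in ARCH_MAP:
        raise ValueError("unknown cpu type in SCRAM_ARCH: %r" % cpu)
    return {
        "OpSys": "LINUX",
        "OpSysMajorVer": int(major),
        "Arch": ARCH_MAP[cpu],
    }
```

Parsing the string once on the submission side keeps substring matching out of the negotiator, which is the concern raised at the top of this issue.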

belforte commented 10 years ago

see also: https://cdcvs.fnal.gov/redmine/issues/5606 i.e.:

Good point...

We are currently missing this WN->factory channel. But we always said we wanted it, so I will actually make a ticket for it, so it gets done.

Igor

PS: The GLIDEIN_REQUIRED_OS is a validation check that sort of does what we want. It checks that the WNs are what we expect them to be. But it is both very clunky to maintain and not very flexible. i.e. we want something better

On 03/06/2014 01:18 PM, Stefano Belforte wrote:

Just had a thought... but are those defined at run time when the glidein runs, or in the factory config? If it is at run time, as discussed in the past, it will still lead to pilots being requested on the wrong hardware based only on site name. In other words: the factory config now has GLIDEIN_REQUIRED_OS rhel6. What is it used for?

belforte commented 10 years ago

In the end the big question is: CMS sl5-32bit (e.g.) executables also run on sl6 and/or 64bit (provided the installation includes compatibility libs etc.). WHERE SHOULD WE KEEP THIS KNOWLEDGE?

  1. So far (PeterElmer way): site announces in BDII the supported arch's.
  2. Now (Condor): the site says what op. sys. it has, and Crab makes a list of arch's where my executable can run.

I am implementing 2. But need to review with Ops at large.
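Option 2 could look roughly like the sketch below, where Crab holds the compatibility knowledge and expands one build target into the lists of acceptable values. The tables and function names are hypothetical and the compatibility rules are illustrative only (they assume the compatibility libs are installed); the real rules still need the review with Ops.

```python
# Illustrative rules only: which advertised values can run a given build.
ARCH_COMPAT = {"INTEL": ["INTEL", "X86_64"], "X86_64": ["X86_64"]}
OS_VER_COMPAT = {5: [5, 6], 6: [6]}


def acceptable_targets(arch, major_ver):
    """Expand one build target into lists of acceptable advertised values."""
    return {
        "Archs": list(ARCH_COMPAT[arch]),
        "OpSysMajorVers": list(OS_VER_COMPAT[major_ver]),
    }


def glidein_matches(desired, advertised):
    """Exact list membership test, with no substring parsing."""
    return (advertised["Arch"] in desired["Archs"]
            and advertised["OpSysMajorVer"] in desired["OpSysMajorVers"])


if __name__ == "__main__":
    desired = acceptable_targets("INTEL", 5)
    print(glidein_matches(desired, {"Arch": "X86_64", "OpSysMajorVer": 6}))
```

With this layout the site only advertises what it has, and a change in compatibility policy is a table edit on the Crab side.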

belforte commented 10 years ago

in 2_10_4_pre6