cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

Remote reading of pLHE fallback issue for production #34061

Open scarletnorberg opened 3 years ago

scarletnorberg commented 3 years ago

CMSSW still needs to be fixed so that LHE inputs use the same fallback file-open logic that the core software already uses for EDM inputs.

This is where WMCore added it: https://github.com/dmwm/WMCore/pull/9154#issuecomment-487938039

This is what is already done in CMSSW: https://github.com/cms-sw/cmssw/issues/31161 and https://github.com/cms-sw/cmsdist/pull/6283

cmsbuild commented 3 years ago

A new Issue was created by @scarletnorberg .

@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

kpedro88 commented 3 years ago

assign core, generators attn: @bbockelm

cmsbuild commented 3 years ago

New categories assigned: core,generators

@Dr15Jones,@smuzaffar,@mkirsano,@SiewYan,@alberto-sanchez,@makortel,@agrohsje,@GurpreetSinghChahal you have been requested to review this Pull request/Issue and eventually sign? Thanks

makortel commented 3 years ago

What is the urgency for this feature?

I'd also like to understand better why whatever we have been doing so far doesn't work (that well) anymore.

makortel commented 3 years ago

What fallback are we actually talking about? The fallback catalogs defined in site-local-config.xml or the fallback in AAA?

makortel commented 3 years ago

Or is the request really about wanting to give LFN(s) to the LHESource and relying on xrootd/AAA to find the site providing the file?

dan131riley commented 3 years ago

It looks like the LHESource isn't using the file catalog at all, so they must be using PFNs. The first step would be to use the file catalog for LFN to PFN translation, and then if AAA fallback is wanted see https://github.com/cms-sw/cmssw/pull/28064 for an example implementation.

dan131riley commented 3 years ago

Scratch the previous comment: I forgot that the fallback catalog had been replaced by multiple catalogs in https://github.com/cms-sw/cmssw/pull/28911, so mostly what's needed is to use the file catalog.
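The conclusion above (resolve LHE inputs through the file catalog, with the old single fallback catalog now expressed as an ordered list of catalogs) can be illustrated with a minimal C++ sketch. Everything in it is hypothetical: `Catalog`, `toPFN`, `openWithFallback`, the simple prefix-replacement rule, and any hostnames used with it are illustrative assumptions, not CMSSW's catalog classes or the LHESource code. Real site catalogs support richer matching rules.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// HYPOTHETICAL catalog rule: an LFN starting with lfnPrefix is mapped to a
// PFN by replacing that prefix with pfnPrefix. Real site catalogs are richer;
// this only illustrates the LFN -> PFN translation step.
struct Catalog {
  std::string lfnPrefix;
  std::string pfnPrefix;

  std::optional<std::string> toPFN(const std::string& lfn) const {
    if (lfn.rfind(lfnPrefix, 0) != 0) {
      return std::nullopt;  // this catalog has no rule for this LFN
    }
    return pfnPrefix + lfn.substr(lfnPrefix.size());
  }
};

// HYPOTHETICAL helper: try each catalog in order and return the first PFN
// that the opener accepts. `tryOpen` stands in for the real file-open call
// (e.g. through xrootd) and is injected so the fallback logic needs no I/O.
std::optional<std::string> openWithFallback(
    const std::string& lfn, const std::vector<Catalog>& catalogs,
    const std::function<bool(const std::string&)>& tryOpen) {
  for (const auto& catalog : catalogs) {
    auto pfn = catalog.toPFN(lfn);
    if (pfn && tryOpen(*pfn)) {
      return pfn;  // first successful open wins
    }
  }
  return std::nullopt;  // every catalog failed; the caller reports the error
}
```

Injecting the opener keeps the catalog ordering separate from the I/O layer, so in this sketch the same loop would serve an EDM source and an LHE source alike.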

makortel commented 3 years ago

With @Dr15Jones we came to the same conclusion.

The remaining questions are whether we have understood the request correctly, and how urgent it is.

scarletnorberg commented 3 years ago

This is for production. Because of the way campaigns are set up, we have to blacklist certain sites. One of them is CERN, and that causes issues because the pLHE can only run at CERN. This is happening more and more often and causing a lot of manual operations, so we would appreciate it if this could be fixed sooner.

The reason CERN is blacklisted is because the pileup is not hosted at CERN and the HLT resources need the pileup to be hosted there to run.

We want to run pLHE requests at sites other than CERN, and this is what is needed to do so.

@haozturk, @nsmith-