Open belforte opened 2 years ago
Note to myself: publication for a task is controlled in the schedd via the classAd CRAB_Publish, which is set in DagmanCreator based on the tm_publication value. The DBS status of the input dataset is checked in DBSDataDiscovery. One easy way could be to override the value of tm_publication in the DB from inside DBSDataDiscovery; need to check if we have an API for that. Drawback: the DB info will not match what's in the crab config, which may be puzzling for future debuggers. Less appealing is to check the dataset type again in DagmanCreator, since DBS queries do not belong there.
Maybe it is enough to override the task object content in DBSDataDiscovery w/o touching the DB? Maybe changing the DB value would be irrelevant anyhow?
TO BE TESTED
Time to fix this, since it now happens and annoys us on the production server, see https://mattermost.web.cern.ch/cms-o-and-c/pl/98zp9hw893rtuceb731f8ins5e
Raising priority.
There is no API to override the value of tm_publication in the DB in https://github.com/dmwm/CRABServer/blob/master/src/python/CRABInterface/RESTTask.py . Let's first try for a solution which does not require deploying a new REST server.
The trick of overwriting the in-memory task object with
kwargs['task']['tm_publication']='F'
in here https://github.com/dmwm/CRABServer/blob/8c51e4a5de68531591e686ddeb47b5ab0fe33325/src/python/TaskWorker/Actions/DBSDataDiscovery.py#L33-L43 works. But the DB flag still says "publication on", which leads to confusing crab status output, and overall things will look inconsistent. I could add a warning, but a cleaner solution would be better. Let's investigate adding an API to change the flag. It is a bit more work but should be straightforward.
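A minimal sketch of that in-memory override plus the warning, to make the idea concrete. The function name `override_publication`, the plain-dict task, the `'T'` value for "publication on" and the `dataset_status` argument are assumptions for illustration; this is not the actual DBSDataDiscovery code.

```python
import logging


def override_publication(task, dataset_status, logger=None):
    """Turn publication off in the in-memory task dict when the input
    dataset is not VALID. The DB row is NOT touched.

    Hypothetical helper: assumes tm_publication is 'T' when publication
    is requested and that dataset_status is the status string from DBS.
    Returns the warning text when the flag was changed, otherwise None.
    """
    logger = logger or logging.getLogger(__name__)
    # Nothing to do for VALID datasets or when publication is already off.
    if dataset_status == 'VALID' or task.get('tm_publication') != 'T':
        return None
    task['tm_publication'] = 'F'
    msg = ("Input dataset status is %s: output cannot be published in DBS, "
           "publication has been disabled for this task." % dataset_status)
    logger.warning(msg)
    return msg
```

Returning the message lets the caller attach it to the task as a user-visible warning, which would at least explain why crab status still shows publication as requested.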
@mapellidario @novicecpp in the spirit of what was said in the last meeting about "hand over to you issues which you can deal with and do not require extensive knowledge", is this additional API something one of you feels like doing? Adding a new API is a bit tedious, but it should be possible by slightly modifying an existing one which modifies some other column. If there's interest I can walk you through the steps.
hmm.... now those tasks which try impossible publications are not "harmful" anymore, they simply fail publication w/o reporting the reason to the user. We can lower the priority.
Short-term quick solution:
allowNonValidInputDataset = True
? and print a warning ? Or better: report invalid configuration and force user to change that or publish flag.
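The "report invalid configuration" option could look roughly like the sketch below. The exception class, the function name and the boolean `publication` argument are hypothetical; it only illustrates rejecting the task early with a message the user can act on.

```python
class InvalidPublicationConfig(Exception):
    """Hypothetical error type for a config that asks for an impossible publication."""


def check_publication_allowed(publication, dataset_status):
    """Refuse a task that requests publication on a non-VALID input dataset.

    Assumes `publication` is the boolean publish flag from the user config
    and `dataset_status` is the status string reported by DBS.
    """
    if publication and dataset_status != 'VALID':
        raise InvalidPublicationConfig(
            "Input dataset status is %s: output cannot be published in DBS. "
            "Use a VALID input dataset or turn publication off." % dataset_status)
```

Compared with silently flipping the flag, this keeps the DB, the crab config and crab status consistent, at the cost of making the user resubmit.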
users can (partially) process datasets which are still in production (!?) via
see e.g. https://cmsweb.cern.ch:8443/scheddmon/0197/rkansal/220705_155318:rkansal_crab_pfnano_v2_3_2017_HZJ_HToWW_M-125/debug/crabConfig.py which triggered the mail exchange [1] with Alan, Yuyi and Valentin
But it makes no sense to try to publish output in DBS, since parentage info is not available for a PRODUCTION dataset and things will end up in an endless error loop inside Publisher.
[1]