SmartDataProjects / dynamo

CMS next-generation dynamic data management
MIT License

subscription via /registry/request/copy not as expected #256

Closed vlimant closed 5 years ago

vlimant commented 5 years ago

it looks like passing a dataset to that api does not turn the existing full subscriptions under dataops into anaops (as the previous ddm script did). @dr-stringfellow can you please confirm what is expected of that api ?

vlimant commented 5 years ago

see https://github.com/CMSCompOps/WmAgentScripts/blob/7af051b1e72d04fab86af87b608a885bd3180d2b/utils.py#L635 for how this is called

dr-stringfellow commented 5 years ago

can you give me an example for when it didn't work? I can check what we have in the databases.

vlimant commented 5 years ago

I don't know when ; looks like it did not do it for

/DoublePhotonNoMaterial_FlatPt-0p01To10/RunIIWinter19PFCalibDR-2017ConditionsNoPUExtZeroMaterial_NoMaterial_105X_mc2017_realistic_v5-v1/AODSIM

https://its.cern.ch/jira/browse/CMSTRANSF-65

dr-stringfellow commented 5 years ago

Hard to say. There is no record of it in our databases, but the server was down for a couple hours recently. Probably we've just been unlucky with this one.

vlimant commented 5 years ago

let me take a recent example with

/ChargedHiggsToCS_M155_13TeV-madgraph/RunIISummer16NanoAODv4-PUMoriond17_Nano14Dec2018_102X_mcRun2_asymptotic_v6-v1/NANOAODSIM

requested in full to purdue by the wmagent, under dataops (regular operation)

https://cmsweb.cern.ch/phedex/prod/Request::View?request=1744128

announced to dynamo : https://cms-unified.web.cern.ch/cms-unified/showlog/?search=task_HIG-RunIISummer16NanoAODv4-02766

/ChargedHiggsToCS_M155_13TeV-madgraph/RunIISummer16NanoAODv4-PUMoriond17_Nano14Dec2018_102X_mcRun2_asymptotic_v6-v1/NANOAODSIM 4974/4974 = 100.00%
/ChargedHiggsToCS_M155_13TeV-madgraph/RunIISummer16NanoAODv4-PUMoriond17_Nano14Dec2018_102X_mcRun2_asymptotic_v6-v1/NANOAODSIM is send to dynamo in 1 copies []
workflow outputs are announced

but then there are no additional subscriptions anywhere

https://cmsweb.cern.ch/phedex/datasvc/xml/prod/requestlist?dataset=/ChargedHiggsToCS_M155_13TeV-madgraph/RunIISummer16NanoAODv4-PUMoriond17_Nano14Dec2018_102X_mcRun2_asymptotic_v6-v1/NANOAODSIM

and the subscription to Purdue is still dataops. Once the tape copy succeeds, it will go away from disk, as it is now unlocked by unified (or soon will be) and residing in dataops space

vlimant commented 5 years ago

@dr-stringfellow you were telling me this behavior is expected, can you please elaborate here ? it looks to me that this is impacting rather negatively the dataops space and likely something has to be done

vlimant commented 5 years ago

further on this, I noticed that http://t3serv001.mit.edu/~cmsprod/IntelROCCS/Detox/result/T1_DE_KIT_Disk/RemainingDatasets.txt is not up to date :

#- DDM Partition: DataOps -
#
#  Rank      Size nsites nsites  DatasetName
#[~days]     [GB] before after
#---------------------------------------------
     3.3  172665.6     62     62  /MinBias_TuneCP5_14TeV-pythia8/PhaseIIMTDTDRAutumn18DR-NoPU_103X_upgrade2023_realistic_v2-v1/FEVT

while

https://cmsweb.cern.ch/phedex/datasvc/xml/prod/subscriptions?dataset=/MinBias_TuneCP5_14TeV-pythia8/PhaseIIMTDTDRAutumn18DR-NoPU_103X_upgrade2023_realistic_v2-v1/FEVT

<subscription priority="low" time_start="" move="n" suspend_until="" node="T1_DE_KIT_Disk" percent_bytes="100" time_create="1549079947.78276" node_files="17229" time_update="1556572515.75751" group="AnalysisOps" node_id="1821" request="1614576" node_bytes="178959788698034" level="DATASET" custodial="n" suspended="n" percent_files="100"/>

it is supposed to be anaops

dr-stringfellow commented 5 years ago

Do you know when the group was changed?

vlimant commented 5 years ago

some days ago I think

dr-stringfellow commented 5 years ago

We run bi-weekly full updates fetching everything from phedex. We find out about group changes then. In the hourly delta updates we cannot register it, since the replica that had its group changed will not appear if you query phedex with "update_since". Only size changes etc will show up, no group changes. It's been pointed out a couple times already to phedex developers in the past, but we've never been insistent enough to have the group change also change the last_update timestamp.
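To illustrate the point about delta updates, here is a minimal toy model. It is not dynamo or PhEDEx code: the `Replica` class, `delta_query`, `full_query` and `apply_updates` are hypothetical names, and the model simply assumes that a group change leaves the update timestamp untouched, as described above.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    dataset: str
    site: str
    group: str
    time_update: float  # assumption: not bumped when only the group changes


def delta_query(replicas, updated_since):
    """Hypothetical stand-in for a query restricted by update timestamp."""
    return [r for r in replicas if r.time_update > updated_since]


def full_query(replicas):
    """Hypothetical stand-in for a full fetch of everything."""
    return list(replicas)


def apply_updates(inventory, fetched):
    """Record the group of each fetched replica in the local inventory."""
    for r in fetched:
        inventory[(r.dataset, r.site)] = r.group


key = ("/A/B/FEVT", "T1_DE_KIT_Disk")  # placeholder dataset name
source = [Replica(key[0], key[1], "DataOps", time_update=10.0)]

inventory = {}
apply_updates(inventory, full_query(source))  # initial full update

# The group changes at the source, but the timestamp stays the same.
source[0].group = "AnalysisOps"

# The delta update sees nothing newer than t=10, so the inventory stays stale.
apply_updates(inventory, delta_query(source, updated_since=10.0))
stale_group = inventory[key]

# Only the next full update picks up the new group.
apply_updates(inventory, full_query(source))
fresh_group = inventory[key]

print(stale_group, fresh_group)
```

In this toy model the inventory keeps reporting the old group until the next full update, which matches the lag described in this comment.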

vlimant commented 5 years ago

when was the last full update ? the dataset actually appears as both anaops and dataops in http://t3serv001.mit.edu/~cmsprod/IntelROCCS/Detox/result/T1_DE_KIT_Disk/RemainingDatasets.txt which I find a little odd

dr-stringfellow commented 5 years ago

Last update was a week ago. I would not rely on these IntelROCCS files anymore. Looking into our inventory, I see it only in DataOps. By the way, the txt file lists "Physics" (AnalysisOps+DataOps) and "DataOps". You cannot conclude that it's AnalysisOps when it's appearing in Physics.

vlimant commented 5 years ago

"Physics" (AnalysisOp+DataOps) : got it

vlimant commented 5 years ago

> I would not rely on these IntelROCCS files anymore

what else can be used ?

dr-stringfellow commented 5 years ago

Full updates finally succeeded. There is no officially provided file like the IntelROCCS one. That one is probably the best bet if you need the information, but it cannot be guaranteed everything is 100% correct to the last byte.

vlimant commented 5 years ago

thx. there is still no certainty (and more evidence against) that the existing dataops subscriptions are handed over to analysisops

dr-stringfellow commented 5 years ago

I am looking at the last.log. So far, it seems that many are special cases, like: https://cmsweb.cern.ch/phedex/prod/Data::Subscriptions#state=create_since%3D0%3Bfilter%3D%2FDisplacedSUSY_stopToBottom_M_900_100mm_TuneCP5_13TeV_pythia8%2FRunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1%2FAODSIM

vlimant commented 5 years ago

here an extra case https://cmsweb.cern.ch/phedex/datasvc/xml/prod/subscriptions?dataset=/SeesawTypeIII_M-300_2e_13TeV-madgraph/RunIIFall17NanoAOD-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/NANOAODSIM

vlimant commented 5 years ago

there are many for which the dataops subscription was turned to anaops "by hand", by me ; we'll pick this up on a further round of https://cms-unified.web.cern.ch/cms-unified/logs/remainor/last.log

vlimant commented 5 years ago

https://cmsweb.cern.ch/phedex/datasvc/xml/prod/subscriptions?dataset=/RadionTohhTohWWhbb_width0p10_M-1900_TuneCP2_13TeV-madgraph_pythia8/RunIIFall17NanoAODv4-PU2017_12Apr2018_Nano14Dec2018_102X_mc2017_realistic_v6-v1/NANOAODSIM there isn't even an anaops subscription and the dataops is still in place

vlimant commented 5 years ago

maybe a more recent one

https://cmsweb.cern.ch/phedex/datasvc/xml/prod/subscriptions?dataset=/LQLQToTopTau_M-1400_TuneCUETP8M1_13TeV_pythia8/RunIISummer16NanoAODv3-PUMoriond17_94X_mcRun2_asymptotic_v3-v2/NANOAODSIM

<dataset bytes="190681036" files="1" is_open="y" name="/LQLQToTopTau_M-1400_TuneCUETP8M1_13TeV_pythia8/RunIISummer16NanoAODv3-PUMoriond17_94X_mcRun2_asymptotic_v3-v2/NANOAODSIM" id="1333849"><subscription priority="low" time_start="" move="n" suspend_until="" node="T2_US_UCSD" percent_bytes="100" time_create="1557090643.20659" node_files="1" time_update="1557090643.20659" group="DataOps" node_id="62" request="1752760" node_bytes="190681036" level="DATASET" custodial="n" suspended="n" percent_files="100"/><subscription priority="low" time_start="" move="n" suspend_until="" node="T0_CH_CERN_MSS" percent_bytes="100" time_create="1557107084.06314" node_files="1" time_update="1557107084.06314" group="DataOps" node_id="2" request="1752987" node_bytes="190681036" level="DATASET" custodial="y" suspended="n" percent_files="100"/><subscription priority="low" time_start="" move="n" suspend_until="" node="T2_FR_GRIF_IRFU" percent_bytes="100" time_create="1557121435.48554" node_files="1" time_update="1557121435.48554" group="AnalysisOps" node_id="82" request="1753152" node_bytes="190681036" level="DATASET" custodial="n" suspended="n" percent_files="100"/></dataset>

the initial UCSD subscription is still out there as dataops
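The check done by eye on the XML above can be sketched as follows. This is only an illustration using Python's standard `xml.etree.ElementTree`; the helper name `leftover_dataops_disk` is hypothetical, and the sample keeps only the `node`, `group`, `custodial` and `request` attributes from the response above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response above: only the attributes used here are kept.
SAMPLE = """
<dataset name="/LQLQToTopTau_M-1400_TuneCUETP8M1_13TeV_pythia8/RunIISummer16NanoAODv3-PUMoriond17_94X_mcRun2_asymptotic_v3-v2/NANOAODSIM">
  <subscription node="T2_US_UCSD" group="DataOps" custodial="n" request="1752760"/>
  <subscription node="T0_CH_CERN_MSS" group="DataOps" custodial="y" request="1752987"/>
  <subscription node="T2_FR_GRIF_IRFU" group="AnalysisOps" custodial="n" request="1753152"/>
</dataset>
"""


def leftover_dataops_disk(xml_text):
    """Return (node, request) for non-custodial subscriptions still in DataOps."""
    root = ET.fromstring(xml_text)
    return [
        (sub.get("node"), sub.get("request"))
        for sub in root.iter("subscription")
        if sub.get("group") == "DataOps" and sub.get("custodial") == "n"
    ]


leftover = leftover_dataops_disk(SAMPLE)
print(leftover)
```

The custodial tape subscription at T0_CH_CERN_MSS is deliberately excluded, since only the disk copy is expected to move to analysisops.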

https://cmsweb.cern.ch/reqmgr2/data/request?mask=NonCustodialSites&name=pdmvserv_task_B2G-RunIISummer16MiniAODv3-03785__v1_T_190416_105211_9630

vlimant commented 5 years ago

for reference, the old code https://raw.githubusercontent.com/vlimant/IntelROCCS/master/DataDealer/assignDatasetToSite.py that was used transferred the dataops subscription in full to analysisops right away ; many of the examples above seem to indicate that this is not done anymore. many of the FEVT mentioned in this GH issue fall under this case ; like

https://cmsweb.cern.ch/phedex/datasvc/xml/prod/subscriptions?dataset=/GMSB_L100_Ctau10_Pythia8_TuneCP5_14TeV/PhaseIIMTDTDRAutumn18DR-PU200_103X_upgrade2023_realistic_v2-v1/FEVT

<subscription priority="low" time_start="" move="n" suspend_until="" node="T1_IT_CNAF_Disk" percent_bytes="100" time_create="1552215717.70677" node_files="398" time_update="1552215717.70677" group="DataOps" node_id="661" request="1669890" node_bytes="9280075106915" level="DATASET" custodial="n" suspended="n" percent_files="100"/>

dr-stringfellow commented 5 years ago

We changed the groups of the FEVT to AnalysisOps. There was a manual lock placed by MTD. All the NanoAOD have a hard protection:

https://github.com/SmartDataProjects/dynamo-policies/blob/master/detox/Physics.txt#L27

We are currently discussing how to help the ones in DataOps.

vlimant commented 5 years ago

the manual lock by MTD should not change the fact that the dataops subscription should have been changed to analysisops in the first place

vlimant commented 5 years ago

from

https://cms-unified.web.cern.ch/cms-unified/showlog/?search=/GMSB_L100_Ctau10_Pythia8_TuneCP5_14TeV/PhaseIIMTDTDRAutumn18DR-PU200_103X_upgrade2023_realistic_v2-v1/FEVT&module=closor

/GMSB_L100_Ctau10_Pythia8_TuneCP5_14TeV/PhaseIIMTDTDRAutumn18DR-PU200_103X_upgrade2023_realistic_v2-v1/FEVT is send to dynamo in 1 copies []

this can only return true if dynamo said "OK"

dr-stringfellow commented 5 years ago

/GMSB_L100_Ctau10_Pythia8_TuneCP5_14TeV/PhaseIIMTDTDRAutumn18DR-PU200_103X_upgrade2023_realistic_v2-v1/FEVT indeed does not show in our database. This particular one will be impossible to trace back.

vlimant commented 5 years ago

yup. and because unified will go on if and only if

return (res['result'] == "OK")

this means we are losing requests

vlimant commented 5 years ago

how about /LQLQToTopTau_M-1400_TuneCUETP8M1_13TeV_pythia8/RunIISummer16NanoAODv3-PUMoriond17_94X_mcRun2_asymptotic_v3-v2/NANOAODSIM ?

dr-stringfellow commented 5 years ago

> yup. and because unified will go on if and only if
>
> return (res['result'] == "OK")
>
> this means we are losing requests

At a tiny rate, yes. We are still looking into it with debug statements etc. Since we put them in, we haven't noticed anything odd. We are waiting for such a case.

The NANOAODSIM was replicated to another site and now both are protected because of the line in the policy.

vlimant commented 5 years ago

and yet, the dataops subscription at UCSD is not moved to anaops as it was supposed to be (or at least as it was before the change to the copy api)

vlimant commented 5 years ago

just saw this https://cmsweb.cern.ch/phedex/datasvc/xml/prod/subscriptions?dataset=/SingleNeutrino/RunIIAutumn18DRPremix-forRECO_102X_upgrade2018_realistic_v15_ext1-v1/GEN-SIM-RECO

/SingleNeutrino/RunIIAutumn18DRPremix-forRECO_102X_upgrade2018_realistic_v15_ext1-v1/GEN-SIM-RECO is send to dynamo in 1 copies []

on Nov 11

https://cms-unified.web.cern.ch/cms-unified/showlog/?search=%2FSingleNeutrino%2FRunIIAutumn18DRPremix-forRECO_102X_upgrade2018_realistic_v15_ext1-v1%2FGEN-SIM-RECO&module=closor

there might be a blanket lock on GEN-SIM-RECO, but how come the original dataops subscription to Nebraska is not already anaops

vlimant commented 5 years ago

I get it that there isn't any issue code-wise, so let's go ahead and close this