vlimant closed this issue 5 years ago
see https://github.com/CMSCompOps/WmAgentScripts/blob/7af051b1e72d04fab86af87b608a885bd3180d2b/utils.py#L635 for how this is called
Can you give me an example of when it didn't work? I can check what we have in the databases.
I don't know when; it looks like it did not happen for
/DoublePhotonNoMaterial_FlatPt-0p01To10/RunIIWinter19PFCalibDR-2017ConditionsNoPUExtZeroMaterial_NoMaterial_105X_mc2017_realistic_v5-v1/AODSIM
Hard to say. There is no record of it in our databases, but the server was down for a couple hours recently. Probably we've just been unlucky with this one.
let me take a recent example with
/ChargedHiggsToCS_M155_13TeV-madgraph/RunIISummer16NanoAODv4-PUMoriond17_Nano14Dec2018_102X_mcRun2_asymptotic_v6-v1/NANOAODSIM
requested in full to Purdue by the wmagent, under dataops (regular operation)
https://cmsweb.cern.ch/phedex/prod/Request::View?request=1744128
announced to dynamo :
https://cms-unified.web.cern.ch/cms-unified/showlog/?search=task_HIG-RunIISummer16NanoAODv4-02766
/ChargedHiggsToCS_M155_13TeV-madgraph/RunIISummer16NanoAODv4-PUMoriond17_Nano14Dec2018_102X_mcRun2_asymptotic_v6-v1/NANOAODSIM 4974/4974 = 100.00%
/ChargedHiggsToCS_M155_13TeV-madgraph/RunIISummer16NanoAODv4-PUMoriond17_Nano14Dec2018_102X_mcRun2_asymptotic_v6-v1/NANOAODSIM is send to dynamo in 1 copies []
workflow outputs are announced
but then there are no additional subscriptions anywhere
https://cmsweb.cern.ch/phedex/datasvc/xml/prod/requestlist?dataset=/ChargedHiggsToCS_M155_13TeV-madgraph/RunIISummer16NanoAODv4-PUMoriond17_Nano14Dec2018_102X_mcRun2_asymptotic_v6-v1/NANOAODSIM
and the subscription to Purdue is still dataops. Once the tape copy succeeds, it will go away from disk, as it is now unlocked by unified (or soon will be) and residing in dataops space
@dr-stringfellow you were telling me this behavior is expected, can you please elaborate here? It looks to me like this is having a rather negative impact on the dataops space, and something likely has to be done
further on this, I noticed that http://t3serv001.mit.edu/~cmsprod/IntelROCCS/Detox/result/T1_DE_KIT_Disk/RemainingDatasets.txt is not up to date :
#- DDM Partition: DataOps -
#
#  Rank     Size      nsites  nsites  DatasetName
# [~days]   [GB]      before  after
#---------------------------------------------
   3.3      172665.6  62      62      /MinBias_TuneCP5_14TeV-pythia8/PhaseIIMTDTDRAutumn18DR-NoPU_103X_upgrade2023_realistic_v2-v1/FEVT
while
<subscription priority="low" time_start="" move="n" suspend_until="" node="T1_DE_KIT_Disk" percent_bytes="100" time_create="1549079947.78276" node_files="17229" time_update="1556572515.75751" group="AnalysisOps" node_id="1821" request="1614576" node_bytes="178959788698034" level="DATASET" custodial="n" suspended="n" percent_files="100"/>
it is supposed to be anaops
Do you know when the group was changed?
some days ago I think
We run bi-weekly full updates fetching everything from phedex. We find out about group changes then. In the hourly delta updates we cannot register it, since the replica that had its group changed will not appear if you query phedex with "update_since". Only size changes etc will show up, no group changes. It's been pointed out a couple times already to phedex developers in the past, but we've never been insistent enough to have the group change also change the last_update timestamp.
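A minimal sketch of why a delta update misses a group change, assuming (as described above) that the source does not bump the update timestamp when only the group changes. The `Replica` record, the `delta_update` and `full_update` helpers, and the dataset name are hypothetical and only illustrate the mechanism; this is not the dynamo or phedex code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Replica:
    dataset: str
    node: str
    group: str
    time_update: float  # assumed to change on size updates, not on group changes


def delta_update(inventory, source, since):
    """Copy only records whose time_update is at or after `since`."""
    for key, rep in source.items():
        if rep.time_update >= since:
            inventory[key] = rep
    return inventory


def full_update(inventory, source):
    """Replace the whole inventory with the current state of the source."""
    inventory.clear()
    inventory.update(source)
    return inventory


# Hypothetical dataset name, for illustration only.
key = ("/Example/Campaign-v1/FEVT", "T1_DE_KIT_Disk")
source = {key: Replica(key[0], key[1], "DataOps", 1000.0)}
inventory = dict(source)

# The group changes at the source, but the timestamp stays the same.
source[key] = replace(source[key], group="AnalysisOps")

delta_update(inventory, source, since=2000.0)
group_after_delta = inventory[key].group  # the change is not picked up

full_update(inventory, source)
group_after_full = inventory[key].group  # the change is picked up
```

In this sketch the inventory keeps reporting the old group until the next full update, which matches the lag discussed in this thread.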
when was the last full update ? the dataset actually appears as both anaops and dataops in http://t3serv001.mit.edu/~cmsprod/IntelROCCS/Detox/result/T1_DE_KIT_Disk/RemainingDatasets.txt which I find a little odd
Last update was a week ago. I would not rely on these IntelROCCS files anymore. Looking into our inventory, I see it only in DataOps. By the way, the txt file lists "Physics" (AnalysisOps+DataOps) and "DataOps". You cannot conclude that it's AnalysisOps when it's appearing in Physics.
"Physics" (AnalysisOps+DataOps) : got it
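To make the partition point concrete, here is a small sketch assuming only what is stated above: "Physics" covers AnalysisOps and DataOps, and "DataOps" covers DataOps alone. The `PARTITIONS` mapping and `possible_groups` helper are hypothetical names, not part of IntelROCCS or dynamo.

```python
# Partition -> groups it covers, as described in the comment above.
PARTITIONS = {
    "Physics": {"AnalysisOps", "DataOps"},
    "DataOps": {"DataOps"},
}


def possible_groups(listed_in):
    """Groups consistent with a dataset being listed under all given partitions."""
    groups = None
    for partition in listed_in:
        covered = PARTITIONS[partition]
        groups = set(covered) if groups is None else groups & covered
    return groups or set()


only_physics = possible_groups(["Physics"])
both = possible_groups(["Physics", "DataOps"])
```

A dataset listed under both "Physics" and "DataOps" is consistent with DataOps only, while a listing under "Physics" alone leaves both groups possible.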
> I would not rely on these IntelROCCS files anymore
what else can be used ?
Full updates finally succeeded. There is no officially provided file like the IntelROCCS one. That one is probably the best bet if you need the information, but it cannot be guaranteed everything is 100% correct to the last byte.
thx. there is still no certainty (and more evidence against) that the existing dataops subscriptions are handed over to analysisops
I am looking at the last.log. So far, it seems that many are special cases, like: https://cmsweb.cern.ch/phedex/prod/Data::Subscriptions#state=create_since%3D0%3Bfilter%3D%2FDisplacedSUSY_stopToBottom_M_900_100mm_TuneCP5_13TeV_pythia8%2FRunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1%2FAODSIM
there are many for which the dataops subscription was turned to dataops "by hand", by me; we'll pick this up on a further round of https://cms-unified.web.cern.ch/cms-unified/logs/remainor/last.log
https://cmsweb.cern.ch/phedex/datasvc/xml/prod/subscriptions?dataset=/RadionTohhTohWWhbb_width0p10_M-1900_TuneCP2_13TeV-madgraph_pythia8/RunIIFall17NanoAODv4-PU2017_12Apr2018_Nano14Dec2018_102X_mc2017_realistic_v6-v1/NANOAODSIM
there isn't even an anaops subscription and the dataops one is still in place
maybe a more recent one
<dataset bytes="190681036" files="1" is_open="y" name="/LQLQToTopTau_M-1400_TuneCUETP8M1_13TeV_pythia8/RunIISummer16NanoAODv3-PUMoriond17_94X_mcRun2_asymptotic_v3-v2/NANOAODSIM" id="1333849">
  <subscription priority="low" time_start="" move="n" suspend_until="" node="T2_US_UCSD" percent_bytes="100" time_create="1557090643.20659" node_files="1" time_update="1557090643.20659" group="DataOps" node_id="62" request="1752760" node_bytes="190681036" level="DATASET" custodial="n" suspended="n" percent_files="100"/>
  <subscription priority="low" time_start="" move="n" suspend_until="" node="T0_CH_CERN_MSS" percent_bytes="100" time_create="1557107084.06314" node_files="1" time_update="1557107084.06314" group="DataOps" node_id="2" request="1752987" node_bytes="190681036" level="DATASET" custodial="y" suspended="n" percent_files="100"/>
  <subscription priority="low" time_start="" move="n" suspend_until="" node="T2_FR_GRIF_IRFU" percent_bytes="100" time_create="1557121435.48554" node_files="1" time_update="1557121435.48554" group="AnalysisOps" node_id="82" request="1753152" node_bytes="190681036" level="DATASET" custodial="n" suspended="n" percent_files="100"/>
</dataset>
the initial UCSD subscription is still out there as dataops
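A quick way to spot this kind of leftover is to group the non-custodial subscriptions by group. The sketch below assumes a response shaped like the XML above, trimmed to the `node`, `group` and `custodial` attributes; `disk_subscriptions_by_group` is a hypothetical helper, not something in unified or dynamo.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record above, keeping only the attributes used here.
SAMPLE = """
<dataset name="/LQLQToTopTau_M-1400_TuneCUETP8M1_13TeV_pythia8/RunIISummer16NanoAODv3-PUMoriond17_94X_mcRun2_asymptotic_v3-v2/NANOAODSIM">
  <subscription node="T2_US_UCSD" group="DataOps" custodial="n"/>
  <subscription node="T0_CH_CERN_MSS" group="DataOps" custodial="y"/>
  <subscription node="T2_FR_GRIF_IRFU" group="AnalysisOps" custodial="n"/>
</dataset>
"""


def disk_subscriptions_by_group(xml_text):
    """Map each group to the nodes holding a non-custodial subscription."""
    root = ET.fromstring(xml_text.strip())
    by_group = {}
    # iter() also finds subscriptions nested under a wrapper element.
    for sub in root.iter("subscription"):
        if sub.get("custodial") == "y":
            continue  # skip the tape (custodial) copy
        by_group.setdefault(sub.get("group"), []).append(sub.get("node"))
    return by_group


result = disk_subscriptions_by_group(SAMPLE)
```

Here a DataOps entry sitting next to an AnalysisOps entry is the situation described in this comment: a new AnalysisOps copy exists, but the original DataOps disk subscription was not handed over.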
For reference, the old code https://raw.githubusercontent.com/vlimant/IntelROCCS/master/DataDealer/assignDatasetToSite.py that was used previously transferred the dataops subscription in full to analysisops right away; many of the examples above seem to indicate that this is not done anymore. Many of the FEVT datasets mentioned in this GH issue fall under this case, like
<subscription priority="low" time_start="" move="n" suspend_until="" node="T1_IT_CNAF_Disk" percent_bytes="100" time_create="1552215717.70677" node_files="398" time_update="1552215717.70677" group="DataOps" node_id="661" request="1669890" node_bytes="9280075106915" level="DATASET" custodial="n" suspended="n" percent_files="100"/>
We changed the groups of the FEVT to AnalysisOps. There was a manual lock placed by MTD. All the NanoAOD have a hard protection:
https://github.com/SmartDataProjects/dynamo-policies/blob/master/detox/Physics.txt#L27
We are currently discussing how to help the ones in DataOps.
the manual lock by MTD should not change the fact that the dataops subscription should have been changed to analysisops in the first place
from
/GMSB_L100_Ctau10_Pythia8_TuneCP5_14TeV/PhaseIIMTDTDRAutumn18DR-PU200_103X_upgrade2023_realistic_v2-v1/FEVT is send to dynamo in 1 copies []
this can only return true if dynamo said "OK"
/GMSB_L100_Ctau10_Pythia8_TuneCP5_14TeV/PhaseIIMTDTDRAutumn18DR-PU200_103X_upgrade2023_realistic_v2-v1/FEVT indeed does not show in our database. This particular one will be impossible to trace back.
yup. and because unified will go on if and only if
return (res['result'] == "OK")
this means we are losing requests
how about /LQLQToTopTau_M-1400_TuneCUETP8M1_13TeV_pythia8/RunIISummer16NanoAODv3-PUMoriond17_94X_mcRun2_asymptotic_v3-v2/NANOAODSIM ?
> yup. and because unified will go on if and only if
> return (res['result'] == "OK")
> this means we are losing requests
At a tiny rate, yes. We are still looking into it with debug statements etc. Since we put them in, we haven't noticed anything odd. We are waiting for such a case.
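While waiting for such a case, one possible way to catch it is to compare what the caller saw acknowledged with what the receiving database actually holds. This is only a sketch under assumptions: `is_ok` mirrors the `res['result'] == "OK"` check quoted above, while `lost_requests`, the two input collections and the dataset names are hypothetical and do not come from unified or dynamo.

```python
def is_ok(res):
    """Same condition as the quoted check, tolerant of a missing key."""
    return isinstance(res, dict) and res.get("result") == "OK"


def lost_requests(announced, recorded):
    """Datasets acknowledged with OK on the caller side but absent on the server side."""
    return sorted(set(announced) - set(recorded))


# Hypothetical dataset names, for illustration only.
responses = {
    "/Example/A-v1/FEVT": {"result": "OK"},
    "/Example/B-v1/NANOAODSIM": {"result": "OK"},
    "/Example/C-v1/AODSIM": {"result": "failed"},
}
announced = [name for name, res in responses.items() if is_ok(res)]
recorded = ["/Example/B-v1/NANOAODSIM"]

missing = lost_requests(announced, recorded)
```

Anything left in `missing` was acknowledged but never recorded, which is the case being searched for here.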
The NANOAODSIM was replicated to another site and now both are protected because of the line in the policy.
and yet, the dataops subscription at UCSD is not moved to anaops as it was supposed to be (or at least as it was before the change to the copy API)
/SingleNeutrino/RunIIAutumn18DRPremix-forRECO_102X_upgrade2018_realistic_v15_ext1-v1/GEN-SIM-RECO is send to dynamo in 1 copies []
on Nov 11
there might be a blanket lock on GEN-SIM-RECO, but how come the original dataops subscription to Nebraska is not already anaops?
I get it that there isn't any issue code-wise, so let's go ahead and close this
It looks like passing a dataset to that API will not turn existing full subscriptions under dataops into anaops (as the previous ddm script did). @dr-stringfellow can you please confirm what is expected of that API?