dmwm / CMSRucio


Cannot deactivate a particular subscription #630

Open haozturk opened 11 months ago

haozturk commented 11 months ago

Describe the bug

I'm trying to disable a subscription. IIUC, the way to do this is to set its lifetime to 0 through the optional --lifetime argument. However, the update fails:

$ rucio-admin subscription update --lifetime 0 --account transfer_ops --priority 3 DQMIOToDisk '{"pattern": "/JetMET/.*/DQMIO", "did_type": "CONTAINER", "scope": ["cms"]}' '[{"rse_expression": "T2_CH_CERN", "copies": 1, "lifetime": 864000, "comment": "rule subscription for DQMIO according to ticket CMSTRANSF-636", "activity": "User Subscriptions"}]' 'DQMIOToDisk to store on CERN for 1 month for ticket CMSTRANSF-636'

Database exception.
Details: An unknown Database Exception has ocurred.
Rucio exited with an unexpected/unknown error, please provide the traceback below to the developers.
Traceback (most recent call last):
  File "/cvmfs/cms.cern.ch/rucio/x86_64/rhel7/py3/current/bin/rucio-admin", line 97, in new_funct
    return function(*args, **kwargs)
  File "/cvmfs/cms.cern.ch/rucio/x86_64/rhel7/py3/current/bin/rucio-admin", line 948, in update_subscription
    comments=args.comments, lifetime=args.lifetime, retroactive=False, dry_run=False, priority=args.priority)
  File "/cvmfs/cms.cern.ch/rucio/x86_64/rhel7/py3/current/lib/python3.6/site-packages/rucio/client/subscriptionclient.py", line 143, in update_subscription
    raise exc_cls(exc_msg)
rucio.common.exception.DatabaseException: Database exception.
Details: An unknown Database Exception has ocurred.

The Python client fails the same way:

from rucio.client import Client
client = Client()

r = client.update_subscription("DQMIOToDisk",account="transfer_ops", filter_={"pattern": "/JetMET/.*/DQMIO", "did_type": "CONTAINER", "scope": ["cms"]}, replication_rules=[{"rse_expression": "T2_CH_CERN", "copies": 1, "lifetime": False, "comment": "rule subscription for DQMIO according to ticket CMSTRANSF-636", "activity": "User Subscriptions"}], comments='DQMIOToDisk to store on CERN for 1 month for ticket CMSTRANSF-636', lifetime=False)

print (r)
Traceback (most recent call last):
  File "update_subscription.py", line 4, in <module>
    r = client.update_subscription("DQMIOToDisk",account="transfer_ops", filter_={"pattern": "/JetMET/.*/DQMIO", "did_type": "CONTAINER", "scope": ["cms"]}, replication_rules=[{"rse_expression": "T2_CH_CERN", "copies": 1, "lifetime": False, "comment": "rule subscription for DQMIO according to ticket CMSTRANSF-636", "activity": "User Subscriptions"}], comments='DQMIOToDisk to store on CERN for 1 month for ticket CMSTRANSF-636', lifetime=False)
  File "/cvmfs/cms.cern.ch/rucio/x86_64/rhel7/py3/current/lib/python3.6/site-packages/rucio/client/subscriptionclient.py", line 143, in update_subscription
    raise exc_cls(exc_msg)
rucio.common.exception.DatabaseException: Database exception.
Details: An unknown Database Exception has ocurred.

I also tried lifetime=0 and got the same error.

Andres discovered that subscriptions in ACTIVE status (never updated before) can be updated, but subscriptions in UPDATED status cannot.

To Reproduce

Explained above

Expected behavior

We should be able to disable subscriptions in a convenient way.

Additional context

I also see that it's not possible to delete subscriptions. I wonder why this is not supported.

Related Issues

https://its.cern.ch/jira/browse/CMSTRANSF-724

Panos512 commented 11 months ago

I suspect this might be related to some changes we have made in the Rucio core to fix Subscription history: https://github.com/rucio/rucio/issues/6109

From a quick look at the logs:

$ k logs -n rucio daemons-transmogrifier-6cc449b5ff-v9gmb | grep 'subscription' -C 10

It looks like it's a unique constraint error:

rucio.common.exception.DatabaseException: Database exception.
Details: (cx_Oracle.IntegrityError) ORA-00001: unique constraint (CMS_RUCIO_PROD.SUBSCRIPTIONS_HISTORY_PK) violated
[SQL: INSERT INTO "CMS_RUCIO_PROD".subscriptions_history (id, name, filter, replication_rules, policyid, state, last_processed, account, lifetime, comments, retroactive, expired_at, created_at, updated_at) VALUES (:id, :name, :filter, :replication_rules, :policyid, :state, :last_processed, :account, :lifetime, :comments, :retroactive, :expired_at, :created_at, :updated_at)]
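To make the failure mode concrete, here is a minimal sketch of how an ORA-00001-style error arises on a history table. It uses SQLite instead of Oracle, keeps only a few of the columns, and assumes the primary key is (id, updated_at); that key definition and the id value are assumptions for illustration and are not confirmed anywhere in this thread.

```python
import sqlite3

# In-memory stand-in for the history table. Only a subset of columns is kept,
# and the composite primary key (id, updated_at) is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE subscriptions_history (
        id TEXT NOT NULL,
        name TEXT,
        state TEXT,
        updated_at TEXT NOT NULL,
        PRIMARY KEY (id, updated_at)
    )
    """
)


def archive(connection, row):
    """Insert one history row; return False if the primary key already exists."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "INSERT INTO subscriptions_history (id, name, state, updated_at) "
                "VALUES (?, ?, ?, ?)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        return False


# "example-id" is a hypothetical placeholder, not the real subscription id.
stale = ("example-id", "MINIOutsideUS", "U", "2023-09-27 15:04:57")
fresh = ("example-id", "MINIOutsideUS", "U", "2023-10-16 09:25:24")

first = archive(conn, stale)   # first snapshot is stored
second = archive(conn, stale)  # same id and same updated_at: key collision
third = archive(conn, fresh)   # a newer updated_at gives a distinct key

row_count = conn.execute("SELECT COUNT(*) FROM subscriptions_history").fetchone()[0]
print(first, second, third, row_count)
```

Under that assumption, any code path that archives a subscription twice without changing updated_at would hit the constraint, regardless of which field the update was meant to change.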

While typing this message I realised there is a related issue open in rucio core https://github.com/rucio/rucio/issues/6292

Let's follow up next week :)

haozturk commented 11 months ago

Thanks @Panos512. The errors that I see in the transmogrifier logs belong to the MINIOutsideUS subscription:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/rucio/daemons/common.py", line 216, in _generator
    result = run_once_fnc(heartbeat_handler=heartbeat_handler, activity=activity)
  File "/usr/local/lib/python3.9/site-packages/rucio/daemons/transmogrifier/transmogrifier.py", line 699, in run_once
    update_subscription(
  File "/usr/local/lib/python3.9/site-packages/rucio/db/sqla/session.py", line 437, in new_funct
    raise DatabaseException(str(error))
rucio.common.exception.DatabaseException: Database exception.
Details: (cx_Oracle.IntegrityError) ORA-00001: unique constraint (CMS_RUCIO_PROD.SUBSCRIPTIONS_HISTORY_PK) violated
[SQL: INSERT INTO "CMS_RUCIO_PROD".subscriptions_history (id, name, filter, replication_rules, policyid, state, last_processed, account, lifetime, comments, retroactive, expired_at, created_at, updated_at) VALUES (:id, :name, :filter, :replication_rules, :policyid, :state, :last_processed, :account, :lifetime, :comments, :retroactive, :expired_at, :created_at, :updated_at)]
[parameters: {'id': b'(\xb57)\x05YL\x00\x9a\xa9-\xc0%i\xd1\x89', 'name': 'MINIOutsideUS', 'filter': '{"pattern": "^/.*/MINIAOD(|SIM)$", "did_type": "CONTAINER", "scope": ["cms"]}', 'replication_rules': '[{"rse_expression": "ddm_quota>0&(T1_DE_KIT_Disk|T1_ES_PIC_Disk|T1_FR_CCIN2P3_Disk|T1_IT_CNAF_Disk|T1_UK_Ral_Disk|T2_BE_IIHE|T2_BE_UCL|T2_CH_CSCS|T2_ ... (155 characters truncated) ... |T2_UK_London_Brunel|T2_UK_London_IC|T2_UK_SGrid_RALPP)", "grouping": "DATASET", "copies": 1, "weight": "ddm_quota", "activity": "Data rebalancing"}]', 'policyid': 3, 'state': 'U', 'last_processed': datetime.datetime(2023, 10, 16, 9, 25, 24, 353471), 'account': 'transfer_ops', 'lifetime': None, 'comments': 'Keeping one copy of MINI outside of US on most reliable sites', 'retroactive': 0, 'expired_at': None, 'created_at': datetime.datetime(2022, 8, 17, 15, 21, 55), 'updated_at': datetime.datetime(2023, 9, 27, 15, 4, 57)}]
(Background on this error at: https://sqlalche.me/e/20/gkpj)

I don't understand how this could be related to the DQMIOToDisk update.

Panos512 commented 11 months ago

Yes, that's indeed not related. Sorry, please ignore what I said, it seems I was not very careful :)

dynamic-entropy commented 11 months ago

It seems to be an issue with the subscriptions_history table again. We are investigating which changes caused this: https://github.com/rucio/rucio/issues/6351

You can update the subscription now. I have turned it off. Thanks, @yuyiguo for the help.

haozturk commented 11 months ago

Thanks @dynamic-entropy, I was able to run it without any issue now:

$ rucio-admin subscription update --lifetime 0 --account transfer_ops --priority 3 DQMIOToDisk '{"pattern": "/JetMET/.*/DQMIO", "did_type": "CONTAINER", "scope": ["cms"]}' '[{"rse_expression": "T2_CH_CERN", "copies": 1, "lifetime": 864000, "comment": "rule subscription for DQMIO according to ticket CMSTRANSF-636", "activity": "User Subscriptions"}]' 'DQMIOToDisk to store on CERN for 1 month for ticket CMSTRANSF-636'
$ 

My next question: how can I verify that this subscription is really disabled? It is still listed with UPDATED status:

transfer_ops: DQMIOToDisk UPDATED
  priority: 3
  filter: {"pattern": "/JetMET/.*/DQMIO", "did_type": "CONTAINER", "scope": ["cms"]}
  rules: [{"rse_expression": "T2_CH_CERN", "copies": 1, "lifetime": 864000, "comment": "rule subscription for DQMIO according to ticket CMSTRANSF-636", "activity": "User Subscriptions"}]
  comments: DQMIOToDisk to store on CERN for 1 month for ticket CMSTRANSF-636

whereas subscriptions that @Panos512 disabled earlier are in INACTIVE status, for example:

transfer_ops: HcalNZS_Run2022x_-v1_RAW INACTIVE
  priority: 3
  filter: {"pattern": "^/HcalNZS/Run2022.*-v1/RAW", "did_type": "CONTAINER", "scope": ["cms"]}
  rules: [{"rse_expression": "T2_CH_CERN", "grouping": "CONTAINER", "copies": 1, "activity": "User Subscriptions", "lifetime": "7776000"}]
  comments: Deactivate by Panos CMSTRANSF-534

Is this just a matter of time?
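One way to keep an eye on this is to parse the text printed by `rucio-admin subscription list` and report every subscription that is not yet INACTIVE. The sketch below assumes the output keeps the layout shown above (an unindented `account: name STATE` header followed by indented detail lines); the helper name `parse_states` and the shortened sample listing are illustrative only.

```python
import re

# Header lines look like "transfer_ops: DQMIOToDisk UPDATED"; detail lines are
# indented, so anchoring on a non-space first character skips them.
HEADER = re.compile(r"^(?P<account>\S+): (?P<name>\S+) (?P<state>[A-Z]+)$")


def parse_states(listing):
    """Map (account, subscription name) to the state printed in the listing."""
    states = {}
    for line in listing.splitlines():
        match = HEADER.match(line)
        if match:
            states[(match["account"], match["name"])] = match["state"]
    return states


# Sample text abbreviated from the listings in this thread.
SAMPLE = """\
transfer_ops: DQMIOToDisk UPDATED
  priority: 3
  comments: DQMIOToDisk to store on CERN for 1 month for ticket CMSTRANSF-636
transfer_ops: HcalNZS_Run2022x_-v1_RAW INACTIVE
  priority: 3
  comments: Deactivate by Panos CMSTRANSF-534
"""

states = parse_states(SAMPLE)
not_inactive = sorted(name for (_, name), state in states.items() if state != "INACTIVE")
print(not_inactive)
```

In practice the listing would come from the real command output rather than the inline sample, for example by piping it into a script that calls `parse_states` on standard input.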

dynamic-entropy commented 11 months ago

Yes

dynamic-entropy commented 4 months ago

Hello @haozturk, is there anything left to be done here?

haozturk commented 4 months ago

Yes, there is still a subscription that I couldn't deactivate. I took this action, but the subscription is still active.

$ rucio-admin subscription list | grep -A 6 Run3ScoutingRAWDisk
transfer_ops: Run3ScoutingRAWDisk UPDATED
  priority: 3
  filter: {"pattern": "^/ScoutingPFRun3/.*/RAW$", "did_type": "CONTAINER", "scope": ["cms"]}
  rules: [{"rse_expression": "ddm_quota>0&rse_type=DISK", "lifetime": 0, "grouping": "DATASET", "copies": 1, "weight": "ddm_quota", "activity": "Data rebalancing"}]
  comments: Keeping Scouting RAW Run3 on disk

I haven't had time to investigate further.

dynamic-entropy commented 4 months ago

OK, I'm assigning this to you. Note that it's not on the project board, so it could get missed.

haozturk commented 4 months ago

OK, thanks. I added it to the project.