dmwm / CMSRucio


Migrate Consistency Enforcement configuration from local files to Rucio server #302

Closed ericvaandering closed 1 year ago

ivmfnal commented 1 year ago

Here is a proposal for how to push some CE configuration into the Rucio configuration while keeping the configuration file: https://docs.google.com/document/d/1ynvM-qZlRMPL3DZvnpAjfykBjrsTWuSm0DlnpAjTM2g/edit?usp=sharing

It is documented here: https://github.com/ivmfnal/cms_consistency/blob/master/site_cmp3/README.rst#parameters-controlled-by-site-admin

This is actually already implemented and in production.

ivmfnal commented 1 year ago

Having implemented the functionality described in the proposal, I added the "disabled" CE run state to the monitor.

I used T2_AT_Vienna as a one-time test site. I disabled the RSE using the boolean RSE attribute CE_config.ce_disabled and then started the run for the RSE manually. It is now shown as disabled in the monitor:

Summary page: https://cmsweb.cern.ch/rucioconmon/ce/index?view=-ce_run

RSE page: https://cmsweb.cern.ch/rucioconmon/ce/show_rse?rse=T2_AT_Vienna
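For anyone repeating this kind of test, here is a minimal sketch of setting a single RSE attribute while checking that nothing else on the RSE changed. It assumes the Rucio Python client methods `list_rse_attributes` and `add_rse_attribute`; the helper `set_attribute_checked` and the `InMemoryClient` stand-in are hypothetical names introduced only for illustration, and this is not the tool that was actually used here. The attribute name is passed as a parameter because it appears in this thread both as CE_config.ce_disabled and as CE_cfg.ce_disabled.

```python
_MISSING = object()  # marks "attribute not present" in a snapshot


def set_attribute_checked(client, rse, key, value):
    """Set one RSE attribute and report any OTHER attribute that changed.

    `client` is expected to behave like rucio.client.Client (assumption):
    list_rse_attributes(rse) -> dict, add_rse_attribute(rse, key, value).
    Returns {name: (before, after)} for every attribute other than `key`
    whose value differs between the two snapshots.
    """
    before = dict(client.list_rse_attributes(rse))
    client.add_rse_attribute(rse, key, value)
    after = dict(client.list_rse_attributes(rse))

    side_effects = {}
    for name in (set(before) | set(after)) - {key}:
        old = before.get(name, _MISSING)
        new = after.get(name, _MISSING)
        if old != new:
            side_effects[name] = (
                None if old is _MISSING else old,
                None if new is _MISSING else new,
            )
    return side_effects


class InMemoryClient:
    """Hypothetical stand-in for a Rucio client, for a dry run without a server."""

    def __init__(self, attributes):
        self._attributes = attributes  # {rse: {key: value}}

    def list_rse_attributes(self, rse):
        return dict(self._attributes[rse])

    def add_rse_attribute(self, rse, key, value):
        self._attributes[rse][key] = value


if __name__ == "__main__":
    # Dry run against the stand-in. With a real server one would instead
    # create the client with `from rucio.client import Client; Client()`.
    client = InMemoryClient({"T2_AT_Vienna": {"site_admins": "useren,nhoerman,ebirngru"}})
    changed = set_attribute_checked(client, "T2_AT_Vienna", "CE_config.ce_disabled", True)
    print("unexpected changes:", changed)
```

An empty result means only the requested attribute was touched. The check compares two snapshots, so a change made by another process between them would also show up as a side effect.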

dynamic-entropy commented 1 year ago

Hello @ivmfnal, can you please review the script/command that you used to set/delete the RSE attributes for Vienna? Apparently, it deleted all RSE attributes that were set for the site. The site has been down since the change: https://cmssst.web.cern.ch/siteStatus/detail.html?site=T2_AT_Vienna. This was the RSE config at 1030 hrs:

❯ cat vienna_site_config_0ct9_2023_1030hrs
Settings:
=========
  availability: 7
  availability_delete: True
  availability_read: True
  availability_write: True
  credentials: None
  delete_protocol: 1
  deterministic: True
  domain: ['lan', 'wan']
  id: 0465a108757a4acb8f2f18085b8e25f7
  lfn2pfn_algorithm: cmstfc
  qos_class: None
  read_protocol: 1
  rse: T2_AT_Vienna
  rse_type: DISK
  sign_url: None
  staging_area: False
  third_party_copy_read_protocol: 1
  third_party_copy_write_protocol: 1
  verify_checksum: True
  volatile: False
  write_protocol: 1
Attributes:
===========
  CE_cfg.ce_disabled: True
  rule_approvers: useren,nhoerman,ebirngru
  site_admins: useren,nhoerman,ebirngru
Protocols:
==========
  davs
    domains: '{"lan": {"read": 0, "write": 0, "delete": 0}, "wan": {"read": 1, "write": 1, "delete": 1, "third_party_copy_read": 1, "third_party_copy_write": 1}}'
    extended_attributes: None
    hostname: eos.grid.vbc.ac.at
    impl: rucio.rse.protocols.gfal.Default
    port: 8443
    prefix: /eos/vbc/experiments/cms/
    scheme: davs
  root
    domains: '{"lan": {"read": 0, "write": 0, "delete": 0}, "wan": {"read": 3, "write": 3, "delete": 3, "third_party_copy_read": 3, "third_party_copy_write": 3}}'
    extended_attributes: None
    hostname: eos.grid.vbc.ac.at
    impl: rucio.rse.protocols.gfal.Default
    port: 1094
    prefix: //eos/vbc/experiments/cms/
    scheme: root
Usage:
======
  expired
    files: 30403
    free: None
    rse: T2_AT_Vienna
    rse_id: 0465a108757a4acb8f2f18085b8e25f7
    source: expired
    total: 178245012608213
    updated_at: 2023-10-09 08:20:28
    used: 178245012608213
  obsolete
    files: 4
    free: None
    rse: T2_AT_Vienna
    rse_id: 0465a108757a4acb8f2f18085b8e25f7
    source: obsolete
    total: 3069665831
    updated_at: 2023-10-09 08:20:19
    used: 3069665831
  rucio
    files: 196263
    free: None
    rse: T2_AT_Vienna
    rse_id: 0465a108757a4acb8f2f18085b8e25f7
    source: rucio
    total: 446405914085328
    updated_at: 2023-10-08 20:16:51
    used: 446405914085328
  static
    files: None
    free: None
    rse: T2_AT_Vienna
    rse_id: 0465a108757a4acb8f2f18085b8e25f7
    source: static
    total: 500000000000000
    updated_at: 2022-09-19 09:46:47
    used: 500000000000000
  unavailable
    files: 9
    free: None
    rse: T2_AT_Vienna
    rse_id: 0465a108757a4acb8f2f18085b8e25f7
    source: unavailable
    total: 4846303858
    updated_at: 2023-10-09 10:30:02
    used: 4846303858
RSE limits:
===========
  MinFreeSpace: 50000000000000 B

ivmfnal commented 1 year ago

Are you saying you used the suggested command and it removed all the attributes? What command are you talking about?

I see some attributes in the output you included:

Attributes:
===========
  CE_cfg.ce_disabled: True
  rule_approvers: useren,nhoerman,ebirngru
  site_admins: useren,nhoerman,ebirngru

dynamic-entropy commented 1 year ago

Hello, no, I did not. But your testing of the RSE attribute seems to coincide with the disappearance of other attributes, and thus with the errors on the site capacity page. The rule_approvers and site_admins attributes are added by the sync scripts, which is why we still see them.
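One way to confirm which attributes disappeared is to compare two saved RSE dumps. The sketch below is a rough illustration, assuming both dumps use the same text layout as the one pasted above (an unindented `Attributes:` heading followed by indented `key: value` lines). The function names are hypothetical, and the `example_attr` entry in the sample "before" dump is made up purely to show a removal.

```python
def parse_attributes(dump: str) -> dict:
    """Return the key/value pairs listed under the 'Attributes:' heading."""
    attrs = {}
    in_section = False
    for line in dump.splitlines():
        if not line.startswith(" "):
            # Unindented lines are section headings, '=====' underlines or blanks.
            if line.strip("=").strip():
                in_section = line.strip() == "Attributes:"
            continue
        if in_section and ":" in line:
            key, _, value = line.strip().partition(":")
            attrs[key.strip()] = value.strip()
    return attrs


def diff_attributes(before: dict, after: dict) -> dict:
    """Summarise which attribute names were removed, added or changed."""
    common = set(before) & set(after)
    return {
        "removed": sorted(set(before) - set(after)),
        "added": sorted(set(after) - set(before)),
        "changed": sorted(k for k in common if before[k] != after[k]),
    }


# "after" mirrors the Attributes section of the dump in this thread.
AFTER = """Attributes:
===========
  CE_cfg.ce_disabled: True
  rule_approvers: useren,nhoerman,ebirngru
  site_admins: useren,nhoerman,ebirngru
Protocols:
==========
  davs
    scheme: davs
"""

# "before" is hypothetical: example_attr is an invented placeholder.
BEFORE = """Attributes:
===========
  example_attr: 1
  rule_approvers: useren,nhoerman,ebirngru
  site_admins: useren,nhoerman,ebirngru
"""

if __name__ == "__main__":
    print(diff_attributes(parse_attributes(BEFORE), parse_attributes(AFTER)))
```

Keeping a dump from before any manual change makes this comparison possible. Without one, only the attributes restored by the sync scripts can be identified.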

ivmfnal commented 1 year ago

@dynamic-entropy Will the sync scripts or any other automated procedures override CE_config.* attributes?

ericvaandering commented 1 year ago

No, the sync scripts set some attributes like rule_approvers and site_admins.

ivmfnal commented 1 year ago

@dynamic-entropy I probably wiped those attributes accidentally sometime earlier this weekend while debugging. The tool I currently use does not do blanket removals.

dynamic-entropy commented 1 year ago

Alright, no worries. Just wanted to make sure we do not have side effects.

ericvaandering commented 1 year ago

Seems this is not a bug.