dmwm / CMSRucio

Enhancement: Fine tune distances between RSEs #837

Open d-ylee opened 3 weeks ago

d-ylee commented 3 weeks ago

Enhancement Description

Currently, any two RSEs that do not share a region are given the slowest/longest distance value, 13, no matter which regions they are in. That probably doesn't make sense.

Use Case

Fine-tuning the distances could allow more efficient transfers.

Possible Solution

We may want to try a 4x4 matrix where the two European regions are declared better connected than Europe/US and where Europe and US are better connected than anything to Other.

The logic here could be modified: https://github.com/dmwm/CMSRucio/blob/fc5a7c46d3a6680442059392be016631c37a0dfc/docker/rucio_client/scripts/cmslinks.py#L29

Example Distances Comparison: Same Site < Europe+Europe or US+US < Europe+US < Other
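To make the ordering above concrete, here is a minimal sketch of what a symmetric 4x4 region table could look like. This is not the current logic in cmslinks.py. The region labels (`EU_A`, `EU_B`, `US`, `OTHER`), the function name, and every numeric value except the worst-case 13 are hypothetical placeholders, and treating OTHER-to-OTHER as worst case is also an assumption.

```python
# Hypothetical region labels: two European regions, US, and everything else.
REGIONS = ("EU_A", "EU_B", "US", "OTHER")

SAME_SITE_DISTANCE = 1   # hypothetical value
WORST_DISTANCE = 13      # the current "no shared region" value from this issue

# Upper triangle of the 4x4 matrix; lower number = better connected.
# All values other than WORST_DISTANCE are placeholders to be tuned.
_UPPER = {
    ("EU_A", "EU_A"): 4,
    ("EU_A", "EU_B"): 6,
    ("EU_A", "US"): 9,
    ("EU_A", "OTHER"): WORST_DISTANCE,
    ("EU_B", "EU_B"): 4,
    ("EU_B", "US"): 9,
    ("EU_B", "OTHER"): WORST_DISTANCE,
    ("US", "US"): 4,
    ("US", "OTHER"): WORST_DISTANCE,
    ("OTHER", "OTHER"): WORST_DISTANCE,  # assumption: no bonus inside OTHER
}


def region_distance(src_region, dst_region, same_site=False):
    """Look up the distance for a pair of regions; the table is symmetric."""
    if same_site:
        return SAME_SITE_DISTANCE
    key = (src_region, dst_region)
    if key not in _UPPER:
        key = (dst_region, src_region)
    return _UPPER[key]  # raises KeyError for an unknown region label


if __name__ == "__main__":
    for src in REGIONS:
        print(src, [region_distance(src, dst) for dst in REGIONS])
```

Keeping only the upper triangle means each pair is stated once, so the matrix cannot accidentally become asymmetric.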

Related Issues

No response

d-ylee commented 3 weeks ago

@KatyEllis @ericvaandering This issue is for what was discussed in our meeting today. Are there any more details that need to be added?

d-ylee commented 3 weeks ago

I made a script to pull the RSE list in our CMS production node. I learned that each RSE is assigned a region (A, B, C, D), but I also found that not all RSEs are assigned a region. Would these RSEs need to have the region attribute applied? Are these regions defined somewhere?
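For reference, the grouping step of such a script can be sketched as below. This is not the actual script, just an illustration under assumptions: the function name is made up, the input is a plain mapping of RSE name to attribute dict, and the sample region values and `T2_XX_Example*` names are invented. In a real run the attributes would presumably come from the Rucio client (for example `list_rses()` plus `list_rse_attributes(rse)`); that part is left out so the sketch runs without a Rucio server.

```python
import json
from collections import defaultdict


def group_by_region(attrs_by_rse):
    """Group RSE names by their 'region' attribute.

    RSEs with no 'region' attribute end up under the key "None".
    """
    groups = defaultdict(list)
    for rse, attrs in attrs_by_rse.items():
        groups[str(attrs.get("region"))].append(rse)
    return {region: sorted(names) for region, names in sorted(groups.items())}


# Made-up sample input; the region values here are not real assignments.
sample = {
    "T2_XX_Example1": {"region": "A"},
    "T2_XX_Example2": {"region": "C"},
    "T2_PL_Warsaw": {},
    "T3_US_MIT": {},
}

grouped = group_by_region(sample)
print(json.dumps(grouped, indent=2))
```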

ericvaandering commented 3 weeks ago

These regions were defined a long time ago based on some actual measurements. I guess the logic would have to be that things that don't have a region are treated worst of all for the region part of the calculation, but they should still be treated according to country if they have one, etc.

Which sites don't have a region? Are any of them Tier2s? My suspicion is those sites got added and no one bothered to set the region since it's done by hand.
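A rough sketch of the fallback described above (no region means worst case for the region part, but a shared country can still help). Everything here is an assumption for illustration: the attribute keys `region` and `country`, the function names, the numeric values other than 13, the example RSE attributes, and the choice to combine the two parts with `min`. The real calculation in cmslinks.py may combine them differently.

```python
WORST_DISTANCE = 13        # value quoted in this issue
SAME_REGION_DISTANCE = 7   # hypothetical value
SAME_COUNTRY_DISTANCE = 4  # hypothetical value


def region_part(src_region, dst_region):
    """Region contribution: worst case if either side has no region."""
    if src_region is None or dst_region is None:
        return WORST_DISTANCE
    if src_region == dst_region:
        return SAME_REGION_DISTANCE
    return WORST_DISTANCE


def country_part(src_country, dst_country):
    """Country contribution: only helps when both sides share a country."""
    if src_country and src_country == dst_country:
        return SAME_COUNTRY_DISTANCE
    return WORST_DISTANCE


def rse_distance(src_attrs, dst_attrs):
    """Take the best (smallest) of the two parts; an assumed combination rule."""
    return min(
        region_part(src_attrs.get("region"), dst_attrs.get("region")),
        country_part(src_attrs.get("country"), dst_attrs.get("country")),
    )


# Made-up attribute dicts for illustration only.
no_region = {"country": "PL"}
same_country = {"country": "PL", "region": "B"}
elsewhere = {"country": "US", "region": "C"}

print(rse_distance(no_region, same_country))
print(rse_distance(no_region, elsewhere))
```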

d-ylee commented 3 weeks ago

@ericvaandering These are the sites that did not have a region assigned as an RSE attribute.

{
  "None": [
    "T3_CH_CERN_CTA_RecallTest",
    "T3_US_MIT",
    "T2_PK_NCP_Temp",
    "T2_FR_GRIF_IRFU_Temp",
    "T2_HU_Budapest_Temp",
    "T2_US_Florida_Temp",
    "T2_US_MIT_Temp",
    "T2_US_Purdue_Temp",
    "T2_US_UCSD_Temp",
    "T2_US_Wisconsin_Temp",
    "T2_FR_IPHC_Temp",
    "T2_CH_CERN_Temp",
    "T1_FR_CCIN2P3_Tape",
    "T1_RU_JINR_Tape",
    "T1_DE_KIT_Tape",
    "T3_US_UMiss",
    "T3_US_NotreDame_Test",
    "T3_CH_CERN_CTA_CastorTest",
    "T3_IT_MIB_Temp",
    "T3_KR_KISTI_Test",
    "T3_KR_KISTI",
    "T3_US_CMU_Test",
    "T3_US_Rice",
    "T3_US_PuertoRico",
    "T2_PT_NCG_Lisbon_Temp",
    "T2_DE_DESY_Temp",
    "T2_RU_IHEP_Temp",
    "T2_IT_Pisa_Temp",
    "T2_TR_METU_Temp",
    "T2_PL_Swierk_Temp",
    "T1_US_FNAL_Tape_Test",
    "T1_FR_CCIN2P3_Tape_Test",
    "T3_US_Baylor",
    "T3_US_FNALLPC_Temp",
    "T3_BG_UNI_SOFIA",
    "T3_CH_PSI_Temp",
    "T3_US_NotreDame_Temp",
    "T2_CH_CSCS_Temp",
    "T2_AT_Vienna",
    "T3_CH_CERN_CTA_Test",
    "T3_HR_IRB",
    "T1_FR_CCIN2P3_Disk_Temp",
    "T3_US_Baylor_Test",
    "T2_UK_SGrid_Bristol_Temp",
    "T2_EE_Estonia_Temp",
    "T2_RU_ITEP_Temp",
    "T2_UK_London_IC_Temp",
    "T2_BR_SPRACE_Temp",
    "T2_AT_Vienna_Temp",
    "T2_KR_KISTI_Temp",
    "T1_IT_CNAF_Tape_Test",
    "T3_US_Princeton_ICSE",
    "T0_CH_CERN_Tape",
    "T1_RU_JINR_Disk_Temp",
    "T3_US_Rutgers_Test",
    "T1_IT_CNAF_Disk_Temp",
    "T3_IT_Trieste_Temp",
    "T3_US_UMD_Temp",
    "T0_CH_CERN_Disk",
    "T1_US_FNAL_Disk_Temp",
    "T3_US_CMU_Temp",
    "T3_TW_NTU_HEP_Test",
    "T1_UK_RAL_Disk_Temp",
    "T2_BE_IIHE_Temp",
    "T2_BE_UCL_Temp",
    "T2_US_Caltech_Temp",
    "T2_DE_RWTH_Temp",
    "T2_PL_Warsaw",
    "T2_RU_INR_Temp",
    "T2_GR_Ioannina_Temp",
    "T2_IN_TIFR_Temp",
    "T3_US_NERSC",
    "T3_US_PuertoRico_Test",
    "T3_CH_PSI_Test",
    "T3_MX_Cinvestav_Temp",
    "T3_FR_IPNL_Temp",
    "T1_ES_PIC_Disk_Temp",
    "T3_US_NotreDame",
    "T3_MX_Cinvestav_Test",
    "T1_DE_KIT_Disk_Temp",
    "T3_CH_CERN_OpenData",
    "T3_US_UMD",
    "T3_US_Rutgers",
    "T0_CH_CERN_Tape_Test",
    "T3_US_OSU",
    "T3_DM_MOCK_RSE",
    "T3_FR_IPNL",
    "T3_US_Colorado",
    "T3_US_CMU",
    "T3_CH_CERNBOX",
    "T0_CH_CERN_Disk_Test",
    "T3_US_FNALLPC_Test",
    "T2_IT_Bari_Temp",
    "T2_RU_JINR_Temp",
    "T2_US_Nebraska_Temp",
    "T2_ES_IFCA_Temp",
    "T2_TW_NCHC_Temp",
    "T1_US_FNAL_Tape",
    "T1_ES_PIC_Tape",
    "T3_US_FNALLPC",
    "T3_MX_Cinvestav",
    "T3_US_PuertoRico_Temp",
    "T3_TW_TIDC_Test",
    "T1_UK_RAL_Tape",
    "T1_UK_RAL_Tape_Test",
    "T3_IT_Trieste",
    "T2_FR_GRIF_Temp",
    "T3_TW_TIDC_Temp",
    "T2_FR_GRIF_LLR_Temp",
    "T2_BR_UERJ_Temp",
    "T3_KR_KNU",
    "T3_KR_KNU_Temp",
    "T2_IT_Legnaro_Temp",
    "T2_UK_London_Brunel_Temp",
    "T2_IT_Rome_Temp",
    "T2_UK_SGrid_RALPP_Temp",
    "T2_ES_CIEMAT_Temp",
    "T1_ES_PIC_Tape_Test",
    "T3_IT_MIB",
    "T3_IR_IPM",
    "T2_US_MIT_Tape",
    "T3_US_UMiss_Test",
    "T3_US_UMD_Test",
    "T3_TW_TIDC",
    "T3_US_Brown",
    "T3_TW_NTU_HEP",
    "T3_US_MIT_Test",
    "T3_BG_UNI_SOFIA_Test",
    "T3_CY_UCY_Temp",
    "T2_RC_MOCK",
    "T2_CN_Beijing_Temp",
    "T2_FI_HIP_Temp",
    "T2_US_Vanderbilt_Temp",
    "T2_UA_KIPT_Temp",
    "T1_IT_CNAF_Tape",
    "T1_RU_JINR_Tape_Test",
    "T1_DE_KIT_Tape_Test",
    "T3_CH_PSI",
    "T3_KR_UOS",
    "T3_IT_MIB_Test",
    "T2_US_MIT_Tape_Test",
    "T3_US_Colorado_Test",
    "T3_KR_UOS_Test",
    "T3_DM_MOCK_RSE2",
    "T3_IT_Bologna_Test"
  ]
}

ericvaandering commented 1 week ago

OK, from what I can see these mostly fall into the Tier3 and unused (Temp and Test) categories. Tape is there too, which surprises me but may make sense: pulling off of tape should be a last resort (but getting to tape should use the best link, so...). @nsmith- may have some insight here. I didn't go through the list exhaustively, but the only Tier2 I see with no region is Warsaw.

As I recall, the regions were roughly Western Europe, Eastern Europe, North America, and Other. Does that match what you see?
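Since the list above was not checked exhaustively, a small name-based filter can do the Tier2 check mechanically. This is only a sketch: the category rules (suffix `_Temp`, suffix `_Test`, `_Tape` in the name, otherwise the tier prefix) are assumptions based on the naming pattern visible in the list, and the sample is a handful of names copied from it, not the full list.

```python
def classify(rse):
    """Classify an RSE by name: Temp, Test, Tape, or its tier prefix (T0..T3)."""
    if rse.endswith("_Temp"):
        return "Temp"
    if rse.endswith("_Test"):
        return "Test"
    if "_Tape" in rse:
        return "Tape"
    return rse.split("_", 1)[0]


# A few names copied from the no-region list above.
no_region_sample = [
    "T2_PL_Warsaw",
    "T2_AT_Vienna",
    "T2_AT_Vienna_Temp",
    "T1_DE_KIT_Tape",
    "T1_UK_RAL_Tape_Test",
    "T3_US_MIT",
    "T0_CH_CERN_Disk",
]

tier2_without_region = sorted(
    rse for rse in no_region_sample if classify(rse) == "T2"
)
print(tier2_without_region)
```

Running the same filter over the full list would show whether any Tier2 besides Warsaw is missing a region.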