dmwm / DMWMMON


Keep node names synchronized in SiteDB, TMDB, and DMWMMON. #22

Open nataliaratnikova opened 9 years ago

nataliaratnikova commented 9 years ago

Both PhEDEx and Space-Mon use SiteDB for authorization. Once in a while the node lists get out of sync and need to be corrected. We need monitoring that watches for inconsistencies and emails the proper list when a fix is required. After some testing we may want to automate the synchronization as well.

The APIs to get the node lists:
SiteDB: https://cmsweb.cern.ch/sitedb/data/prod/site-names
PhEDEx: https://cmsweb.cern.ch/phedex/datasvc/perl/prod/nodes
DMWMMON: https://cmsweb.cern.ch/dmwmmon/datasvc/perl/nodes
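The comparison such monitoring needs could look roughly like the sketch below. It is only an illustration, not the code of any existing script: it assumes the three node lists have already been fetched and parsed into plain collections of names, and the function names `compare_node_lists` and `format_report` as well as the sample inputs are made up for this example.

```python
def compare_node_lists(sitedb, tmdb, dmwmmon):
    """Return (node, in_sitedb, in_tmdb, in_dmwmmon) for every node
    that is missing from at least one of the three sources."""
    sources = (set(sitedb), set(tmdb), set(dmwmmon))
    all_nodes = set().union(*sources)
    rows = []
    for node in sorted(all_nodes):
        flags = tuple(node in source for source in sources)
        if not all(flags):  # consistent nodes are not reported
            rows.append((node,) + flags)
    return rows


def format_report(rows):
    """Render the rows as a plain-text table with +/- presence marks."""
    width = max([len("NODE NAME")] + [len(row[0]) for row in rows])
    lines = [f"{'NODE NAME':<{width}}  SITEDB  TMDB  DMWMMON"]
    for node, in_sitedb, in_tmdb, in_dmwmmon in rows:
        marks = ["+" if flag else "-" for flag in (in_sitedb, in_tmdb, in_dmwmmon)]
        lines.append(f"{node:<{width}}  {marks[0]:<6}  {marks[1]:<4}  {marks[2]}")
    return "\n".join(lines)


# Hypothetical sample inputs, for illustration only.
sample_sitedb = ["T2_CH_CERN", "T1_RU_JINR_MSS", "T3_US_MIT"]
sample_tmdb = ["T2_CH_CERN", "T3_US_MIT"]
sample_dmwmmon = ["T2_CH_CERN", "T2_TW_Taiwan"]

sample_rows = compare_node_lists(sample_sitedb, sample_tmdb, sample_dmwmmon)
print(format_report(sample_rows))
```

Working on sets keeps the check independent of the order in which each service returns its nodes, and an empty result means the three sources agree.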

nataliaratnikova commented 9 years ago

Added script for monitoring the differences: https://github.com/dmwm/DMWMMON/blob/master/MonitoringScripts/cms_nodes_sync.sh

It relies on a valid grid proxy to access SiteDB and the datasvc to get the node name lists. The recipe for auto-updating the proxy is scripted as well. Currently this is all running on a development server at FNAL. As discussed at the PhEDEx dev meeting today, at first we want to handle the differences manually, and then see what can be automated.
A report summarizing the inconsistencies has been sent to the CMS CompOps site support team.
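For readers unfamiliar with proxy-authenticated access, here is a minimal Python sketch of fetching a URL with a grid proxy as the client certificate. It is not taken from cms_nodes_sync.sh. The helper names `default_proxy_path` and `fetch_with_proxy` are hypothetical, and the sketch assumes the usual conventions that the proxy file holds both certificate and key and is located via `X509_USER_PROXY` or `/tmp/x509up_u<uid>`.

```python
import os
import ssl
import urllib.request


def default_proxy_path(environ=os.environ, uid=None):
    """Locate the grid proxy: X509_USER_PROXY if set, else /tmp/x509up_u<uid>."""
    path = environ.get("X509_USER_PROXY")
    if path:
        return path
    if uid is None:
        uid = os.getuid()
    return f"/tmp/x509up_u{uid}"


def fetch_with_proxy(url, proxy_path=None, capath=None, timeout=60):
    """Return the raw response body, presenting the proxy as client certificate.

    capath is an optional directory of trusted CA certificates
    (an assumption: the server's CA may not be in the system store).
    """
    proxy_path = proxy_path or default_proxy_path()
    context = ssl.create_default_context()
    if capath:
        context.load_verify_locations(capath=capath)
    # The proxy file is assumed to contain both the certificate chain and the key.
    context.load_cert_chain(certfile=proxy_path, keyfile=proxy_path)
    with urllib.request.urlopen(url, context=context, timeout=timeout) as response:
        return response.read()
```

A call would pass one of the node-list URLs as `url`. Parsing the response is left out on purpose, since the response format of each service is not described in this thread.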

nataliaratnikova commented 9 years ago

After some cleanup by Nicolo on the SiteDB side, here is the summary of inconsistencies:

NODE NAME             SITEDB  TMDB  DMWMMON
T0_CH_CERN_Disk       +       +     -
T1_CH_CERN_Buffer     -       -     +
T1_CH_CERN_MSS        -       -     +
T1_NT_HHJ             -       -     +
T1_RU_JINR_Buffer     +       -     -
T1_RU_JINR_MSS        +       -     -
T1_TW_ASGC_Buffer     -       -     +
T1_TW_ASGC_MSS        -       -     +
T2_PL_Cracow          -       -     +
T2_PT_LIP_Lisbon      -       -     +
T2_Test_Addnode       -       -     +
T2_Test_Buffer        -       -     +
T2_Test_MSS           -       -     +
T2_TW_Taiwan          -       -     +
T3_BG_UNI_SOFIA       -       +     -
T3_BY_NCPHEP          +       +     -
T3_DE_Karlsruhe       -       +     +
T3_HR_IRB             +       +     -
T3_HU_Debrecen        +       +     -
T3_IN_PUHEP           +       -     -
T3_IN_VBU             +       +     -
T3_KR_KISTI           +       +     -
T3_TH_CHULA           +       +     -
T3_TW_NCHC            +       +     -
T3_UCLASaxon_Buffer   -       +     +
T3_UMiss_Buffer       -       +     +
T3_US_Baylor          +       +     -
T3_US_MIT             +       +     -
T3_US_NERSC           +       -     -
T3_US_NU              +       +     -
T3_US_Princeton_ARM   +       -     -
T3_US_SDSC            +       +     -
T3_US_UCLA            -       -     +
T3_US_UCSB            +       +     -
T3_US_Vanderbilt_EC2  +       +     -
T3_UVA_Buffer         -       +     +

The cron job will send this report every three days (for now to myself only).