EESSI / filesystem-layer

Filesystem layer of the EESSI project
https://eessi.github.io/docs/filesystem_layer
GNU General Public License v2.0

Monitor CVMFS infrastructure #67

Open bedroge opened 3 years ago

bedroge commented 3 years ago

We had an issue with one of our Stratum 1 servers this week, which caused it to serve an older tag of the repository. This made me realize again that we should set up a monitoring dashboard that gives an overview and statistics of our infrastructure, sends out alerts when something is wrong, etc. An easy way to grab some information about the Stratum 1s is to read the .cvmfspublished file (and maybe .cvmfs_last_snapshot too); the structure of that file is explained in the CVMFS docs.
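A rough sketch of what reading out .cvmfspublished could look like, to make the idea concrete. It assumes the manifest layout from the CVMFS docs: one field per line, identified by its first character (e.g. S for the revision, T for the publish timestamp, N for the repository name), followed by a `--` separator and the signature. The helper names, the field-name mapping and the server URL and repository name in the usage note are made up for illustration.

```python
from urllib.request import urlopen

# Readable names for a few manifest field letters (a subset; assumed from the CVMFS docs).
FIELD_NAMES = {
    "C": "root_catalog_hash",
    "D": "ttl_seconds",
    "N": "repository_name",
    "S": "revision",
    "T": "published_timestamp",
}


def parse_manifest(raw):
    """Parse the text part of a .cvmfspublished file into a dict of strings."""
    # Everything after the "--" line is the signature block, which may be binary.
    header = raw.split(b"\n--\n", 1)[0].decode("ascii", errors="replace")
    fields = {}
    for line in header.splitlines():
        if not line:
            continue
        key, value = line[0], line[1:]
        # Unknown field letters are kept under the letter itself.
        fields[FIELD_NAMES.get(key, key)] = value
    return fields


def fetch_manifest(server_url, repo, timeout=10):
    """Download and parse the manifest of `repo` from one stratum server."""
    url = f"{server_url}/cvmfs/{repo}/.cvmfspublished"
    with urlopen(url, timeout=timeout) as response:
        return parse_manifest(response.read())


def stale_servers(revisions):
    """Return the names of servers whose revision is behind the newest one seen."""
    newest = max(revisions.values())
    return sorted(name for name, rev in revisions.items() if rev < newest)
```

Something like `int(fetch_manifest("http://stratum1.example.org", "repo.example.org")["revision"])` for every Stratum 1, fed into `stale_servers`, would have flagged the server that was stuck on an older revision. Alerting and the dashboard itself are left out here.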

bedroge commented 3 years ago

Maybe we can register our repo here: https://cvmfs-monitor-frontend.web.cern.ch/

terjekv commented 3 years ago

https://cvmfs-monitor-frontend.web.cern.ch/alice-ocdb.cern.ch shows that entering proper metadata into your configuration is important! But yes, no need to reinvent the wheel, that looks good!

bedroge commented 3 years ago

Don't know if it will/can notify someone when there are issues, but I think we should try to register our (pilot? production?) repo there anyway.

I'll take a look at adding the metadata. Can easily do that manually, but I'll see if it can be integrated into the Ansible role/playbook.

terjekv commented 3 years ago

I asked on the CERN CVMFS Mattermost what it would take to be included.

bedroge commented 3 years ago

This can also be used (for the Stratum 0): https://cvmfs.readthedocs.io/en/stable/cpt-repo.html#publisher-statistics

By setting CVMFS_UPLOAD_STATS_DB=true, the statistics database together with a web page with relevant plots will be published to the stratum 0 /stats location. This provides a lightweight monitoring for repository maintainers.

http://cvmfs-stratum-zero.cern.ch/cvmfs/sft-nightlies.cern.ch/stats-ws21/index.html

rptaylor commented 3 years ago

Using https://github.com/cvmfs-contrib/cvmfs-servermon is also a good idea. You can also use the CERN monitoring system mentioned in that README. You'll get emails which include all stratum servers (including e.g. WLCG ones) but you can use an email filter to exclude ones you're not interested in.