dmwm / CMSRucio


Provide dashboard to get current state of Rucio service/API usage #381

Open vkuznet opened 1 year ago

vkuznet commented 1 year ago

Per our discussion in https://github.com/dmwm/WMCore/issues/11356#issuecomment-1299535083, I would like to request a timber-like dashboard for the Rucio service. In particular, we need the following types of information:

To build such a dashboard, Rucio should send the relevant metrics from its logs/APIs to MONIT. @mrceyhun provided relevant information about the data flow in https://github.com/dmwm/WMCore/issues/11356#issuecomment-1298450176.

amaltaro commented 1 year ago

Thank you for creating this issue, Valentin. This will indeed be useful for debugging and also for future design considerations within WMCore.

haozturk commented 1 year ago

This will be very helpful for OPS as well. The recent incident led to a significant delay in production workflows, and we couldn't figure out the root cause due to the lack of monitoring.

haozturk commented 1 year ago

Hi all, I just want to mention that more and more workflows are being affected by the "delay in Rucio injection" issue, and we have started to get complaints from the requestors' side because of these delays.