Open vkuznet opened 1 year ago
Thank you for creating this issue, Valentin. This will indeed be useful for debugging and also for future design considerations within WMCore.
This will be very helpful for OPS as well. The recent incident led to a significant delay in production workflows, and we couldn't figure out the root cause due to the lack of monitoring.
Hi all, I just want to mention that more and more workflows are being affected by the "delay in Rucio injection" issue, and we have started to get complaints from the requestors' side because of such delays.
Per our discussion in https://github.com/dmwm/WMCore/issues/11356#issuecomment-1299535083, I would like to request a timber-like dashboard for the Rucio service. In particular, we need the following types of information:
To build such a dashboard, Rucio should provide the relevant metrics from its logs/APIs to MONIT. @mrceyhun provided relevant details about the data flow in https://github.com/dmwm/WMCore/issues/11356#issuecomment-1298450176.