Open haozturk opened 9 months ago
I'd start with Matti Kortelainen for CMSSW.
Thanks Eric, I'll contact him. For reference, here are the links to the existing data:
For WMArchive: I see that it already has the error information. See this link for production and this link for CRAB, and look at the data.steps field of a random entry. The only problem is that it's not indexed, so it's not possible to run queries on it. My plan is to change the rucio-tracers repo so that we parse this info and push it to /topic/cms.rucio.tracer in the right field.
For xrootd and CMSSW: we still don't know how to do it. Bockjoo doesn't know for AAA and I haven't gotten a reply from Matti yet. I'll keep investigating.
To my understanding the "CMSSW popularity" information originates from CMSSW's StatisticsSenderService, which sends UDP packets to "somewhere". The Service sends a UDP packet with a bunch of information whenever the primary / secondary (= two-file solution) / embedded (= pileup) file is closed. While extending the data sent via UDP would be straightforward (it's JSON after all), adding information on file read errors specifically does not look straightforward. If you really want to, we can take a deeper dive into what the implementation would entail, in which case please open a feature request issue in CMSSW GitHub.
Before committing to any development I'd like to understand why the information in WMArchive (that is filled from the CMSSW framework job reports from both production and CRAB(?)) would not be sufficient. Do you e.g. want to catch the read errors from all the users' non-CRAB jobs as well?
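To illustrate only the transport mentioned above (a JSON document sent as a single UDP datagram), here is a minimal sketch. It is not the StatisticsSenderService implementation: the field names `file_lfn` and `read_bytes`, the `extra` argument, and the helper names are all hypothetical, and the real message layout should be checked in CMSSW.

```python
import json
import socket


def build_packet(file_lfn, read_bytes, extra=None):
    """Serialize a small JSON document to bytes (field names are hypothetical)."""
    doc = {"file_lfn": file_lfn, "read_bytes": read_bytes}
    if extra:
        # Extending the payload is just adding keys to the JSON document.
        doc.update(extra)
    return json.dumps(doc, sort_keys=True).encode("utf-8")


def send_packet(payload, host, port):
    """Send one datagram; UDP is fire-and-forget, so delivery is not confirmed."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

The sketch shows why adding a field is cheap on the wire; it says nothing about how the framework would learn that a read error happened, which is the part described as not straightforward.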
Thanks @makortel this is useful. I agree that we should start with WMArchive.
@yuyiguo I think you're one of the developers of rucio-tracers. At first glance, it seems this task can be accomplished by feeding the errors of the data.steps field in WMArchive into the stateReason field of the rucio traces. I'm looking into how this can be done. If you have comments on the subject before I start the implementation, they would be very much appreciated. My only worry is that the errors field can be quite large. I don't know whether this would cause any issues.
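A minimal sketch of the mapping described above, with the size concern handled by truncation. This is not rucio-tracers code: the per-error keys `type`, `exitCode` and `details`, the step key `name`, the 512-character cap, and the function names are assumptions made for illustration and should be checked against real WMArchive entries.

```python
MAX_REASON_CHARS = 512  # assumed cap, not a value taken from rucio-tracers


def summarize_errors(record, max_chars=MAX_REASON_CHARS):
    """Collapse the errors of every step in data.steps into one bounded string."""
    parts = []
    for step in record.get("data", {}).get("steps", []):
        for err in step.get("errors", []):
            # Keep only the first line of the (possibly very long) details text.
            lines = str(err.get("details", "")).strip().splitlines()
            detail = lines[0] if lines else ""
            parts.append(
                f"{step.get('name', '?')}:{err.get('type', '?')}:"
                f"{err.get('exitCode', '?')}:{detail}"
            )
    reason = "; ".join(parts)
    if len(reason) > max_chars:
        # Hard cap so a large errors field cannot inflate the trace.
        reason = reason[: max_chars - 3] + "..."
    return reason


def to_trace(record, base_trace):
    """Return a copy of base_trace with stateReason set when errors exist."""
    trace = dict(base_trace)
    reason = summarize_errors(record)
    if reason:
        trace["stateReason"] = reason
    return trace
```

Keeping only the first line of each error plus a hard cap is one possible answer to the size worry; whether the dropped text is needed downstream is an open question.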
Hi @ericvaandering @yuyiguo How can I test my changes in rucio-tracers? Is there a test queue that I can use to consume my implementation?
Edit: Adding in @dynamic-entropy as well in case he knows
I never looked at this, so cannot give an exact answer. But you can subscribe to the same queue with a different client and you will receive the same events without affecting prod.
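As a sketch of what "a different client" means at the protocol level, assuming the broker speaks STOMP and the destination is a topic: each client sends its own SUBSCRIBE frame with its own subscription id, and the broker delivers a copy of every message to each active subscriber. The destination name and subscription id below are hypothetical. If the destination were a queue rather than a topic, messages would instead be shared out between consumers, so that is worth checking first.

```python
def stomp_frame(command, headers, body=""):
    """Build a STOMP frame: command line, header lines, blank line, body, NUL."""
    lines = [command] + [f"{key}:{value}" for key, value in headers.items()]
    return "\n".join(lines) + "\n\n" + body + "\x00"


def subscribe_frame(destination, subscription_id):
    """SUBSCRIBE frame for a test client; names passed in are up to the caller."""
    return stomp_frame(
        "SUBSCRIBE",
        {"id": subscription_id, "destination": destination, "ack": "auto"},
    )


# Hypothetical destination and id, for illustration only.
frame = subscribe_frame("/topic/example.source", "tracer-test-1")
```

In practice a STOMP client library would build and send this frame; the point is only that a test consumer is a separate subscription and does not take messages away from production.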
Thanks Rahul. We had a chat with Rahul and Nikodemas offline, and we'll request a new subscriber for this queue to be used for testing. If anybody already has a test subscriber for this queue, please let me know so that we can avoid duplicating work.
I should have read this issue earlier...
Needed for https://github.com/dmwm/CMSRucio/issues/403 . Traces come from
I reckon we need to talk to the producers of these topics. I believe that's the WMCore team for WMArchive and Bockjoo for xrootd. How about CMSSWPOP? Does CRAB push any data to AMQ? @ericvaandering any clues?
Context: https://indico.cern.ch/event/1356295/