dmwm / WMCore

Core workflow management components for CMS.
Apache License 2.0
46 stars 107 forks source link

Timestamps for monitoring activities in WM job #11604

Closed klannon closed 12 months ago

klannon commented 1 year ago

Impact of the new feature WMAgent (WMArchive and job monitoring)

Is your feature request related to a problem? Please describe. Having the ability to track where time is spent in each phase of the production jobs.

Describe the solution you'd like In addition of the I/O metrics (let me know if these would warrant their own issue), in order to better understand the various CPU efficiency numbers, it would be useful to record and show in the monitoring

  1. begin and end timestamps (or begin timestamp and duration) of the WM program (sorry for the wrong term) that is run in the HTCondor job, and that calls cmsRun
  2. begin and end timestamps (or begin timestamp and duration) of each cmsRun invocation
  3. (bonus) total CPU time spent by the WM program (again sorry for the wrong term)

I'm not sure if any of these quantities are being recorded already, but if not, would it be feasible to add? (we'd also be interested of similar timekeeping by the pilot that runs the HTCondor jobs, but that's likely for other folks to talk to?)

UPDATE: whoever starts working on this, Kevin highly recommends talking to Ben Tovar first because ND WorkQueue already has a resource management mechanism which we could profit from.

UPDATE-2: this ticket is supposed to address only the item number 1. above, thus the overall job wallclocktime. Items 2 and 3 will be addressed in the following sub-task:

Describe alternatives you've considered None

Additional context Originally posted by @makortel in https://github.com/dmwm/WMCore/issues/11538#issuecomment-1563443351_

klannon commented 1 year ago

@makortel I broke this out in to a new issue because I think it's separate from the original. Sorry for not taking this up earlier.

As we discussed in person, it sounds like inserting these timestamps into the FJR would already be valuable, even if we weren't exposing this information through monitoring, so I propose we start with that. In principle, this is just a matter of capturing 4 (or 5 with bonus) numbers from the WM runtime script and inserting them into the FJR json file. This would make the issue low effort and potentially high gain.

What do you think?

amaltaro commented 1 year ago

Thanks for creating this issue, Kevin. I have updated the initial description to apply the "New feature" ticket template.

@makortel I have the following questions:

makortel commented 1 year ago

As we discussed in person, it sounds like inserting these timestamps into the FJR would already be valuable, even if we weren't exposing this information through monitoring, so I propose we start with that.

Assuming this means the numbers end up visible in WMArchive(?) so that they are accessible later on, sounds good.

  • on the first bullet, do you want to track the time spent by the job wrapper between: starting to prepare the cmsRun process (including but not limited to: load job description, scram CMSSW area, run PSet tweaks, etc) until the end of cmsRun, including post processing of the exit code from the bash script and FJR?

I see now that my formulation of the first and bonus bullets is a bit confusing. The first bullet means what you wrote for the bonus bullet, i.e. start time being the first line in the production job wrapper, and the end time the line before exiting the job wrapper (or as late as possible in order to still report it back, however that communication happens). I'll clarify the bonus bullet more below.

  • for the second bullet, isn't the cpu.TotalJobTime FJR metric supposed to report the cmsRun invocation time (aka., for how long the cmsRun process executed)

It does, with the subtle distinction that the time is measured by cmsRun itself from some moment around the beginning of the process to some moment around the end of the process. For completeness, we'd like to see how the cmsRun job duration looks from the WM job wrapper perspective, e.g. in case the cmsRun process startup (from the OS beginning to start a new process until the beginning of main()) or shutdown (from endJob transition to the actual termination of the process by the OS).

  • for the bonus bullet, I understand start time has to be the first line in our production job wrapper; while end time is meant to be the line before exiting the job wrapper. In other words, the overall production job time.

The bonus bullet is about recording the CPU time corresponding the duration the first bullet records the wall clock time. Effectively one could calculate the CPU efficiency of the overall production job from the numbers of the first bullet and the bonus bullet.

Would it be useful to have a zoom call at some point? I'm pretty sure I'm using different terminology (because of being ignorant of the proper one) and I'm concerned if that causes confusion.

amaltaro commented 1 year ago

Matti and I had a chat today to clarify the requirements and the summary is: 1) we need wallclock time for the whole job wrapper code (bonus if we could get the CPU time) 2) we need the wallclock time for each of the cmsRun steps (the closest to cmsRun the better). For this case, CPU time is also a must to be provided.

We also need to research the following two items:

@makortel Matti (thanks!) found this stack overflow link: https://stackoverflow.com/questions/26475636/measure-elapsed-time-amount-of-memory-and-cpu-used-by-the-extern-program explaining how to measure CPU time for child processes. It could be that getrusage(RUSAGE_CHILDREN) is the answer that we are looking for.

vkuznet commented 1 year ago

In order to understand how to implement this ticket the following items should be clarified first:

  1. What is persistent object where this information should be written, in other words do have a document, e.g. FWJR, which is carrying around with every step of the workflow execution or not. If yes, what is this document, if no then we need
    • define persistent storage for each job, should it be a (local or central) service, should it be database or any other format, etc.
  2. Identify which system can provide such metrics, e.g. does it job of WorkQueue to record start/end of the job or anything else
  3. How to properly define CPU metrics on different architectures, e.g. on Linux we can rely on /proc/cpuinfo, but it does not exist on Darwin (I do not have experience with ARM, Power ones)
  4. Should we record start/end time within WMCore modules or from external components, if former which parts of WMCore can be used for that, if later how this should be written and where it should go (e.g. bash script which start job will take time snapshot and update some FJR file)

If we can't establish how to persistently write this info with some storage (file or local/central service) for every job, would you be open for messaging service like NATS, AMQ brokers, etc. which will be used to push this info. If so, where these services can be located and how to deal with firewalls, e.g. we have example of Rucio tracer which relies on CERN AMQ and kafka.

I think we need a proper (in person) discussion to define details of implementation and then outline concrete steps in order to deal with this ticket.

amaltaro commented 1 year ago

After a discussion with Valentin this week, we thought that breaking this issue into 2 tickets would make the issue and effort more manageable. So I just created this second issue: https://github.com/dmwm/WMCore/issues/11660 as a sub-task of this one. The initial description also received a couple of UPDATE messages.

amaltaro commented 1 year ago

As briefly discussed over mattermost, I think we need to reopen this issue as we still have to figure out where exactly in the FJR (or which of them) these changes need to be made such that it becomes visible by ArchiveDataReporter, thus being converted to a WMArchive document and ending up in MonIT.

FYI @vkuznet

amaltaro commented 12 months ago

Valentin and I had a discussion about these timing metric tickets and we decided to create a new ticket to address the WMTiming metrics, such that it gets added to the right document/report file. The new ticket has been created: https://github.com/dmwm/WMCore/issues/11718

and we can now close this one.