NAICNO / Jobanalyzer

Easy to use resource usage report
MIT License

Deploy sonar prototype to all ML nodes #19

Closed by lars-t-hansen 8 months ago

lars-t-hansen commented 1 year ago

There are really a couple of different deployments to discuss, so this is a major metabug with lots of other bugs to be filed.

First, we need sonar to run and log samples. For this we need:

Second, we need to run sonalyze against the logs manually to test that. For this we need:

Third, we want to run the analysis automatically and flag possible problems. For this we need:

Final deployment task list, rough chronological plan:

lars-t-hansen commented 1 year ago

Re "create logic for sonar to distribute the log", there is a shell script to invoke sonar here that we can modify for our use.
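The referenced script isn't reproduced here, but a cron-friendly wrapper for sonar might look like the sketch below. The paths, the per-host-per-day log layout, and the placeholder for the sonar invocation are all illustrative assumptions, not the actual script.

```shell
#!/bin/sh
# Hypothetical cron wrapper for sonar sampling. Paths are assumptions;
# the real script lives in the sonar repository.
set -e

LOGROOT="$HOME/sonar/data"     # root of the sample tree (assumed layout)
HOST=$(hostname)
DATE=$(date +%Y/%m/%d)         # one directory per day, one file per host
LOGDIR="$LOGROOT/$DATE"

mkdir -p "$LOGDIR"
# Append one batch of samples. The echo is a stand-in: on a node where
# sonar is installed, this line would instead run the sonar sampler.
echo "sample from $HOST at $(date +%H:%M)" >> "$LOGDIR/$HOST.csv"
```

Keeping the date in the directory path rather than the file name makes it cheap to archive or expire whole days and months later.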

lars-t-hansen commented 1 year ago

Re "determine a place" and "special uid", for the prototype I'll just run sonar as myself and store data in a tree under my home directory. We can revisit when the bugs are ironed out.

lars-t-hansen commented 1 year ago

Re cron, I'm going to set up a cron job manually on all the systems for now.
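For reference, the manual per-node setup might look something like the crontab fragment below; the five-minute interval and the wrapper path are assumptions, not the actual configuration.

```shell
# Hypothetical crontab entry (installed with "crontab -e" on each node):
# run the sonar wrapper every 5 minutes and keep cron's own output
# in a log file for debugging.
*/5 * * * * $HOME/sonar/bin/run-sonar.sh >> $HOME/sonar/cron.log 2>&1
```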

lars-t-hansen commented 1 year ago

Sonar running on ml[1-4,6-8] under my user; let's let it run over the weekend...
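Fanning the cron setup out to ml[1-4,6-8] by hand can be scripted. The sketch below only prints the command that would run on each host; swapping the echo for a real ssh would perform the installation. The node list comes from the comment above; the crontab file path is a placeholder.

```shell
# Expand the ml[1-4,6-8] host list (ml5 is skipped, as in the comment)
# and show the per-host installation command.
for n in 1 2 3 4 6 7 8; do
  host="ml$n"
  echo "ssh $host 'crontab /path/to/sonar.crontab'"
done
```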

lars-t-hansen commented 1 year ago

Everything's been running fine for ~6 weeks under my user, both monitoring and analysis jobs, with data uploads to the web server and email being pushed to admin (me). It's probably time to move this setup from my user to a separate user. I think I will keep my own setup as a staging area.

Task list above has been updated.

Sabryr commented 1 year ago

Hello @lars-t-hansen , Thank you very much for your professional approach.

  1. We will ask for a no-login user for the ML nodes and for Fox. We need to contact Bart to get the user on the ML nodes and buZh on Fox.
  2. As for where to keep the data: is it possible to push it to GitHub.uio.no or Gitlab.sigma2.no? Maybe keep the past week's data in raw format and then update a monthly summary file. What do you think? If shared storage is better, we can use the EES shared mounts.

/itf-fi-ml/shared

Regards, Sabry

lars-t-hansen commented 1 year ago

@sabryr, I think it's a good idea to archive the data so that we can run long-range analyses, by and by. It's text and it compresses extremely well, so keeping compressed monthly (say) archives is not going to be a problem for anyone. (On the ML nodes I think we generate about 2MB of text every day, it compresses by > 90% IIRC.) It's actually easier to keep the raw data and regenerate all reports than it is to keep the reports or summaries, and not much more expensive - and it'll be much more flexible, as we evolve this system.
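Using the figures in the comment (about 2 MB of text per day, compressing by more than 90%), the storage cost of keeping all raw data can be sketched as back-of-the-envelope shell arithmetic:

```shell
# Storage estimate from the numbers above: ~2 MB raw text per day on
# the ML nodes, compressing by >90% (modeled here as a factor of 10).
RAW_PER_DAY_MB=2
RAW_PER_MONTH_MB=$((RAW_PER_DAY_MB * 30))                 # ~60 MB raw/month
COMPRESSED_PER_MONTH_MB=$((RAW_PER_MONTH_MB / 10))        # ~6 MB compressed/month
COMPRESSED_PER_YEAR_MB=$((COMPRESSED_PER_MONTH_MB * 12))  # ~72 MB compressed/year
echo "$RAW_PER_MONTH_MB $COMPRESSED_PER_MONTH_MB $COMPRESSED_PER_YEAR_MB"
```

So even a full year of compressed raw samples stays well under 100 MB per cluster, which supports the argument for keeping the raw data and regenerating reports.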

That said:

(For the latter point, the analysis code currently needs uncompressed data, but there's no reason why I couldn't fix that; it's been on my radar for a while.)

The discussion definitely ties into where we will run the analyses, and into the shape this monitor will have on fox and (maybe) light-hpc systems.