CCI-MOC / xdmod-cntr

A project to prototype the use of XDMOD with OpenStack and OpenShift on the MOC

xdmod ticket #33271 aggregate_supremm.sh runs to completion, but no data appears in the supremm section of xdmod #227

Open rob-baron opened 10 months ago

rob-baron commented 10 months ago

Robert Bartlett Baron, reported about 1 month ago

The script "aggregate_supremm.sh" ran to completion.

However, I still do not see any data appearing in the supremm section within the xdmod GUI.

Any suggestions as to what to check next?

Robert Bartlett Baron, said 28 days ago

We have modeled our OpenShift cluster as an HPC cluster, so the individual pods show up as jobs. A pod usually uses a couple hundred millicpu, or 0.200 CPU. When this gets shredded, it appears as 0 CPU in the jobs table, because the jobs table's CPU column only holds integer values.

Unfortunately, none of these jobs get transferred to the supremm table when aggregate_supremm.sh runs.

Should we report CPU as millicpu (that is, 200), which could be stored as an integer value? If so, how would it be reported by supremm so that the units are in CPU (that is, the values from the jobs table divided by 1000)?

Is there a better way to do this?

Robert Bartlett Baron
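The truncation described above, and the proposed millicpu workaround, can be illustrated with a minimal Python sketch. The helper names `to_millicpu` and `from_millicpu` are hypothetical and are not part of XDMoD or supremm; the sketch only shows the arithmetic, not how the shredder or aggregate_supremm.sh actually handles the value.

```python
def to_millicpu(cpu: float) -> int:
    """Convert fractional CPUs to whole millicpu (hypothetical helper)."""
    return round(cpu * 1000)


def from_millicpu(millicpu: int) -> float:
    """Convert millicpu back to CPUs for reporting (hypothetical helper)."""
    return millicpu / 1000


pod_cpu = 0.200  # a typical pod, per the description above

# Storing the fractional value in an integer column drops it to zero.
truncated = int(pod_cpu)

# Scaling to millicpu first keeps the information in an integer.
scaled = to_millicpu(pod_cpu)

# Dividing by 1000 at reporting time recovers the value in CPU units.
restored = from_millicpu(scaled)

print(truncated, scaled, restored)
```

The open question from the ticket remains: the division by 1000 would have to happen somewhere on the reporting side, and where that would go in supremm is not established here.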

rob-baron commented 10 months ago

Conner Saeli, said 7 days ago

Ticket: https://help.xdmod.org/support/tickets/33271

Hi Robert,

Just to clarify, are you referring to the "cores" or "cores_avail" in modw_supremm.job?

The "cores" column for the job table in supremm is taken from the "processor_count" column in modw.job_tasks. "processor_count" is also stored as an integer.

My follow-up questions for you are: Is the value for "processor_count" in modw.job_tasks for your instance of XDMoD stored in millicpus? Is this number an integer or a float? Are you able to properly view Job information?

On the other hand, the "cores_avail" column is populated from performance data that is available for individual cores. For example, if a job requests 4 cores, but there are performance data for only 3 cores, then "cores" would be 4 and "cores_avail" would be 3. I do not know how PCP reports fractional CPUs in OpenShift, so I cannot provide any insight into how to use this information in supremm.

Thanks, Conner Saeli

Robert Bartlett Baron, said less than a minute ago

Conner,

Thank you for responding to me after 2 months. As a way of working around the lack of support for cloud computing, specifically the lack of support for Kubernetes, we created a PCP report to contain equivalent information.

When we checked the results, we found that processor_count was reporting the floor of the value in the log file. As I hadn't heard back from you, I created a test set of log files in which I multiplied the ncpu field by 1000 and made it an int. After shredding and ingesting, the processor_count field was stored as an integer, though technically its unit is now millicores.

And yes, the job information is viewable.

So both "processor_count" and "cores_avail" need to be integers. I'm assuming that if they are both set in millicores, aggregate_supremm will create the ratio processor_count/cores_avail, which would be unitless.
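The unitless-ratio assumption can be checked with a short Python sketch. Whether aggregate_supremm actually computes this ratio is the assumption stated above and is not confirmed here; `core_ratio` is a hypothetical helper used only to show that scaling both values by 1000 leaves the ratio unchanged, while scaling only one of them does not.

```python
import math


def core_ratio(processor_count: float, cores_avail: float) -> float:
    """Ratio of requested to available cores (hypothetical helper)."""
    if cores_avail == 0:
        raise ValueError("cores_avail must be non-zero")
    return processor_count / cores_avail


# The same job expressed in whole cores and in millicores.
ratio_cores = core_ratio(0.4, 0.2)
ratio_millicores = core_ratio(400, 200)

# Mixing units (millicores over whole cores) is off by a factor of 1000.
ratio_mixed = core_ratio(400, 0.2)

print(ratio_cores, ratio_millicores, ratio_mixed)
```

The ratio only stays meaningful if both columns use the same unit, so "cores_avail" would need the same millicore scaling as "processor_count".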

As my last day at BU is 06-SEP-2023 (in 2 days) feel free to close this ticket.

Thank you,

Robert Baron