OSC / osc-reporting

Reporting on OOD jobs run at OSC
MIT License

jobs running vs jobs queued (with gpu) #5

Open johrstrom opened 2 years ago

johrstrom commented 2 years ago

Copy the last 2 progress bars' functionality here. I'm not sure that progress bars are the right way to show this; maybe just a simple table?

    |     | Running | Queued |
    | CPU | 1000    | 500    |
    | GPU | 123     | 7      |
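A minimal sketch of how the counts for a table like this could be tallied. It assumes each job is a plain record with a hypothetical `status` field (`"running"` or `"queued"`) and a hypothetical `gpus` count; these names are illustrative only and are not taken from ood_core or this repo.

```python
from collections import Counter


def tally_jobs(jobs):
    """Count jobs by (resource, status).

    Each job is a dict with the hypothetical keys 'status'
    ('running' or 'queued') and 'gpus' (number of GPUs requested).
    A job with at least one GPU is counted as a GPU job, otherwise as CPU.
    """
    counts = Counter()
    for job in jobs:
        resource = "GPU" if job.get("gpus", 0) > 0 else "CPU"
        counts[(resource, job["status"])] += 1
    return counts


def render_table(counts):
    """Render the tallies as a small plain-text table, one row per resource."""
    rows = ["    | Running | Queued"]
    for resource in ("CPU", "GPU"):
        running = counts[(resource, "running")]
        queued = counts[(resource, "queued")]
        rows.append(f"{resource:<3} | {running:>7} | {queued:>6}")
    return "\n".join(rows)


if __name__ == "__main__":
    sample = [
        {"status": "running", "gpus": 0},
        {"status": "queued", "gpus": 0},
        {"status": "running", "gpus": 2},
    ]
    print(render_table(tally_jobs(sample)))
```

Keeping the tally separate from the rendering means the same counts could feed a table, a "property: value" text element, or a chart.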



johrstrom commented 2 years ago

Closed in #16.

johrstrom commented 2 years ago

Reopening for the next round of improvements.

Let's see what a histogram-type visual looks like, i.e., vertical progress bars. If they stand next to each other, you can visualize the queue size a little better.

lukew3 commented 2 years ago

> Let's see what a histogram-type visual looks like, i.e., vertical progress bars. If they stand next to each other, you can visualize the queue size a little better.

I'm having trouble understanding what you are envisioning. I get that there is no maximum value for the number of jobs available, so a progress bar doesn't make sense. However, I don't understand how a histogram would work in this situation. A histogram needs a series of bars with varying heights, which doesn't seem appropriate here. You also mentioned having vertical progress bars, but that would leave lots of unused space to the sides of the bars.

I didn't like the original table concept design because it made it seem like there was a difference between CPU and GPU jobs, so I changed it in #18. However, I'm thinking that maybe a table isn't appropriate here either, since all it shows is a property and a value. It might be more appropriate to just use "property: value" in a text element.

johrstrom commented 2 years ago

> However, I don't understand how a histogram would work in this situation.

We can use Splunk's terminology, which would be a column chart, or even a stacked column chart for multiple things in one column. We're using single-row bar charts right now. What I guess I'm suggesting is to actually group things together.

As an example: showing GPU utilization as a column chart (i.e., vertical bars) where the 2 columns are running and queued. That gets rid of that other partial, but honestly I'm re-thinking what's the best way to display the information.

For jobs in general, queue size vs. running size isn't really as solid of a relationship for a researcher as it is for GPUs. But making it a column chart (maybe stacked with GPU info for extra data?) will at least give uniformity with the GPU column chart.

https://docs.splunk.com/Documentation/Splunk/8.2.5/Viz/ColumnBarCharts

lukew3 commented 2 years ago

Since we want to give insights on how many jobs are being blocked due to lack of GPU or other resources, it is probably important that we differentiate between jobs that are queued because they are blocked and jobs that are queued because they are scheduled.
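A rough sketch of that split, assuming the scheduler exposes a pending reason for each queued job. The reason strings below follow Slurm's pending reason codes (`Resources`, `Priority`, `BeginTime`), but which reasons count as "blocked" versus "scheduled" is an assumption made here for illustration, as is the hypothetical `reason` field on each job.

```python
# Assumed grouping of pending reasons; adjust to whatever the cluster reports.
BLOCKED_REASONS = {"Resources", "Priority"}  # waiting on GPUs/CPUs or on jobs ahead in line
SCHEDULED_REASONS = {"BeginTime"}            # held until a requested start time


def classify_queued(reason):
    """Map a pending reason string to 'blocked', 'scheduled', or 'other'."""
    if reason in BLOCKED_REASONS:
        return "blocked"
    if reason in SCHEDULED_REASONS:
        return "scheduled"
    return "other"


def summarize_queue(queued_jobs):
    """Count queued jobs per category.

    Each job is a dict with a hypothetical 'reason' key holding the
    scheduler's pending reason; a missing reason is counted as 'other'.
    """
    summary = {"blocked": 0, "scheduled": 0, "other": 0}
    for job in queued_jobs:
        summary[classify_queued(job.get("reason"))] += 1
    return summary
```

An "other" bucket keeps unrecognized reasons visible instead of silently folding them into one of the two categories.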

sync-by-unito[bot] commented 2 years ago

➤ Luke Weiler commented:

Blocked until ood_core cluster_info is added. Improvements can still be made locally, but don't merge due to excessive logic in views.